mannix/gemma2-9b-sppo-iter3:q3_k_m

1,104 Downloads · Updated 1 year ago

This model was developed using Self-Play Preference Optimization at iteration 3, starting from google/gemma-2-9b-it.

ollama run mannix/gemma2-9b-sppo-iter3:q3_k_m

curl http://localhost:11434/api/chat \
  -d '{
    "model": "mannix/gemma2-9b-sppo-iter3:q3_k_m",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

from ollama import chat

response = chat(
    model='mannix/gemma2-9b-sppo-iter3:q3_k_m',
    messages=[{'role': 'user', 'content': 'Hello!'}],
)
print(response.message.content)

import ollama from 'ollama'

const response = await ollama.chat({
  model: 'mannix/gemma2-9b-sppo-iter3:q3_k_m',
  messages: [{role: 'user', content: 'Hello!'}],
})
console.log(response.message.content)

Details

f369784c75e1 · 4.8GB

model

arch gemma2 · parameters 9.24B · quantization Q3_K_M

4.8GB

license

Gemma Terms of Use Last modified: February 21, 2024 By using, reproducing, modifying, distributing,

8.4kB

params

{ "num_ctx": 4096, "num_predict": 4096, "repeat_penalty": 1, "stop": [ "<sta

118B
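
The params above are the defaults stored with this tag. A minimal sketch of overriding one of them per request, assuming the `options` and `stream` fields of Ollama's `/api/chat` request body. The `build_chat_request` helper is hypothetical, and the truncated `stop` list is deliberately left out rather than guessed.

```python
import json

MODEL = "mannix/gemma2-9b-sppo-iter3:q3_k_m"

# Defaults copied from the params block above.
# The "stop" list is truncated on the page, so it is omitted here.
DEFAULT_OPTIONS = {"num_ctx": 4096, "num_predict": 4096, "repeat_penalty": 1}


def build_chat_request(prompt, **overrides):
    """Hypothetical helper: build a JSON body for POST /api/chat."""
    options = {**DEFAULT_OPTIONS, **overrides}
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "options": options,
        "stream": False,  # ask for a single JSON response instead of a stream
    }


if __name__ == "__main__":
    # Cap the reply at 256 tokens; keep the other defaults.
    print(json.dumps(build_chat_request("Hello!", num_predict=256), indent=2))
```

The printed body can be passed to `curl -d` in place of the inline JSON shown earlier.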

template

<start_of_turn>user {{ if .System }}{{ .System }} {{ end }}{{ .Prompt }}<end_of_turn> <start_of_turn

137B

Readme

- Quantizations with i-matrix `calibration_datav3.txt`
- Safetensors converted to fp32

Self-Play Preference Optimization for Language Model Alignment (https://arxiv.org/abs/2405.00675)

# Gemma-2-9B-It-SPPO-Iter3

This model was developed using [Self-Play Preference Optimization](https://arxiv.org/abs/2405.00675) at iteration 3, starting from [google/gemma-2-9b-it](https://huggingface.co/google/gemma-2-9b-it). We used the prompt sets from the [openbmb/UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback) dataset, split into 3 parts for the 3 iterations by [snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset](https://huggingface.co/datasets/snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset). All responses used are synthetic.
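
For readers new to the method, the squared-error objective described in the SPPO paper can be sketched per response as below. This is an illustrative sketch under stated assumptions, not the authors' training code: `sppo_loss` and its argument names are hypothetical, `win_prob` stands for the estimated probability that the response beats the current policy, and `eta` defaults to the value listed under the training hyperparameters.

```python
def sppo_loss(logp_new, logp_ref, win_prob, eta=1000.0):
    """Per-response squared loss (illustrative sketch).

    logp_new: log-probability of the response under the policy being trained
    logp_ref: log-probability of the same response under the previous iterate
    win_prob: estimated probability that the response beats the previous iterate
    """
    log_ratio = logp_new - logp_ref
    # Responses with win_prob above 0.5 get a positive target, so their
    # likelihood is pushed up; those below 0.5 are pushed down.
    target = eta * (win_prob - 0.5)
    return (log_ratio - target) ** 2
```

With `win_prob = 0.5` the target is zero, so the loss only vanishes when the policy leaves that response's probability unchanged.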

## Links to Other Models
- [Gemma-2-9B-It-SPPO-Iter1](https://huggingface.co/UCLA-AGI/Gemma-2-9B-It-SPPO-Iter1)
- [Gemma-2-9B-It-SPPO-Iter2](https://huggingface.co/UCLA-AGI/Gemma-2-9B-It-SPPO-Iter2)
- [Gemma-2-9B-It-SPPO-Iter3](https://huggingface.co/UCLA-AGI/Gemma-2-9B-It-SPPO-Iter3)

### Model Description

- Model type: A 9B-parameter GPT-like model fine-tuned on synthetic datasets.
- Language(s) (NLP): Primarily English
- License: Apache-2.0
- Finetuned from model: google/gemma-2-9b-it

## [AlpacaEval Leaderboard Evaluation Results](https://tatsu-lab.github.io/alpaca_eval/)

| Model | LC. Win Rate | Win Rate | Avg. Length |
|-------|:------------:|:--------:|:-----------:|
| [Gemma-2-9B-SPPO Iter1](https://huggingface.co/UCLA-AGI/Gemma-2-9B-It-SPPO-Iter1) | 48.70 | 40.76 | 1669 |
| [Gemma-2-9B-SPPO Iter2](https://huggingface.co/UCLA-AGI/Gemma-2-9B-It-SPPO-Iter2) | 50.93 | 44.64 | 1759 |
| [Gemma-2-9B-SPPO Iter3](https://huggingface.co/UCLA-AGI/Gemma-2-9B-It-SPPO-Iter3) | **53.27** | **47.74** | 1803 |
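
The gain across iterations can be read off the table with a little arithmetic. A small sketch using only the numbers above; the `results` dictionary and `gain` helper are hypothetical names introduced for illustration.

```python
# Values copied from the AlpacaEval table above.
results = {
    "Iter1": {"lc_win_rate": 48.70, "win_rate": 40.76},
    "Iter2": {"lc_win_rate": 50.93, "win_rate": 44.64},
    "Iter3": {"lc_win_rate": 53.27, "win_rate": 47.74},
}


def gain(metric, start="Iter1", end="Iter3"):
    """Difference in percentage points between two iterations."""
    return round(results[end][metric] - results[start][metric], 2)


if __name__ == "__main__":
    print("LC win rate gain:", gain("lc_win_rate"))
    print("Win rate gain:", gain("win_rate"))
```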

### Training hyperparameters
The following hyperparameters were used during training:

- learning_rate: 5e-07
- eta: 1000
- per_device_train_batch_size: 8
- gradient_accumulation_steps: 1
- seed: 42
- distributed_type: deepspeed_zero3
- num_devices: 8
- optimizer: RMSProp 
- lr_scheduler_type: linear 
- lr_scheduler_warmup_ratio: 0.1
- num_train_epochs: 1.0
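
The effective global batch size follows from the values listed above. A minimal sketch, assuming the usual data-parallel convention that it is the product of the per-device batch size, the gradient accumulation steps, and the device count:

```python
# Values copied from the hyperparameter list above.
per_device_train_batch_size = 8
gradient_accumulation_steps = 1
num_devices = 8

# Sequences consumed per optimizer step across all devices.
global_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)
print(global_batch_size)  # 64
```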

## Citation
```
@misc{wu2024self,
      title={Self-Play Preference Optimization for Language Model Alignment}, 
      author={Wu, Yue and Sun, Zhiqing and Yuan, Huizhuo and Ji, Kaixuan and Yang, Yiming and Gu, Quanquan},
      year={2024},
      eprint={2405.00675},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
```
