
This model was trained with Self-Play Preference Optimization (SPPO) at iteration 3, using google/gemma-2-9b-it as the starting point.

6522ca797f47 · 99B
{
  "num_ctx": 4096,
  "repeat_penalty": 1,
  "stop": [
    "<start_of_turn>",
    "<end_of_turn>"
  ]
}
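
The parameters above set a 4096-token context window, a repeat penalty of 1 (which leaves repetition unpenalized), and stop generation at Gemma's turn delimiters. As a minimal sketch, assuming the model is served by a local Ollama instance on its default port, the same values could be passed per request through the `options` field of `/api/generate`. The model tag passed to `generate` is a hypothetical placeholder, since the page does not state it, and `build_request` and `generate` are illustrative helper names.

```python
import json
import urllib.request

# Parameters copied from the params block above.
PARAMS = {
    "num_ctx": 4096,
    "repeat_penalty": 1,
    "stop": ["<start_of_turn>", "<end_of_turn>"],
}


def build_request(model: str, prompt: str) -> dict:
    """Build a non-streaming /api/generate payload carrying the options above."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": dict(PARAMS),
    }


def generate(model: str, prompt: str, host: str = "http://localhost:11434") -> str:
    """Send the payload to an Ollama server (host and port are assumptions)."""
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(build_request(model, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("<your-model-tag>", "Hello")` would need a running server with the model already pulled. `build_request` can be inspected on its own to check the payload before anything is sent.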