wangrongsheng/sfr-iterative-dpo-llama-3-8b-r/params

wangrongsheng/sfr-iterative-dpo-llama-3-8b-r:latest

278 Downloads Updated 2 years ago

SFR-Iterative-DPO-LLaMA-3-8B-R is a LLaMA-3-8B model further fine-tuned with SFT and RLHF, which provides good performance. The model is from the Salesforce team.


params

577073ffcc6c · 110B

{
  "num_keep": 24,
  "stop": [
    "<|start_header_id|>",
    "<|end_header_id|>",
    "<|eot_id|>"
  ]
}
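
The effect of the `stop` list can be illustrated with a small Python sketch. It is not Ollama's implementation. It only assumes that generation is cut at the earliest occurrence of any stop sequence, and the helper `truncate_at_stop` and the sample string `raw` are hypothetical names introduced for illustration. `num_keep` is parsed but not exercised here; it is generally understood to set how many leading prompt tokens are retained when the context window overflows.

```python
import json

# The params file from this page, embedded as a string for the example.
PARAMS_JSON = """
{
  "num_keep": 24,
  "stop": [
    "<|start_header_id|>",
    "<|end_header_id|>",
    "<|eot_id|>"
  ]
}
"""


def truncate_at_stop(text, stops):
    """Hypothetical helper: cut text at the earliest stop sequence, if any."""
    cut = len(text)
    for stop in stops:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


params = json.loads(PARAMS_JSON)

# Made-up raw model output that runs past the end of the assistant turn.
raw = "Hello there!<|eot_id|><|start_header_id|>user<|end_header_id|>"
print(truncate_at_stop(raw, params["stop"]))
```

Listing the LLaMA 3 header and end-of-turn markers as stop sequences keeps the model from continuing into a new turn once its reply is complete.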