SFR-Iterative-DPO-LLaMA-3-8B-R is a LLaMA-3-8B model further fine-tuned with SFT and iterative RLHF (DPO), which gives it strong performance. The model was released by the Salesforce team.
172 Pulls Updated 11 months ago
df04f1eec67a · 4.7GB
model
arch: llama · parameters: 8.03B · quantization: Q4_0 · 4.7GB
system
You're a very useful AI assistant.
34B
params
{
  "num_keep": 24,
  "stop": [
    "<|start_header_id|>",
    "<|end_header_id|>",
110B
template
{{ if .System }}<|start_header_id|>system<|end_header_id|>
{{ .System }}<|eot_id|>{{ end }}{{ if .P
255B
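The template preview above is truncated at 255 bytes, but its visible portion matches the standard Llama 3 chat format. A minimal sketch of how such a template renders a prompt; the `\n\n` after each header, the user turn, and the trailing assistant header are assumptions based on the usual Llama 3 layout, not taken from this page:

```python
def render_prompt(system: str, user: str) -> str:
    """Assemble a Llama-3-style prompt the way the (truncated) template above does."""
    out = ""
    if system:  # mirrors `{{ if .System }}` in the template
        out += f"<|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|>"
    out += f"<|start_header_id|>user<|end_header_id|>\n\n{user}<|eot_id|>"
    # Generation continues from an open assistant header.
    out += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return out

print(render_prompt("You're a very useful AI assistant.", "Hello!"))
```

The `stop` strings in the params block (`<|start_header_id|>`, `<|end_header_id|>`) correspond to these same header markers, so generation halts before the model starts a new turn.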
Readme
SFR-Iterative-DPO-LLaMA-3-8B-R is a LLaMA-3-8B model further fine-tuned with SFT and iterative RLHF (DPO), which gives it strong performance. The model was released by the Salesforce team.