3logic/llama_dpo:latest

A fine-tuned Llama 3.1 8B model with supplementary DPO training.

Readme

Model Specifications

Base Model

  • Architecture: Llama 3.1
  • Size: 8B parameters
  • Type: Instruct model
  • Precision: FP16
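
For reference, a minimal sketch of querying this model through the Ollama Python client (`pip install ollama`). It assumes a running local Ollama server with this model already pulled (e.g. `ollama pull 3logic/llama_dpo`); the prompt is purely illustrative:

```python
import ollama

# Illustrative request against the locally pulled model.
response = ollama.chat(
    model="3logic/llama_dpo",
    messages=[{"role": "user", "content": "Summarize the DPO training objective."}],
)
print(response["message"]["content"])
```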

Fine-Tuned Model Parameters

  • Method: Full Precision LoRA
  • Epochs: 20
  • Rank: 64
  • Alpha: 16
  • Training Dataset: train_05_prompted_v2
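
For context, a minimal sketch of what a LoRA configuration with these values looks like in Hugging Face PEFT. The rank and alpha come from the list above; the target modules, dropout, and base checkpoint name are assumptions, since the card does not state them:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Base checkpoint name is illustrative; the card only says "Llama 3.1 8B Instruct".
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

lora_config = LoraConfig(
    r=64,           # Rank: 64 (from the card)
    lora_alpha=16,  # Alpha: 16 (from the card)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed, not stated
    lora_dropout=0.05,                                        # assumed, not stated
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # sanity check: only adapter weights are trainable
```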

DPO Parameters

  • Epochs: 3
  • Rank: 64
  • Alpha: 16
  • Training Dataset: dpo_en_demo
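
A minimal DPO sketch with TRL, wired up with the epochs, rank, and alpha listed above. Checkpoint paths and beta are placeholders, and the `dpo_en_demo` file path is hypothetical (the name matches the demo preference set shipped with LLaMA-Factory, but the card does not state its source):

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

sft_path = "path/to/finetuned-llama-3.1-8b"  # placeholder: the fine-tuned model above
model = AutoModelForCausalLM.from_pretrained(sft_path)
tokenizer = AutoTokenizer.from_pretrained(sft_path)

# TRL expects preference rows with "prompt", "chosen", and "rejected" fields.
dataset = load_dataset("json", data_files="dpo_en_demo.json", split="train")

args = DPOConfig(
    output_dir="llama_dpo",
    num_train_epochs=3,  # Epochs: 3 (from the card)
    beta=0.1,            # assumed; the card does not list a DPO beta
)
trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,  # `tokenizer=` in older TRL releases
    peft_config=LoraConfig(r=64, lora_alpha=16, task_type="CAUSAL_LM"),  # Rank/Alpha from the card
)
trainer.train()
```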

Performance Metrics

Metric                    DPO Score   Fine-Tuned Model Score   Base Model Score
Agentic Similarity        83.67       86                       84.67
CoT Contextual Accuracy   55 / 4      56 / 3                   54 / 5
Medical GPT Score         58.17       65                       51.75

Training Loss Curve

Figure: training_loss.png