A fine-tuned Llama 3.1 8B model with supplementary DPO (Direct Preference Optimization) training.

Model Specifications

Base Model

  • Architecture: Llama 3.1
  • Size: 8B parameters
  • Type: Instruct model
  • Precision: FP16
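
For reference, the sketch below shows how a base configuration like this can be loaded with Hugging Face transformers; the exact checkpoint id is an assumption, since the card does not name a repository.

```python
# Minimal loading sketch; the checkpoint id below is an assumption, since
# the card only states "Llama 3.1 8B Instruct" at FP16.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.float16,  # FP16, per this card
    device_map="auto",
)
```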

Fine-Tuned Model Parameters

  • Method: Full-precision LoRA
  • Epochs: 20
  • Rank: 64
  • Alpha: 16
  • Training Dataset: train_05_prompted_v2
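
These hyperparameters map directly onto a standard peft LoRA configuration. The sketch below is illustrative rather than the exact training script: the target modules and base checkpoint id are assumptions, and the training loop is omitted.

```python
# Illustrative LoRA setup with peft. Rank, alpha, epochs, and the dataset
# name come from this card; target_modules and the base checkpoint id are
# assumptions, and the training loop itself is omitted.
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",  # assumed base checkpoint
    torch_dtype=torch.float16,           # full-precision LoRA (no quantization)
)
lora_config = LoraConfig(
    r=64,           # Rank: 64, per this card
    lora_alpha=16,  # Alpha: 16, per this card
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# Train for 20 epochs on train_05_prompted_v2 with your preferred trainer.
```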

DPO Parameters

  • Epochs: 3
  • Rank: 64
  • Alpha: 16
  • Training Dataset: dpo_en_demo
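
As a rough guide, a DPO stage with these settings could be run with trl's DPOTrainer (recent versions). This is a hedged sketch, not the card's actual script: the checkpoint id, dataset path, and dataset schema are assumptions; only the rank, alpha, epoch count, and dataset name come from this card.

```python
# Hedged DPO sketch using trl's DPOTrainer (recent trl versions); the card
# does not state which tooling was actually used.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # assumed starting checkpoint
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# The card names the preference dataset "dpo_en_demo"; the local JSON path
# and the prompt/chosen/rejected schema are assumptions.
train_dataset = load_dataset("json", data_files="dpo_en_demo.json")["train"]

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="llama31-8b-dpo", num_train_epochs=3),  # Epochs: 3, per this card
    train_dataset=train_dataset,
    processing_class=tokenizer,
    peft_config=LoraConfig(r=64, lora_alpha=16, task_type="CAUSAL_LM"),  # rank/alpha per this card
)
trainer.train()
```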

Performance Metrics

| Metric                  | DPO Score | Fine-Tuned Model Score | Base Model Score |
| ----------------------- | --------- | ---------------------- | ---------------- |
| Agentic Similarity      | 83.67     | 86                     | 84.67            |
| CoT Contextual Accuracy | 55.4      | 56.3                   | 54.5             |
| Medical GPT Score       | 58.17     | 65                     | 51.75            |

Training Loss

[Figure: training loss curve (training_loss.png)]