A fine-tuned Llama 3.1 8B model with supplementary DPO (Direct Preference Optimization) training.

Model Specifications

Base Model

  • Architecture: Llama 3.1
  • Size: 8B parameters
  • Type: Instruct model
  • Precision: FP16
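
For reference, the sketch below shows how a base configuration like this can be loaded with Hugging Face transformers; the exact checkpoint id is an assumption, since the card does not name a repository.

```python
# Minimal loading sketch; the checkpoint id below is an assumption, since
# the card only states "Llama 3.1 8B Instruct" at FP16.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.float16,  # FP16, per this card
    device_map="auto",
)
```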

Fine-Tuned Model Parameters

  • Method: Full-precision LoRA
  • Epochs: 20
  • Rank: 64
  • Alpha: 16
  • Training Dataset: train_05_prompted_v2
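
These hyperparameters map directly onto a standard peft LoRA configuration. The sketch below is illustrative rather than the exact training script: the target modules and base checkpoint id are assumptions, and the training loop is omitted.

```python
# Illustrative LoRA setup with peft. Rank, alpha, epochs, and the dataset
# name come from this card; target_modules and the base checkpoint id are
# assumptions, and the training loop itself is omitted.
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",  # assumed base checkpoint
    torch_dtype=torch.float16,           # full-precision LoRA (no quantization)
)
lora_config = LoraConfig(
    r=64,           # Rank: 64, per this card
    lora_alpha=16,  # Alpha: 16, per this card
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# Train for 20 epochs on train_05_prompted_v2 with your preferred trainer.
```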

DPO Parameters

  • Epochs: 3
  • Rank: 64
  • Alpha: 16
  • Training Dataset: dpo_en_demo
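
As a rough guide, a DPO stage with these settings could be run with trl's DPOTrainer (recent versions). This is a hedged sketch, not the card's actual script: the checkpoint id, dataset path, and dataset schema are assumptions; only the rank, alpha, epoch count, and dataset name come from this card.

```python
# Hedged DPO sketch using trl's DPOTrainer (recent trl versions); the card
# does not state which tooling was actually used.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # assumed starting checkpoint
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# The card names the preference dataset "dpo_en_demo"; the local JSON path
# and the prompt/chosen/rejected schema are assumptions.
train_dataset = load_dataset("json", data_files="dpo_en_demo.json")["train"]

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="llama31-8b-dpo", num_train_epochs=3),  # Epochs: 3, per this card
    train_dataset=train_dataset,
    processing_class=tokenizer,
    peft_config=LoraConfig(r=64, lora_alpha=16, task_type="CAUSAL_LM"),  # rank/alpha per this card
)
trainer.train()
```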

Performance Metrics

| Metric                  | DPO Score | Fine-Tuned Model Score | Base Model Score |
| ----------------------- | --------- | ---------------------- | ---------------- |
| Agentic Similarity      | 83.67     | 86                     | 84.67            |
| CoT Contextual Accuracy | 55.4      | 56.3                   | 54.5             |
| Medical GPT Score       | 58.17     | 65                     | 51.75            |

Training Loss

[Figure: training loss curve (training_loss.png)]