DPO fine-tuned version of microsoft/Phi-3-medium-4k-instruct (14B params),
trained with the jpacifico/french-orca-dpo-pairs-revised RLHF dataset.
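DPO training consumes preference pairs: each record pairs a prompt with a preferred ("chosen") and a dispreferred ("rejected") response. A minimal sketch of that record shape, using the common `(prompt, chosen, rejected)` field convention; the French strings below are hypothetical illustrations, not entries from the actual dataset:

```python
# Sketch of the preference-pair record format commonly used for DPO training.
# Field names follow the usual (prompt, chosen, rejected) convention;
# the example strings are hypothetical, not taken from the dataset itself.
def make_dpo_pair(prompt: str, chosen: str, rejected: str) -> dict:
    """Bundle one preference pair into the record shape DPO trainers expect."""
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}

pair = make_dpo_pair(
    prompt="Explique la photosynthèse en une phrase.",
    chosen="La photosynthèse convertit la lumière, l'eau et le CO2 "
           "en glucose et en oxygène.",
    rejected="La photosynthèse est quand les plantes mangent de la terre.",
)
```

During optimization, the DPO objective pushes the policy to assign higher likelihood to each `chosen` response than to its `rejected` counterpart, relative to the frozen base model.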
Training in French also improves the model in English, surpassing the performance of its base model.
Context window: 4k tokens
The Chocolatine model series is a quick demonstration that a base model can be easily fine-tuned to achieve compelling performance.
It does not include any moderation mechanism.