Meta Llama-3-8B with Self-Play Preference Optimization (SPPO) for language model alignment, at iteration 3

21 models