https://huggingface.co/Nexusflow/Starling-LM-7B-beta
300 Pulls Updated 8 months ago
0ccbe013b65a · 5.9GB
Starling-LM-7B-beta
- Developed by: The Nexusflow Team (Banghua Zhu*, Evan Frick*, Tianhao Wu*, Hanlin Zhu, Karthik Ganesan, Wei-Lin Chiang, Jian Zhang, and Jiantao Jiao).
- Model type: Language Model finetuned with RLHF / RLAIF
- License: Apache-2.0 license under the condition that the model is not used to compete with OpenAI
- Finetuned from model: Openchat-3.5-0106 (based on Mistral-7B-v0.1)
We introduce Starling-LM-7B-beta, an open large language model (LLM) trained by Reinforcement Learning from AI Feedback (RLAIF). Starling-LM-7B-beta is trained from Openchat-3.5-0106 with our new reward model, Nexusflow/Starling-RM-34B, and the policy optimization method from Fine-Tuning Language Models from Human Preferences (PPO). By combining the ranking dataset berkeley-nest/Nectar, the upgraded reward model Starling-RM-34B, and the new reward training and policy tuning pipeline, Starling-LM-7B-beta scores an improved 8.12 on MT Bench with GPT-4 as a judge.
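For readers unfamiliar with PPO (Proximal Policy Optimization), the following is a minimal sketch of the generic clipped surrogate objective at the heart of that method. It is not the Nexusflow training code: the function name `ppo_clipped_objective`, the `clip_eps` default of 0.2, and the plain-list inputs are assumptions chosen for illustration. In an RLAIF setup, the advantages would typically be derived from reward model scores, and a real pipeline would add further terms (for example a KL penalty against the starting model) that are omitted here.

```python
import math


def ppo_clipped_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective over a batch of samples.

    logp_new:   log-probabilities of the sampled actions under the current policy
    logp_old:   log-probabilities of the same actions under the sampling policy
    advantages: advantage estimates (hypothetically derived from reward model scores)
    clip_eps:   clipping range; 0.2 is an illustrative value, not taken from the source
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio between the current policy and the sampling policy.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

The objective is maximized during training. When the two policies agree (ratio of 1), each term reduces to the advantage itself; the clipping removes any incentive to move the ratio beyond the allowed range in the direction that would increase the objective.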