37 Downloads Updated 1 year ago
8ccdb6ef4f2d · 4.4GB
Model source: https://huggingface.co/splusminusx/Starling-LM-7B-beta-GGUF
Quantized version of Nexusflow/Starling-LM-7B-beta.
We introduce Starling-LM-7B-beta, an open large language model (LLM) trained with Reinforcement Learning from AI Feedback (RLAIF). Starling-LM-7B-beta is fine-tuned from Openchat-3.5-0106 using our new reward model, Nexusflow/Starling-RM-34B, and the policy optimization method from Fine-Tuning Language Models from Human Preferences (PPO). Harnessing the ranking dataset berkeley-nest/Nectar, the upgraded reward model Starling-RM-34B, and the new reward-training and policy-tuning pipeline, Starling-LM-7B-beta scores an improved 8.12 on MT-Bench with GPT-4 as the judge.