starling-lm:7b

104.5K Downloads · Updated 1 year ago

Starling is a large language model trained by reinforcement learning from AI feedback, with a focus on improving chatbot helpfulness.

39153f619be6 · 4.1GB

model

arch: llama

parameters: 7.24B

quantization: Q4_0

4.1GB

license

Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR US

11kB

template

{{ if .System }}GPT4 Correct System: {{ .System }}<|end_of_turn|>{{ end }}{{ if .Prompt }}GPT4 Corre

200B
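The template above is filled from a system message and a prompt. As a rough illustration of how those two values might be supplied, here is a minimal sketch that builds a request for Ollama's `/api/generate` endpoint. It assumes a local Ollama server on the default port with this model already pulled; `build_request` and `generate` are hypothetical helper names, not part of the model page.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, system=None, model="starling-lm:7b"):
    """Build (but do not send) a non-streaming generate request."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    if system:
        # The system message is optional, matching the template's
        # conditional system block.
        payload["system"] = system
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, system=None):
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt, system)) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("Why is the sky blue?", system="Answer briefly.")` should return the model's reply as a string when the server is reachable. The server applies the template, so the caller does not need to add the `<|end_of_turn|>` markers by hand.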

params

{ "stop": [ "<|endoftext|>", "<|end_of_turn|>", "Human:", "Assis

87B
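The `stop` parameter lists strings at which generation halts. The sketch below illustrates the idea on plain text, under stated assumptions: it uses only the three stop strings that are fully visible in the truncated list above, and `truncate_at_stop` is a hypothetical helper. Ollama applies stop sequences on the server side, so this is not its implementation.

```python
# Only the stop strings fully visible in the truncated params above;
# the real list is longer, so this tuple is deliberately incomplete.
VISIBLE_STOPS = ("<|endoftext|>", "<|end_of_turn|>", "Human:")


def truncate_at_stop(text, stops=VISIBLE_STOPS):
    """Return text up to, but not including, the earliest stop sequence."""
    cut = len(text)
    for stop in stops:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


if __name__ == "__main__":
    raw = "Hello there.<|end_of_turn|>Human: hi"
    print(truncate_at_stop(raw))
```

Stops such as `Human:` guard against the model continuing the conversation on the user's behalf after its own turn has ended.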

Readme

Starling-7B is an open (non-commercial) large language model (LLM) trained by reinforcement learning from AI feedback (RLAIF).

The model draws on our new GPT-4-labeled ranking dataset, Nectar, and our new reward training and policy tuning pipeline. Starling-7B-alpha scores 8.09 on MT-Bench with GPT-4 as the judge, outperforming every model to date on MT-Bench except OpenAI’s GPT-4 and GPT-4 Turbo.*

*Based on MT-Bench evaluations, using GPT-4 scoring. Further human evaluation is needed.

Authors: Banghua Zhu, Evan Frick, Tianhao Wu, Hanlin Zhu and Jiantao Jiao.

For correspondence, please contact Banghua Zhu (banghua@berkeley.edu).

Reference

Starling-7B: Increasing LLM Helpfulness & Harmlessness with RLAIF

HuggingFace