37 Downloads Updated 1 year ago
8ccdb6ef4f2d · 4.4GB
Model source: https://huggingface.co/splusminusx/Starling-LM-7B-beta-GGUF
Quantized version of Nexusflow/Starling-LM-7B-beta.
We introduce Starling-LM-7B-beta, an open large language model (LLM) trained with Reinforcement Learning from AI Feedback (RLAIF). Starling-LM-7B-beta is fine-tuned from Openchat-3.5-0106 using our new reward model, Nexusflow/Starling-RM-34B, and the policy optimization method from Fine-Tuning Language Models from Human Preferences (PPO). Harnessing the ranking dataset berkeley-nest/Nectar, the upgraded reward model Starling-RM-34B, and the new reward-training and policy-tuning pipeline, Starling-LM-7B-beta scores an improved 8.12 on MT-Bench with GPT-4 as the judge.