mannix/eurus-2-7b-prime:Q3_K_S

152 Downloads · Updated 1 year ago

Eurus-2-7B-PRIME is trained using the PRIME (Process Reinforcement through IMplicit rEward) method, an open-source solution for online reinforcement learning (RL) with process rewards, to advance the reasoning abilities of language models.

ollama run mannix/eurus-2-7b-prime:Q3_K_S

curl http://localhost:11434/api/chat \
  -d '{
    "model": "mannix/eurus-2-7b-prime:Q3_K_S",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

from ollama import chat

response = chat(
    model='mannix/eurus-2-7b-prime:Q3_K_S',
    messages=[{'role': 'user', 'content': 'Hello!'}],
)
print(response.message.content)

import ollama from 'ollama'

const response = await ollama.chat({
  model: 'mannix/eurus-2-7b-prime:Q3_K_S',
  messages: [{role: 'user', content: 'Hello!'}],
})
console.log(response.message.content)

Details

bd5d24b6bbad · 3.5GB

model

arch: qwen2

parameters: 7.62B

quantization: Q3_K_S

3.5GB
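
As a rough sanity check on the figures above, the file size and parameter count imply an average of under 4 bits per weight, which is in line with a 3-bit K-quant. The sketch below assumes 1 GB = 10^9 bytes and that the whole 3.5GB file is weight data (metadata and any tensors kept at higher precision are ignored), so the result is only an approximation. The function name `bits_per_weight` is made up for this illustration.

```python
def bits_per_weight(file_size_bytes: float, n_params: float) -> float:
    """Average number of bits stored per model parameter."""
    return file_size_bytes * 8 / n_params


# Figures taken from the model details; the GB-to-bytes conversion is an assumption.
size_bytes = 3.5e9   # 3.5GB, assuming 1 GB = 10**9 bytes
n_params = 7.62e9    # 7.62B parameters

print(round(bits_per_weight(size_bytes, n_params), 2))  # 3.67
```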

system

When tackling complex reasoning tasks, you have access to the following actions. Use them as needed

384B

license

Qwen RESEARCH LICENSE AGREEMENT Release Date: September 19, 2024 By

7.4kB

template

{{- range $i, $_ := .Messages }} {{- $last := eq (len (slice $.Messages $i)) 1 -}} <|im_start|>{{ .R

255B

Readme

Quantized from fp32 using an i-matrix (importance matrix) built from calibration_datav3.txt.

Eurus-2-7B-PRIME is trained using the PRIME (Process Reinforcement through IMplicit rEward) method, an open-source solution for online reinforcement learning (RL) with process rewards, to advance the reasoning abilities of language models beyond imitation or distillation. Training starts from Eurus-2-7B-SFT and uses Eurus-2-RL-Data.

System Prompt

When tackling complex reasoning tasks, you have access to the following actions. Use them as needed to progress through your thought process.

[ASSESS]

[ADVANCE]

[VERIFY]

[SIMPLIFY]

[SYNTHESIZE]

[PIVOT]

[OUTPUT]

You should strictly follow the format below:

[ACTION NAME]

# Your action step 1

# Your action step 2

# Your action step 3

...

Next action: [NEXT ACTION NAME]
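
Because the system prompt asks for a fixed layout (a bracketed action name, some step lines, then a "Next action:" line), a response can be split into its actions with a small parser. The following is a minimal sketch, assuming the model follows the format exactly. The function name `parse_actions` and the sample text are invented for illustration and are not part of the model or of the Ollama library; real output may deviate from the format and need more tolerant parsing.

```python
import re

# Action names listed in the system prompt.
ACTIONS = {"ASSESS", "ADVANCE", "VERIFY", "SIMPLIFY", "SYNTHESIZE", "PIVOT", "OUTPUT"}

# A header is a line containing only a bracketed action name.
HEADER = re.compile(r"^\[([A-Z]+)\][ \t]*$", re.MULTILINE)
NEXT = re.compile(r"^Next action:\s*\[([A-Z]+)\][ \t]*$", re.MULTILINE)


def parse_actions(text):
    """Return a list of (action, body, next_action) tuples found in text."""
    headers = [m for m in HEADER.finditer(text) if m.group(1) in ACTIONS]
    steps = []
    for i, m in enumerate(headers):
        # The body runs until the next header, or to the end of the text.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(text)
        body = text[m.end():end].strip()
        nxt = NEXT.search(body)
        next_action = nxt.group(1) if nxt else None
        if nxt:
            body = body[:nxt.start()].strip()
        steps.append((m.group(1), body, next_action))
    return steps


# Hypothetical response text, written by hand to match the prompt's format.
sample = """[ASSESS]

# The task asks for 2 + 3.

Next action: [ADVANCE]

[ADVANCE]

# 2 + 3 = 5.

Next action: [OUTPUT]

[OUTPUT]

# The answer is 5.
"""

for action, body, next_action in parse_actions(sample):
    print(action, "->", next_action)
```

In practice the `text` argument would be the `response.message.content` string returned by the chat call shown earlier on this page.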

References

Hugging Face