mannix/smallthinker

mannix/smallthinker:latest

119 Downloads Updated 1 year ago

A new small reasoning model fine-tuned from the Qwen 2.5 3B Instruct model. Distributed as I-Quants models.

ollama run mannix/smallthinker

curl http://localhost:11434/api/chat \
  -d '{
    "model": "mannix/smallthinker",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

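# Send a single-turn chat request to the local Ollama server.
from ollama import chat

response = chat(
    model='mannix/smallthinker',
    messages=[{'role': 'user', 'content': 'Hello!'}],
)
# Print only the text of the model's reply.
print(response.message.content)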

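// Send a single-turn chat request to the local Ollama server.
import ollama from 'ollama'

const response = await ollama.chat({
  model: 'mannix/smallthinker',
  messages: [{role: 'user', content: 'Hello!'}],
})
// Print only the text of the model's reply.
console.log(response.message.content)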

Details

4eed365cc9d6 · 2.0GB

model

arch: qwen2

parameters: 3.4B

quantization: Q4_0

2.0GB
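
The listed size can be sanity-checked with a back-of-the-envelope calculation. The sketch below assumes the GGML Q4_0 block layout (32 weights stored as 16 bytes of 4-bit values plus one 2-byte fp16 scale) and assumes every tensor is stored as Q4_0, which is a simplification: real files usually keep some tensors at higher precision and carry metadata, so the estimate lands slightly under the size shown above. The function name `q4_0_size_gb` is illustrative only.

```python
# Rough size estimate for a Q4_0-quantized model.
# Assumption: Q4_0 packs 32 weights per block as 16 bytes of 4-bit values
# plus one 2-byte fp16 scale, i.e. 18 bytes per 32 weights.
BLOCK_WEIGHTS = 32
BLOCK_BYTES = 16 + 2


def q4_0_size_gb(n_params: float) -> float:
    """Approximate size in GB (10^9 bytes), assuming every tensor is Q4_0."""
    bits_per_weight = BLOCK_BYTES * 8 / BLOCK_WEIGHTS  # 4.5 bits per weight
    return n_params * bits_per_weight / 8 / 1e9


# 3.4B is the parameter count listed on this page.
estimate = q4_0_size_gb(3.4e9)
print(f"{estimate:.2f} GB")
```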

license

Qwen RESEARCH LICENSE AGREEMENT Release Date: September 19, 2024 By

7.4kB

system

You are a helpful assistant.

28B

template

{{- range $i, $_ := .Messages }} {{- $last := eq (len (slice $.Messages $i)) 1 -}} <|im_start|>{{ .R

255B

Readme

A new model fine-tuned from the Qwen2.5-3B-Instruct model.

Quantized from fp32, using an importance matrix (i-matrix) computed from calibration_datav3.txt.

SmallThinker is designed for the following use cases:

Edge Deployment: Its small size makes it ideal for deployment on resource-constrained devices.
Draft Model for QwQ-32B-Preview: SmallThinker can serve as a fast and efficient draft model for the larger QwQ-32B-Preview model, yielding a 70% speedup.
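
To illustrate the draft-model use case, here is a minimal sketch of greedy speculative decoding. It is a toy under stated assumptions, not how any particular runtime implements it: the two "models" are hypothetical stand-in functions that return the next token for a sequence, and verification is written as a plain loop, whereas a real system would score all drafted positions in one batched pass of the large model (which is where a speedup would come from).

```python
from typing import Callable, List

# Hypothetical stand-in for a model: maps a token sequence to the next
# token it would pick greedily.
NextToken = Callable[[List[int]], int]


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], max_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding; output matches target-only greedy decoding."""
    tokens = list(prompt)
    goal = len(prompt) + max_new
    while len(tokens) < goal:
        # 1. The small draft model proposes up to k tokens.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(tokens))):
            proposal.append(draft(tokens + proposal))
        # 2. The large target model checks each proposed position in order.
        for i, guess in enumerate(proposal):
            expected = target(tokens + proposal[:i])
            if guess != expected:
                # First mismatch: keep the accepted prefix plus the
                # target's own token, and discard the rest of the draft.
                tokens.extend(proposal[:i])
                tokens.append(expected)
                break
        else:
            # Every drafted token was accepted.
            tokens.extend(proposal)
    return tokens


# Toy models: the target counts upward modulo 10; the draft agrees
# except that it guesses wrong after the token 4.
def toy_target(seq: List[int]) -> int:
    return (seq[-1] + 1) % 10


def toy_draft(seq: List[int]) -> int:
    return 0 if seq[-1] == 4 else (seq[-1] + 1) % 10


print(speculative_decode(toy_target, toy_draft, [0], max_new=12))
```

The more often the draft agrees with the target, the more tokens are accepted per verification step, which is why a small model trained on the larger model's outputs is a natural candidate for the draft role.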

To achieve reasoning capabilities, it is crucial to generate long chain-of-thought (CoT) reasoning. Therefore, based on QwQ-32B-Preview, the authors used various synthetic techniques (such as PersonaHub) to create the QWQ-LONGCOT-500K dataset. Compared to other similar datasets, over 75% of the authors' samples have outputs longer than 8K tokens. To encourage research in the open-source community, the dataset was also made publicly available.
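
A statistic like the 75% figure can be computed from per-sample output token counts. The sketch below is illustrative only: the helper `share_over` and the example counts are hypothetical, and reading "8K" as 8,000 tokens (rather than 8,192) is an assumption.

```python
from typing import Sequence


def share_over(token_counts: Sequence[int], threshold: int = 8_000) -> float:
    """Fraction of samples whose output token count exceeds the threshold."""
    if not token_counts:
        raise ValueError("token_counts is empty")
    return sum(n > threshold for n in token_counts) / len(token_counts)


# Hypothetical counts for illustration only, not taken from the dataset.
example_counts = [9_500, 12_000, 3_200, 8_700]
print(share_over(example_counts))
```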
