
A 0.6B-parameter model built in two stages: knowledge distillation from a 30B Thinking teacher to establish a structured reasoning backbone, followed by supervised fine-tuning on legal instruction data. Roughly 50x compression (30B → 0.6B). Under 500 MB quantized. Runs on a phone.
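As a rough illustration of the first stage, here is a minimal sketch of logit-level knowledge distillation in PyTorch, assuming a standard temperature-scaled KL-divergence objective blended with hard-label cross-entropy. The temperature, loss weighting, and function name are illustrative assumptions, not the recipe actually used for this model.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      T=2.0, alpha=0.5):
    """Blend soft-target KD loss with hard-label cross-entropy.

    T and alpha are illustrative hyperparameters, not values
    taken from this model's training run.
    """
    # Soft targets: KL divergence between temperature-scaled
    # teacher and student token distributions, scaled by T^2
    # as in Hinton et al. (2015).
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: standard next-token cross-entropy on labels.
    hard = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
    )
    return alpha * soft + (1 - alpha) * hard

The second stage then continues with plain supervised fine-tuning (cross-entropy only) on the legal instruction data.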

Capabilities: tools, thinking

Parameters (cff3f395ef37 · 120B):
{
  "repeat_penalty": 1,
  "stop": [
    "<|im_start|>",
    "<|im_end|>"
  ],
  "temperature": 0.6,
  "top_k": 20,
  "top_p": 0.95
}
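
For reference, a minimal sketch of querying the model with these sampling parameters via the ollama Python client. The model tag "legal-0.6b" and the prompt are placeholder assumptions; substitute the actual published tag.

import ollama  # pip install ollama

# "legal-0.6b" is a placeholder model tag, not the real one.
response = ollama.chat(
    model="legal-0.6b",
    messages=[{
        "role": "user",
        "content": "Summarize the doctrine of consideration.",
    }],
    options={
        # Mirrors the published parameter file above.
        "repeat_penalty": 1,
        "temperature": 0.6,
        "top_k": 20,
        "top_p": 0.95,
        "stop": ["<|im_start|>", "<|im_end|>"],
    },
)
print(response["message"]["content"])

These options are already baked into the model's parameter file, so passing them explicitly only matters if you want to override the defaults.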