
A 0.6B-parameter model built in two stages: knowledge distillation from a 30B Thinking teacher to establish a structured reasoning backbone, then supervised fine-tuning on legal instruction data. 50x parameter compression (30B → 0.6B). Under 500 MB quantized. Runs on a phone.
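As a rough sketch of what the stage-1 objective could look like, a common distillation setup blends a soft-target KL term against the teacher's logits with hard-label cross-entropy. The temperature and mixing weight below are illustrative assumptions, not details from the post.

```python
# Minimal distillation-loss sketch in PyTorch. Hyperparameters
# (temperature, alpha) are assumed, not taken from the project.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Blend soft-target KL distillation with hard-label cross-entropy."""
    # Soften both distributions; KL pulls the student toward the teacher.
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    kd = F.kl_div(soft_student, soft_teacher, reduction="batchmean")
    kd = kd * temperature ** 2  # standard gradient-scale correction

    # Hard-label term keeps the student anchored to the ground truth.
    ce = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,  # conventional padding/masking value
    )
    return alpha * kd + (1.0 - alpha) * ce
```

Stage 2 (supervised fine-tuning on legal instructions) would then drop the teacher term and train on the cross-entropy loss alone.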
