Reasoning model distilled from DeepSeek-R1 and further trained with GRPO (Group Relative Policy Optimization) on supplementary reasoning datasets.

14b