74 7 months ago

Reasoning model distilled from DeepSeek-R1, enhanced with GRPO using supplementary reasoning datasets.

14b