I’ve uploaded additional quantizations of deepseek-r1-qwen-distill:1.5b, all pulled from Unsloth’s Dynamic 2.0 page on Hugging Face, which offer better accuracy than standard quantizations while maintaining efficiency. The default is Q4_K_M, but Q3_K_XL, IQ4_XS, Q4_K_XL, and Q5_K_M are also available.
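As a quick sketch of how one of these variants could be fetched programmatically (the tag spellings below are assumptions for illustration; check the model’s tags list for the real names), the Ollama Python client can pull a specific quantization:

```python
# Sketch: selecting one of the quantization variants listed above via the
# ollama Python client (pip install ollama). The tag format shown here is
# an assumption; verify the actual tag names on the model's tags page.
import ollama

# Variants mentioned above, roughly ordered from smallest to largest.
variants = ["q3_k_xl", "iq4_xs", "q4_k_m", "q4_k_xl", "q5_k_m"]

tag = f"deepseek-r1-qwen-distill:1.5b-{variants[2]}"  # hypothetical tag name
ollama.pull(tag)  # downloads the chosen quantization locally
```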
DeepSeek-R1-Distill-Qwen-1.5B is a powerful 1.5B-parameter language model distilled from DeepSeek-R1, designed to deliver strong reasoning capabilities at a small scale. It is built on the Qwen architecture and optimized for deployment in resource-constrained environments while maintaining impressive performance.
DeepSeek-R1-Distill-Qwen-1.5B is part of the DeepSeek-R1 family of models, which were developed through large-scale reinforcement learning. The DeepSeek-R1 series represents a significant advancement in reasoning capabilities for language models:
- Innovative Training Approach: The original DeepSeek-R1 models were trained via large-scale reinforcement learning, with DeepSeek-R1-Zero trained without supervised fine-tuning as a preliminary step.
- Distillation Process: The reasoning patterns of the much larger DeepSeek-R1 model (671B total parameters) were distilled into this compact 1.5B-parameter model, demonstrating that smaller models can inherit strong reasoning capabilities from larger ones.
- Performance: The distilled models perform exceptionally well on benchmarks; the full DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks, while the larger distilled variants surpass OpenAI-o1-mini.
This model is ideal for applications requiring strong reasoning capabilities in resource-constrained environments, offering an excellent balance between model size and performance.
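As a minimal usage sketch (assuming the default Q4_K_M tag published under the model name above and the ollama Python client; the prompt is purely illustrative):

```python
# Minimal usage sketch with the ollama Python client; the model name is the
# default tag from this page and the prompt is illustrative only.
import ollama

response = ollama.chat(
    model="deepseek-r1-qwen-distill:1.5b",
    messages=[
        {"role": "user", "content": "What is the sum of the first 20 odd numbers? Think step by step."}
    ],
)
# R1-distilled models typically emit their chain of thought inside
# <think>...</think> tags before giving the final answer.
print(response["message"]["content"])
```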