Useful DQ2 quants for deepseek-r1-qwen-distill:1.5b

Notes

I’ve uploaded additional quantizations of deepseek-r1-qwen-distill:1.5b, all pulled from Unsloth’s Dynamic 2.0 GGUF page on Hugging Face, which offer better accuracy than standard quantizations while maintaining efficiency. The default is Q4_K_M, and Q3_K_XL, IQ4_XS, Q4_K_XL, and Q5_K_M are also available.
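
If it helps, here’s a minimal sketch of pulling and running one of the non-default quants with the official `ollama` Python client (`pip install ollama`); the `your-namespace/...` model path below is a placeholder, so substitute the actual location of these uploads:

```python
# Minimal sketch using the official `ollama` Python client.
# The model path is hypothetical; substitute the real namespace and one of
# the available tags (Q3_K_XL, IQ4_XS, Q4_K_M, Q4_K_XL, Q5_K_M).
import ollama

MODEL = "your-namespace/deepseek-r1-qwen-distill:Q4_K_XL"  # placeholder path

ollama.pull(MODEL)  # download the chosen quantization

response = ollama.chat(
    model=MODEL,
    messages=[{"role": "user", "content": "What is 17 * 23?"}],
)
print(response["message"]["content"])
```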

Overview

DeepSeek-R1-Distill-Qwen-1.5B is a powerful 1.5B parameter language model distilled from DeepSeek-R1, specifically designed to deliver exceptional reasoning capabilities in a compact form factor. This model is based on the Qwen architecture and has been optimized for deployment in resource-constrained environments while maintaining impressive performance.

Key Features

  • Advanced Reasoning Capabilities: Inherits the reasoning patterns from the larger DeepSeek-R1 model through knowledge distillation
  • Efficient Architecture: Based on the Qwen2.5 1.5B architecture (1.78B parameters in total)
  • Optimized Performance: Delivers strong reasoning abilities despite its compact size
  • Chain-of-Thought Processing: Capable of generating detailed reasoning steps for complex problems (see the parsing sketch after this list)
  • Self-Verification: Demonstrates ability to verify its own reasoning process
  • Reflection Capabilities: Can reflect on and improve its own outputs
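
As a concrete illustration of the chain-of-thought output: DeepSeek-R1 and its distills emit their reasoning between `<think>` and `</think>` tags before the final answer. The helper below is a minimal sketch for separating the two; the raw output string is assumed to come from any client (e.g. the `ollama` example above):

```python
# Minimal sketch: R1-style models wrap their chain of thought in
# <think>...</think> tags before the final answer. This helper splits
# the reasoning from the user-facing answer.
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) from raw R1-style model output."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()          # no reasoning block emitted
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()  # everything after </think>
    return reasoning, answer

raw = "<think>17 * 23 = 17 * 20 + 17 * 3 = 340 + 51 = 391.</think>The answer is 391."
reasoning, answer = split_reasoning(raw)
print(reasoning)  # the step-by-step chain of thought
print(answer)     # the final answer
```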

Model Background

DeepSeek-R1-Distill-Qwen-1.5B is part of the DeepSeek-R1 family of models, which were developed through large-scale reinforcement learning. The DeepSeek-R1 series represents a significant advancement in reasoning capabilities for language models:

  1. Innovative Training Approach: The original DeepSeek-R1 models were trained via large-scale reinforcement learning, with DeepSeek-R1-Zero being trained without supervised fine-tuning as a preliminary step.

  2. Distillation Process: The reasoning patterns of the full DeepSeek-R1 model (671B total parameters) were distilled into this compact 1.5B parameter model, demonstrating that smaller models can inherit strong reasoning capabilities from larger ones (a toy sketch of the recipe follows this list).

  3. Performance: The distilled models perform strongly on benchmarks, with the larger variants in the family achieving results comparable to OpenAI-o1-mini across math, code, and reasoning tasks.
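
Per the DeepSeek-R1 report, “distillation” here means supervised fine-tuning the smaller model on reasoning traces generated by DeepSeek-R1 (roughly 800k curated samples), rather than logit matching. The toy sketch below illustrates that recipe only in outline; the model and data are stand-ins, not DeepSeek’s actual pipeline:

```python
# Toy sketch of sequence-level distillation: the student is fine-tuned with
# ordinary cross-entropy on token sequences sampled from the teacher.
# Everything here (model, data, sizes) is a stand-in for illustration.
import torch
import torch.nn as nn

vocab, dim = 1000, 64

class TinyLM(nn.Module):
    """Stand-in 'student': a tiny causal LM (embedding -> GRU -> logits)."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab)

    def forward(self, ids):
        h, _ = self.rnn(self.emb(ids))
        return self.head(h)

student = TinyLM()
opt = torch.optim.AdamW(student.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Pretend these token ids are reasoning traces generated by the teacher.
teacher_traces = torch.randint(0, vocab, (8, 32))

for step in range(10):
    inputs, targets = teacher_traces[:, :-1], teacher_traces[:, 1:]
    logits = student(inputs)                       # next-token predictions
    loss = loss_fn(logits.reshape(-1, vocab), targets.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
```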

This model is ideal for applications requiring strong reasoning capabilities in resource-constrained environments, offering an excellent balance between model size and performance.

Hugging Face