DeepSeek-R1-Distill models are fine-tuned from open-source base models using samples generated by DeepSeek-R1. We have slightly changed their configs and tokenizers. Please use our settings to run these models.
123.7K Pulls 2 Tags Updated 11 months ago
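Since the note above asks users to run these models with the distributed settings, here is a minimal sketch of calling a locally served distill through Ollama's REST API. The endpoint and payload shape follow Ollama's `/api/generate` API; the model tag `deepseek-r1:7b` and the sampling values (temperature 0.6, top_p 0.95, taken from the upstream model card's usage recommendations) are assumptions you should check against the tag you actually pulled.

```python
import json
import urllib.request

# Default local Ollama endpoint (assumption: server on the standard port).
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> dict:
    """Build an /api/generate payload with the sampling settings the
    upstream DeepSeek-R1 model card recommends for the distills
    (temperature ~0.6; treat these values as an assumption)."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # return one JSON object instead of a stream
        "options": {
            "temperature": 0.6,
            "top_p": 0.95,
        },
    }

def generate(model: str, prompt: str) -> str:
    """POST the payload to a running Ollama server and return the text."""
    payload = json.dumps(build_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example usage (requires a running Ollama server with the model pulled):
#   ollama pull deepseek-r1:7b
#   print(generate("deepseek-r1:7b", "Why is the sky blue?"))
```

Passing the options in every request keeps the recommended settings in effect even if the local Modelfile defaults differ.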
DeepSeek-R1-Distill-Qwen-1.5B
3,522 Pulls 1 Tag Updated 10 months ago
Additional training on Japanese data by CyberAgent for deepseek-r1.
3,000 Pulls 2 Tags Updated 10 months ago
DeepSeek-R1-Distill-Qwen-7B
2,363 Pulls 1 Tag Updated 10 months ago
DeepSeek-R1-Distill-Qwen-14B
1,649 Pulls 1 Tag Updated 10 months ago
Model from https://huggingface.co/neody/DeepSeek-R1-Distill-Qwen-7B-gguf/tree/main
1,421 Pulls 1 Tag Updated 11 months ago
DeepSeek-R1-Distill-Qwen-Coder-32B-Fusion-9010 is a merged model combining the strengths of two DeepSeek-R1-Distill-Qwen-based models: huihui-ai/DeepSeek-R1-Distill-Qwen-32B-abliterated and huihui-ai/Qwen2.5-Coder-32B-Instruct-abliterated.
1,075 Pulls 6 Tags Updated 9 months ago
DeepSeek-R1 distilled model quantized to one-fourth its original file size with minimal accuracy loss.
660 Pulls 1 Tag Updated 10 months ago
Optimized for 38 languages
571 Pulls 1 Tag Updated 10 months ago
DeepSeek-R1 distilled 14B Qwen model, Q3_K_M quantized version.
319 Pulls 1 Tag Updated 9 months ago