DeepSeek's first generation of reasoning models, with performance comparable to OpenAI-o1, including six dense models distilled from DeepSeek-R1 based on Llama and Qwen.
18.7M Pulls 29 Tags Updated 13 days ago
An open-source Mixture-of-Experts code language model that achieves performance comparable to GPT-4 Turbo in code-specific tasks.
665.6K Pulls 64 Tags Updated 5 months ago
A strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token.
613.1K Pulls 5 Tags Updated 5 weeks ago
DeepSeek Coder is a capable coding model trained on two trillion code and natural language tokens.
562.4K Pulls 102 Tags Updated 14 months ago
An advanced language model crafted with 2 trillion bilingual tokens.
127K Pulls 64 Tags Updated 14 months ago
A strong, economical, and efficient Mixture-of-Experts language model.
119.1K Pulls 34 Tags Updated 8 months ago
An upgraded version of DeepSeek-V2 that integrates the general and coding abilities of both DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct.
22.5K Pulls 7 Tags Updated 5 months ago
A fully open-source family of reasoning models built using a dataset derived by distilling DeepSeek-R1.
494.5K Pulls 9 Tags Updated 7 days ago
A fine-tuned version of DeepSeek-R1-Distill-Qwen-1.5B that surpasses the performance of OpenAI's o1-preview on popular math evaluations with just 1.5B parameters.
53.7K Pulls 5 Tags Updated 8 days ago
5,350 Pulls 1 Tag Updated 3 weeks ago
DeepSeek's first-generation reasoning models, with performance comparable to OpenAI-o1.
357.1K Pulls 46 Tags Updated 2 weeks ago
A modified model that adds support for autonomous coding agents like Cline.
191.3K Pulls 6 Tags Updated 3 weeks ago
A merged upload of Unsloth's DeepSeek-R1. This is the full 671B model. MoE bits: 1.58-bit; type: UD-IQ1_S; disk size: 131GB; accuracy: fair; details: MoE all 1.56-bit, with down_proj in the MoE a mixture of 2.06/1.56-bit.
147.2K Pulls 2 Tags Updated 2 weeks ago
A merged upload of Unsloth's DeepSeek-R1 1.58-bit. This is the full 671B model, dynamically quantized to 1.58 bits.
87.6K Pulls 1 Tag Updated 3 weeks ago
A merged GGUF of Unsloth's DeepSeek-R1 671B 2.51-bit dynamic quant.
52.5K Pulls 1 Tag Updated 2 weeks ago
44.9K Pulls 1 Tag Updated 2 weeks ago