DeepSeek-V3.2 is a model that combines high computational efficiency with strong reasoning and agent performance.
2.2M Pulls 1 Tag Updated 5 months ago
DeepSeek-V3.1-Terminus is a hybrid model that supports both thinking mode and non-thinking mode.
686.9K Pulls 8 Tags Updated 8 months ago
A strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token.
3.8M Pulls 5 Tags Updated 1 year ago
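"671B total, 37B activated per token" means a router sends each token to only a few experts, so most parameters sit idle for any given token. The sketch below is a generic softmax top-k gate with toy sizes. It is an illustrative assumption, not DeepSeek-V3's actual router (which differs in its gating function and adds a shared expert). The function name `top_k_moe` and the toy experts are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]

def top_k_moe(x: Sequence[float],
              gate_weights: Sequence[Sequence[float]],
              experts: Sequence[Callable[[Sequence[float]], Vector]],
              k: int) -> Vector:
    """Hypothetical generic top-k MoE layer: only k experts run per token."""
    # Router logits: one score per expert (dot product of the token with that expert's gate vector)
    logits = [sum(xi * wi for xi, wi in zip(x, w)) for w in gate_weights]
    # Keep the k highest-scoring experts; the others are never evaluated
    chosen = sorted(range(len(experts)), key=lambda e: logits[e], reverse=True)[:k]
    # Numerically stable softmax over the chosen experts' logits gives the mixing weights
    m = max(logits[e] for e in chosen)
    exps = {e: math.exp(logits[e] - m) for e in chosen}
    total = sum(exps.values())
    out = [0.0] * len(x)
    for e in chosen:
        y = experts[e](x)
        weight = exps[e] / total
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out
```

With, say, 4 toy experts and `k=2`, only half of the expert parameters are touched per token; the real model applies the same idea at a much larger scale.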
DeepSeek-V4-Flash is a preview of the DeepSeek-V4 series, a Mixture-of-Experts model with 284B total parameters and 13B activated, built for efficient reasoning across a 1M-token context window.
87.7K Pulls 1 Tag Updated 1 month ago
38 Pulls 1 Tag Updated 4 weeks ago
3,257 Pulls 1 Tag Updated 5 months ago
18 Pulls 1 Tag Updated 4 weeks ago
Senior Go & SpecKit engineering agent powered by DeepSeek-v3.1 671B, optimized for idiomatic development and deterministic BDD testing.
67 Pulls 1 Tag Updated 1 month ago
This model is a distilled version of Qwen/Qwen3-30B-A3B-Instruct designed to inherit the reasoning and behavioral characteristics of its much larger teacher model, deepseek-ai/DeepSeek-V3.1.
1,954 Pulls 2 Tags Updated 8 months ago
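DeepSeek-R1-0528 still uses the DeepSeek V3 Base model released in December 2024 as its foundation, but more compute was invested in post-training, significantly improving the model's depth of thinking and reasoning ability. This 8B distilled version has off-the-charts coding ability!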
1,043 Pulls 1 Tag Updated 12 months ago
This is not the ablation version. DeepSeek-V3.1 is a hybrid model that supports both thinking mode and non-thinking mode.
217 Pulls 3 Tags Updated 8 months ago
136 Pulls 2 Tags Updated 8 months ago
7,181 Pulls 2 Tags Updated 1 year ago
3,840 Pulls 5 Tags Updated 1 year ago
(Unsloth Dynamic Quants) A strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token.
2,058 Pulls 3 Tags Updated 1 year ago
Merged Unsloth's Dynamic Quantization
1,382 Pulls 1 Tag Updated 1 year ago
DeepSeek-V3-Pruned-Coder-411B is a pruned version of DeepSeek-V3, reduced from 256 experts to 160. The pruned model is intended mainly for code generation.
1,377 Pulls 5 Tags Updated 1 year ago
This model has been developed based on DistilQwen2.5-DS3-0324-Series.
1,205 Pulls 7 Tags Updated 1 year ago
14 Pulls 1 Tag Updated 8 months ago
deepseek-v3-0324 quants. Q2_K is the lowest-precision quantization offered here. Quantization formula: quantized = round((original - zero_point) / scale)
1,121 Pulls 1 Tag Updated 1 year ago
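The formula quantized = round((original - zero_point) / scale) can be illustrated with a simplified per-block, min/max (asymmetric) scheme. This is a sketch under stated assumptions: zero_point is taken as the block minimum and scale as the block range divided by the number of integer levels. It is not llama.cpp's actual Q2_K layout, which packs super-blocks with their own quantized scales and minimums. The helper names are hypothetical.

```python
from typing import List, Sequence, Tuple

def quantize_block(values: Sequence[float], bits: int = 2) -> Tuple[List[int], float, float]:
    """Hypothetical asymmetric min/max quantization of one block (not the real Q2_K format)."""
    levels = (1 << bits) - 1           # 2 bits -> integer codes 0..3
    zero_point = min(values)           # float offset subtracted before scaling
    span = max(values) - zero_point
    scale = span / levels if span > 0 else 1.0  # avoid division by zero for constant blocks
    codes = []
    for v in values:
        q = round((v - zero_point) / scale)     # quantized = round((original - zero_point) / scale)
        codes.append(min(max(q, 0), levels))    # clamp to the representable range
    return codes, scale, zero_point

def dequantize_block(codes: Sequence[int], scale: float, zero_point: float) -> List[float]:
    """Invert the mapping: original ~= quantized * scale + zero_point."""
    return [q * scale + zero_point for q in codes]

values = [-1.0, -0.2, 0.3, 1.0]
codes, scale, zp = quantize_block(values, bits=2)
restored = dequantize_block(codes, scale, zp)
print(codes)  # [0, 1, 2, 3]
```

With only 2 bits, each value is reconstructed to within half a quantization step (scale / 2), which is why Q2_K trades the most accuracy for the smallest file size.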