deepseek-v3.1

DeepSeek-V3.1-Terminus is a hybrid model that supports both thinking mode and non-thinking mode.

tools thinking cloud 671b

700.3K Pulls 8 Tags Updated 6 months ago

deepseek-v3

A strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token.

671b

3.8M Pulls 5 Tags Updated 1 year ago

fhagenciadigital/ds-go-pro

Senior Go & SpecKit engineering agent powered by DeepSeek-v3.1 671B, optimized for idiomatic development and deterministic BDD testing.

cloud

85 Pulls 1 Tag Updated 2 months ago

ukjin/Qwen3-30B-A3B-Thinking-2507-Deepseek-v3.1-Distill

This model is a distilled version of Qwen/Qwen3-30B-A3B-Instruct designed to inherit the reasoning and behavioral characteristics of its much larger teacher model, deepseek-ai/DeepSeek-V3.1.

tools thinking 4b

2,026 Pulls 2 Tags Updated 9 months ago

zerocopia/deepseek-v3.1

tools thinking cloud

19 Pulls 1 Tag Updated 1 month ago

huihui_ai/deepseek-v3.1

This is not the abliterated version. DeepSeek-V3.1 is a hybrid model that supports both thinking mode and non-thinking mode.

tools thinking 671b

234 Pulls 3 Tags Updated 9 months ago

pdevine/deepseek-v3.1

tools thinking cloud

139 Pulls 2 Tags Updated 9 months ago

clore/deepseek-v3.1

14 Pulls 1 Tag Updated 9 months ago

huihui_ai/deepseek-v3-pruned

DeepSeek-V3-Pruned-Coder-411B is a pruned version of DeepSeek-V3, reduced from 256 experts to 160 experts. The pruned model is mainly used for code generation.

411b

1,390 Pulls 5 Tags Updated 1 year ago

huihui_ai/deepseek-v3

A strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token.

7,186 Pulls 2 Tags Updated 1 year ago

huihui_ai/deepseek-v3-abliterated

A strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token.

671b

3,929 Pulls 5 Tags Updated 1 year ago

milkey/deepseek-v3-UD

(Unsloth Dynamic Quants) A strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token.

2,063 Pulls 3 Tags Updated 1 year ago

lordoliver/DeepSeek-V3-0324

DeepSeek V3 from March 2025, merged from Unsloth's HF - 671B params - Q8_0 (713 GB) and Q4_K_M (404 GB)

671b

959 Pulls 4 Tags Updated 1 year ago

org/deepseek-v3-fast

Single-file version with Dynamic Quants. A strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token.

114 Pulls 4 Tags Updated 1 year ago

lucataco/deepseek-v3-64k

A strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token.

23 Pulls 1 Tag Updated 1 year ago