
Unsloth’s DeepSeek-R1 1.58-bit: I just merged the thing and uploaded it here. This is the full 671B model, albeit dynamically quantized to 1.58 bits. You can run it on a single 4090.
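
If you want to try it, here is a minimal, untested sketch of loading the quantized GGUF with llama-cpp-python and offloading only part of the layers to a 24 GB GPU. The file name and the `n_gpu_layers` value are assumptions; adjust them to whatever file is actually in this repo and to your hardware.

```python
# Untested sketch: load the 1.58-bit GGUF with llama-cpp-python and offload
# a handful of layers to a single 24 GB GPU, keeping the rest in system RAM.
# The model file name and n_gpu_layers are assumptions; adjust for your setup.
from llama_cpp import Llama

llm = Llama(
    model_path="DeepSeek-R1-1.58bit.gguf",  # hypothetical name of the merged file
    n_gpu_layers=7,    # offload only what fits on the GPU; 0 = CPU only
    n_ctx=2048,        # keep the context window small to save memory
)

out = llm("Why is the sky blue?", max_tokens=256)
print(out["choices"][0]["text"])
```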

See how it performs here.

Read more about it here.

I take no credit whatsoever; I don’t even know whether the thing works, since my M4 Mac mini isn’t powerful enough to run this model. I just merged the GGUF files and uploaded them here.
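
For what it’s worth, merging split GGUF shards is normally done with llama.cpp’s gguf-split tool. The sketch below is only an assumption about how such a merge is done, with hypothetical shard names.

```python
# Untested sketch: merge split GGUF shards into a single file by calling
# llama.cpp's gguf-split tool. Shard and output file names are hypothetical.
import subprocess

subprocess.run(
    [
        "llama-gguf-split", "--merge",
        "DeepSeek-R1-1.58bit-00001-of-00003.gguf",  # first shard (hypothetical)
        "DeepSeek-R1-1.58bit.gguf",                 # merged output file
    ],
    check=True,
)
```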

LICENSE

This code repository and the model weights are licensed under the MIT License. The DeepSeek-R1 series supports commercial use and allows any modifications and derivative works, including, but not limited to, distillation for training other LLMs. Please note that:

DeepSeek-R1-Distill-Qwen-1.5B, DeepSeek-R1-Distill-Qwen-7B, DeepSeek-R1-Distill-Qwen-14B and DeepSeek-R1-Distill-Qwen-32B are derived from the Qwen-2.5 series, which is originally licensed under the Apache 2.0 License, and are now fine-tuned with 800k samples curated with DeepSeek-R1.
DeepSeek-R1-Distill-Llama-8B is derived from Llama3.1-8B-Base and is originally licensed under the Llama 3.1 license.
DeepSeek-R1-Distill-Llama-70B is derived from Llama3.3-70B-Instruct and is originally licensed under the Llama 3.3 license.