
Useful DQ2 quants for deepseek-r1-0528-qwen3-distill:8b


Notes

I’ve uploaded additional quantizations of deepseek-r1-0528-qwen3:8b, all pulled from Unsloth’s Dynamic 2.0 (DQ2) quants page on Hugging Face, which offer superior accuracy for their size while maintaining efficiency. The default tag is the Q4_K_M quant, but a Q2_K_XL quant is also available.
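
To run a specific quant rather than the default, pull it by tag. Below is a minimal sketch using the official ollama Python client; the quant tag suffix shown is hypothetical, so check this repo’s tags page for the exact name:

```python
import ollama

# Hypothetical tag; confirm the exact quant tag on the model's tags page.
MODEL = "deepseek-r1-0528-qwen3-distill:8b-q2_k_xl"

# Download the chosen quantization (no-op if it is already present locally).
ollama.pull(MODEL)

# Quick smoke test with a simple reasoning prompt.
response = ollama.chat(
    model=MODEL,
    messages=[{"role": "user", "content": "What is 17 * 23? Think step by step."}],
)
print(response["message"]["content"])
```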

Overview

DeepSeek-R1-0528-Qwen3-8B is part of the DeepSeek-R1-0528 release, which distills the chain-of-thought reasoning of the upgraded R1 model into the Qwen3 8B base model. The 0528 version delivers enhanced reasoning and inference capabilities through algorithmic optimization and increased computational resources during post-training, and the full model demonstrates strong performance across benchmark evaluations, with results approaching those of leading models like OpenAI’s o3 and Google’s Gemini 2.5 Pro.

Key Improvements

The DeepSeek-R1-0528 upgrade brings several notable enhancements over previous versions:

  • Deeper Reasoning: Significantly improved depth of reasoning, with average token usage per question rising from 12K to 23K on the AIME test set
  • Higher Accuracy: Substantial performance gains across mathematical, programming, and logical reasoning tasks
  • Reduced Hallucination: Lower rates of fabricated information in responses
  • Enhanced Function Calling: Improved support for structured function calling operations (see the sketch after this list)
  • Better Vibe Coding: Superior experience for code generation and manipulation tasks
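
Ollama’s chat API accepts tool definitions, so the improved function calling can be exercised directly. A minimal sketch with the Python client, using the default tag and a made-up get_weather tool for illustration; whether the model actually emits tool calls depends on its chat template:

```python
import ollama

# Hypothetical tool schema, purely for illustration.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = ollama.chat(
    model="deepseek-r1-0528-qwen3-distill:8b",  # default Q4_K_M tag
    messages=[{"role": "user", "content": "What's the weather in Lisbon?"}],
    tools=tools,
)

# If the model chose to call the tool, the structured call is returned here
# instead of (or alongside) plain text content.
for call in response["message"].get("tool_calls") or []:
    print(call["function"]["name"], call["function"]["arguments"])
```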

Usage Information

For benchmarks requiring sampling, the model uses:

  • Temperature: 0.6
  • Top-p: 0.95
  • 16 responses per query to estimate pass@1
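
These sampling settings can be reproduced per-request through the options field of the API, using Ollama’s standard option keys. A minimal sketch with the Python client (estimating pass@1 would mean repeating this call 16 times independently and averaging correctness):

```python
import ollama

response = ollama.chat(
    model="deepseek-r1-0528-qwen3-distill:8b",
    messages=[{"role": "user", "content": "Prove that the sum of two even numbers is even."}],
    options={
        "temperature": 0.6,  # recommended sampling temperature
        "top_p": 0.95,       # recommended nucleus sampling value
    },
)
print(response["message"]["content"])
```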

This model is ideal for applications requiring advanced reasoning capabilities, complex problem-solving, and code generation. Its improved performance in mathematical reasoning makes it particularly suitable for educational and research applications.

HuggingFace