I’ve uploaded additional quantizations of the qwen3 models, with two distinct variations:
Thinking versions: All of the models are hybrid thinking/non-thinking models except the 4b-2507 models, which come as a separate thinking version and a separate non-thinking version.
Non-Thinking versions: Provide faster responses without step-by-step reasoning. This is the 4b-2507 non-thinking version.
The 1.7b model includes some standard quants as well as some of Unsloth's DynamicQuant2.0 versions, which offer superior accuracy while maintaining efficiency. All of the 4b and 8b models are Unsloth DQ2.
To switch between the Thinking versions’ step-by-step mode and instant-response mode, add /think or /no_think to the system/user prompts.
For 1.7b (default is Q4_K_XL):
(The Q5_K_M and Q6_K models were quantized from fp16 using Ollama. To take advantage of Unsloth's DynamicQuant2.0, use the K_XL quants.)
For 4b (default is the 2507 non-thinking Q4_0 version):
For 4b-2507 (Non-Thinking version):
I’ve also uploaded the thinking version of 4b-2507 in Q4_0.
For 8b (default is the non-thinking 2507 version at Q4_0):
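If you'd rather pull one of these quants from code, here's a minimal sketch using the Ollama Python client (pip install ollama). The model tag in it is a placeholder for illustration only; substitute the actual tag you pick from the lists above.

```python
# Minimal sketch: pulling and chatting with a specific quant via the Ollama Python client.
# NOTE: "your-namespace/qwen3:1.7b-q6_k" is a placeholder, not a real tag from this page;
# replace it with the tag you actually want from the lists above.
import ollama

MODEL = "your-namespace/qwen3:1.7b-q6_k"  # hypothetical tag, for illustration

ollama.pull(MODEL)  # download the chosen quant if it isn't already local

response = ollama.chat(
    model=MODEL,
    messages=[{"role": "user", "content": "Give me a one-sentence summary of Qwen3."}],
)
print(response["message"]["content"])
```

The same tags also work with ollama pull and ollama run on the command line.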
Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. The models feature a unique hybrid approach with two modes:
Thinking Mode: Takes time to reason step by step before delivering the final answer, ideal for complex problems requiring deeper thought.
Non-Thinking Mode: Provides quick, near-instant responses, suitable for simpler questions where speed is more important than depth.
Qwen3 models support 119 languages and dialects, making them truly multilingual. They excel at coding, math, reasoning, and agentic capabilities, with significantly improved performance over previous generations.
Hybrid Thinking Modes: Switch between detailed reasoning and quick responses
Multilingual Support: 119 languages and dialects
Improved Agentic Capabilities: Enhanced tool use and environmental interaction
Context Length: 32K-128K tokens depending on model size (see the context-window sketch after this list)
Open Weights: Available under Apache 2.0 license
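To use more of that context window than Ollama's default, one option is to pass num_ctx in the request options. This is a hedged sketch, again assuming the Ollama Python client and a placeholder model tag:

```python
# Sketch: raising the context window for a single request via the num_ctx option.
# Assumes the Ollama Python client; the model tag is a placeholder. Keep num_ctx
# within what the chosen Qwen3 size actually supports (32K-128K per the list above).
import ollama

response = ollama.chat(
    model="your-namespace/qwen3:8b",  # placeholder tag, for illustration
    messages=[{"role": "user", "content": "Summarize this long document: ..."}],
    options={"num_ctx": 32768},  # request a 32K context window
)
print(response["message"]["content"])
```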
You can dynamically switch between thinking and non-thinking modes by adding /think or /no_think to either of these:
The beginning or end of the system prompt
The beginning or end of the user prompt
(I’m not sure if it’ll work if you put it in the middle of the prompts.)
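Here's a minimal sketch of both placements, assuming the Ollama Python client and a placeholder model tag (use one of the hybrid thinking/non-thinking tags listed above):

```python
# Sketch: toggling Qwen3's thinking mode with the /think and /no_think soft switches.
# Assumes the Ollama Python client; "your-namespace/qwen3:8b" is a placeholder tag.
import ollama

MODEL = "your-namespace/qwen3:8b"  # placeholder tag, for illustration

# /no_think at the end of the system prompt: quick, instant-style answers.
fast = ollama.chat(
    model=MODEL,
    messages=[
        {"role": "system", "content": "You are a concise assistant. /no_think"},
        {"role": "user", "content": "What is 17 * 23?"},
    ],
)

# /think at the end of the user prompt: step-by-step reasoning before the answer.
thorough = ollama.chat(
    model=MODEL,
    messages=[
        {"role": "user", "content": "Prove that the square root of 2 is irrational. /think"},
    ],
)

print(fast["message"]["content"])
print(thorough["message"]["content"])
```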