851 Downloads Updated 4 weeks ago
Updated 4 weeks ago
4 weeks ago
99291f3b3db7 · 2.4GB ·
I’ve uploaded additional quantizations of the qwen3 models, with two distinct variations:
Thinking versions: Retain the original Qwen3’s hybrid step-by-step reasoning.
Non-Thinking versions: Provide faster responses without the step-by-step. This is the 4b-2507 version.
The 1.7b model includes some standard quants as well as some Unsloth’s DynamicQuant2.0 versions, which offer superior accuracy while maintaining efficiency. All of the 4b and 8b models are Unlsoth DQ2.
To switch between the Thinking versions’ step-by-step mode and instant-response mode, add /think
or /no_think
to the system/user prompts.
For 1.7b (default is Q4_K_XL):
(The Q5_K_M and Q6_K models were quantized from the fp16 using ollama. To take advantage of Unsloth’s DynamicQuant2.0, use the K_XL quants.)
For 4b (default is Q3_K_XL):
For 4b-2507 (Non-Thinking version):
For 8b (default is non-thinking 2507 version at Q4_0):
Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. The models feature a unique hybrid approach with two modes:
Thinking Mode: Takes time to reason step by step before delivering the final answer, ideal for complex problems requiring deeper thought.
Non-Thinking Mode: Provides quick, near-instant responses, suitable for simpler questions where speed is more important than depth.
Qwen3 models support 119 languages and dialects, making them truly multilingual. They excel at coding, math, reasoning, and agentic capabilities, with significantly improved performance over previous generations.
Hybrid Thinking Modes: Switch between detailed reasoning and quick responses
Multilingual Support: 119 languages and dialects
Improved Agentic Capabilities: Enhanced tool use and environmental interaction
Context Length: 32K-128K tokens depending on model size
Open Weights: Available under Apache 2.0 license
You can dynamically switch between thinking and non-thinking modes by adding /think
or /no_think
to one of these three:
The beginning or end of the system prompt
The beginning or end of the user prompt
(I’m not sure if it’ll work if you put it in the middle of the prompts.)