Qwen3-32B pairs a thinking mode for complex reasoning with a non-thinking mode for fast, general-purpose dialogue, with seamless switching between the two in a single model.

Qwen3-32B: Advanced Reasoning with Thinking Modes

🚀 Overview

Qwen3-32B is a state-of-the-art 32.8 billion parameter language model that represents the latest generation of Qwen's model suite. Designed for exceptional reasoning capabilities, this model uniquely supports seamless switching between thinking mode (for complex logical reasoning, math, and coding) and non-thinking mode (for efficient, general-purpose dialogue) within a single model, ensuring optimal performance across diverse scenarios.

🎯 Key Features

  • 32.8B parameters with optimized architecture (64 layers, GQA: 64 Q, 8 KV heads)
  • Seamless thinking mode switching - think or don't think within a single model
  • 131K context window (32K native, extended with YaRN scaling)
  • 100+ languages supported with strong multilingual capabilities
  • Agent capabilities with precise tool integration
  • Superior reasoning performance across mathematics, code, and logic
  • OpenAI-compatible API support for easy integration

🧠 Thinking vs Non-Thinking Modes

Thinking Mode (enable_thinking=True)

  • Complex Logical Reasoning: Multi-step problem solving
  • Mathematical Problem Solving: Step-by-step calculations
  • Code Generation: Detailed programming with explanations
  • Advanced Analysis: Deep dive into complex topics
  • Research Tasks: Thorough investigation and documentation

Non-Thinking Mode (enable_thinking=False)

  • Efficient Dialogue: Quick, direct responses
  • Real-time Applications: Faster inference for chat applications
  • Streaming Use Cases: Live conversation and interactive systems
  • Resource-Constrained Environments: Reduced computational overhead
  • Creative Writing: Direct content generation
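
As a sketch of where the enable_thinking flag above actually lives: when the model is run through Hugging Face transformers (the upstream Qwen/Qwen3-32B repository, not this Ollama build), the flag is passed to the chat template; with Ollama, the /think and /no_think soft switches described later have the same effect.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-32B")
messages = [{"role": "user", "content": "Find the dimensions of the garden."}]

# Thinking mode: the chat template leaves room for a <think>...</think> reasoning block.
prompt_think = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)

# Non-thinking mode: the template suppresses the reasoning block for faster replies.
prompt_fast = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)
print(prompt_think != prompt_fast)  # the two prompts differ only in the thinking scaffold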

πŸ† Performance Highlights

  • Mathematics: Exceptional performance on complex mathematical problems
  • Code Generation: State-of-the-art across multiple programming languages
  • Reasoning: Superior logical deduction and analysis capabilities
  • Multilingual: Leading performance across 100+ languages
  • Agent Tasks: Top performance in complex tool-using scenarios

💻 Quick Start

# Thinking mode (default) - for complex reasoning
ollama run richardyoung/qwen3-32b "Solve this step by step: A rectangular garden has a perimeter of 48 meters. If the length is 4 meters more than twice the width, find the dimensions."

# Non-thinking mode - append the /no_think soft switch for efficient dialogue
ollama run richardyoung/qwen3-32b "Write a quick summary of the main benefits of renewable energy /no_think"

πŸ› οΈ Example Use Cases

Complex Reasoning (Thinking Mode)

ollama run richardyoung/qwen3-32b "Walk through the proof of Fermat's Last Theorem for n=3, explaining each mathematical step"

Efficient Generation (Non-Thinking Mode)

ollama run richardyoung/qwen3-32b "Create a simple Python script to sort a list of dictionaries by multiple keys /no_think"

Multilingual Support

ollama run richardyoung/qwen3-32b "Summarize this article in Spanish, then provide the same summary in Mandarin Chinese and Arabic"

Agent Capabilities

ollama run richardyoung/qwen3-32b "Analyze this company's financial data and suggest three strategic recommendations with implementation steps"

Code Analysis

ollama run richardyoung/qwen3-32b "Review this machine learning code for potential improvements and explain the algorithmic complexity"

🔧 Technical Specifications

  • Parameters: 32.8B total (31.2B non-embedding)
  • Architecture: Transformer with RoPE, SwiGLU, RMSNorm, and QK-Norm attention
  • Layers: 64 transformer layers
  • Attention Heads: 64 Q heads, 8 KV heads (GQA)
  • Context Length: 32K native, 131K with YaRN scaling
  • Training: Pretraining + Post-training with advanced fine-tuning

βš™οΈ Advanced Configuration

Thinking Mode Settings

# Suggested thinking-mode sampling, set inside the interactive session
ollama run richardyoung/qwen3-32b
>>> /set parameter temperature 0.6
>>> /set parameter top_p 0.95
>>> /set parameter top_k 20
>>> /set parameter min_p 0
>>> Provide a detailed step-by-step analysis of quantum entanglement

Non-Thinking Mode Settings

# Suggested non-thinking sampling; the /no_think soft switch disables reasoning for this turn
ollama run richardyoung/qwen3-32b
>>> /set parameter temperature 0.7
>>> /set parameter top_p 0.8
>>> /set parameter top_k 20
>>> /set parameter min_p 0
>>> Write a concise summary of blockchain technology /no_think

Extended Context Usage

# Raise the context window inside the interactive session (needs substantially more memory)
ollama run richardyoung/qwen3-32b
>>> /set parameter num_ctx 131072
>>> Analyze this entire codebase and provide architectural recommendations

📊 Language Support

Tier 1 Languages (Expert Level)

  • English: Native-level performance
  • Chinese (Simplified/Traditional): Exceptional understanding
  • Spanish: Advanced conversational and technical proficiency
  • French: Strong academic and business communication
  • German: Technical documentation and analysis

Tier 2 Languages (Advanced)

  • Japanese, Korean: Business and technical contexts
  • Russian, Arabic: Academic and professional usage
  • Portuguese, Italian: Native-level conversational ability
  • Dutch, Swedish: Professional and academic contexts

Specialized Domains

  • Code Comments: Programming documentation in 50+ languages
  • Legal Documents: Multilingual legal text understanding
  • Scientific Papers: Academic literature in major languages
  • Technical Manuals: Equipment and software documentation

💾 System Requirements

Minimum Requirements

  • RAM: 48GB (for efficient inference)
  • GPU: RTX 4090 or A100 40GB
  • Storage: 80GB free space

Recommended Setup

  • RAM: 64GB+
  • GPU: A100 80GB for optimal performance
  • Storage: 200GB NVMe SSD

Extended Context Setup

  • RAM: 96GB+ (for 131K context)
  • GPU: Multiple A100s or equivalent
  • Storage: 300GB+ NVMe SSD
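
A back-of-the-envelope KV-cache estimate explains why the 131K context drives memory up this far; the head dimension of 128 and the fp16 cache are assumptions, since neither appears in the specifications above.

layers, kv_heads, head_dim, bytes_per_value = 64, 8, 128, 2  # fp16 cache assumed
per_token = 2 * layers * kv_heads * head_dim * bytes_per_value  # K and V planes per token
for context in (32_768, 131_072):
    print(f"{context:>7} tokens -> {per_token * context / 2**30:.0f} GiB of KV cache")
# Roughly 8 GiB at 32K and 32 GiB at 131K, on top of the (quantized) model weights.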

🌟 What Makes This Model Special

  1. Dual-Mode Operation: Unique thinking/non-thinking capability in single model
  2. Extended Context: Native 32K with YaRN scaling to 131K tokens
  3. Multilingual Excellence: 100+ languages with cultural context awareness
  4. Agent Integration: Built-in tool calling and external system integration
  5. Reasoning Excellence: Superior performance on complex logical tasks

🔄 Mode Switching

Static Mode Selection

  • Think for complex problems: Enable thinking from the start
  • Quick responses: Disable thinking for efficiency

Dynamic Mode Switching

# Add soft switches to user prompts
"/think" - Enable thinking mode for this turn
"/no_think" - Disable thinking mode for this turn

Best Practices

  • Complex Analysis: Always use thinking mode
  • Creative Writing: Both modes work well; choose based on style preference
  • Real-time Chat: Non-thinking mode for faster responses
  • Mathematical Proofs: Thinking mode essential
  • Translation: Non-thinking mode preferred

🎯 Agent Capabilities

Tool Integration

  • Function Calling: Precise API interactions (see the sketch after this list)
  • Database Queries: SQL and NoSQL integration
  • Web Scraping: Information gathering and processing
  • File Operations: Document analysis and generation
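
As a sketch of the function-calling path, here is a hedged example against the OpenAI-compatible endpoint; the get_weather tool and its schema are illustrative placeholders, not part of this model card.

import json
import openai

client = openai.OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Look up the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="richardyoung/qwen3-32b",
    messages=[{"role": "user", "content": "What's the weather in Lisbon right now?"}],
    tools=tools,
)

# If the model decided to call the tool, its arguments arrive as a JSON string.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, json.loads(call.function.arguments))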

Complex Task Orchestration

  • Multi-step Workflows: Sequential task execution
  • Error Handling: Robust problem recovery
  • Context Maintenance: Long-term project understanding
  • Collaborative Planning: Team coordination features

⚠️ Usage Guidelines

Thinking Mode Recommendations

  • Complex Mathematics: Always enable thinking for step-by-step solutions
  • Code Generation: Use thinking for complex algorithms and architecture
  • Research Tasks: Enable thinking for thorough analysis
  • Legal/Medical: Thinking mode for careful, detailed responses

Non-Thinking Mode Recommendations

  • Real-time Chat: Faster responses for interactive applications
  • Simple Queries: Quick facts and straightforward requests
  • Creative Writing: Direct content generation without reasoning overhead
  • Translation: Efficient language conversion

πŸ—οΈ Integration Examples

OpenAI-Compatible API

# Works with existing OpenAI client libraries
import openai
client = openai.OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
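
Continuing with the client above, a minimal completion request (the prompt and the /no_think switch are only illustrative):

reply = client.chat.completions.create(
    model="richardyoung/qwen3-32b",
    messages=[{"role": "user", "content": "List three benefits of renewable energy /no_think"}],
    temperature=0.7,
)
print(reply.choices[0].message.content)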

Custom Applications

  • Chatbots: Dynamic mode switching based on query complexity (see the routing sketch after this list)
  • Content Creation: Thinking for complex topics, non-thinking for drafts
  • Educational Platforms: Adaptive reasoning based on student needs
  • Research Tools: Extended context for literature analysis
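
One possible shape for that kind of complexity-based routing, as a rough illustrative heuristic (the keyword list and length threshold are arbitrary):

def route(query: str) -> str:
    # Crude complexity heuristic: long prompts or math/code keywords trigger thinking mode.
    hard_markers = ("prove", "derive", "step by step", "debug", "algorithm", "integral")
    needs_thinking = len(query.split()) > 40 or any(m in query.lower() for m in hard_markers)
    return f"{query} /think" if needs_thinking else f"{query} /no_think"

print(route("What is the capital of France?"))
print(route("Prove that the sum of two even numbers is even, step by step."))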

🤝 Support & Community

  • Official Documentation: Comprehensive guides and examples
  • Community Forums: Active support and discussions
  • Model Updates: Continuous improvements and optimizations
  • Integration Tools: SDKs for popular frameworks

πŸ“ License

This model follows the Apache 2.0 license. Free for commercial and personal use.

πŸ™ Acknowledgments

  • Qwen Team for exceptional model development
  • Alibaba Cloud for infrastructure and support
  • Open Source Community for testing and feedback
  • Ollama for seamless deployment and accessibility

Note: The thinking and non-thinking modes trade response depth for speed in different ways. Experiment with both to find the best fit for your specific use cases.

Performance Tip: Use thinking mode for anything involving mathematics, complex reasoning, or detailed analysis. Non-thinking mode excels at creative writing and efficient dialogue.