
Exceptional multilingual capabilities to elevate code engineering


Get started

ollama run minimax-m2:cloud
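
If you prefer to call the model programmatically, the minimal sketch below uses Ollama's local REST API (`/api/chat` on the default port 11434). It assumes a running Ollama daemon with access to the cloud model; the prompt is purely illustrative.

```python
import requests

# Minimal sketch: chat with minimax-m2:cloud through Ollama's local REST API.
# Assumes the Ollama daemon is running on its default port (11434) and can
# reach the cloud model; the prompt below is illustrative.
response = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "minimax-m2:cloud",
        "messages": [
            {"role": "user", "content": "Write a Rust function that reverses a string."}
        ],
        "stream": False,  # return one complete JSON object instead of a token stream
    },
)
response.raise_for_status()
print(response.json()["message"]["content"])
```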

Model Highlights

1. Multilingual Coding Excellence (Beyond Python)

While many models focus primarily on Python, real-world engineering requires cross-language proficiency. M2.1 delivers significant performance gains across Rust, Java, Golang, C++, Kotlin, Objective-C, TypeScript, and JavaScript.

  • Web3 Dominance: Special optimization for Web3 protocols, offering superior performance for blockchain and decentralized projects.
  • Benchmark Leadership: Achieved 49.4% on Multi-SWE-bench, surpassing Claude Sonnet 4.5 and Gemini 3 Pro (see the benchmarks below).
  • Deep Comprehension: Advanced code review capabilities, including sophisticated performance optimization and structural analysis.

2. Optimized for “Vibe” AppDev & Native Mobile

We have bridged the gap between aesthetic design and technical implementation.

  • Web & Scientific Simulation: Enhanced “web aesthetics” for better UI/UX generation and more realistic scientific scenario simulations.
  • Native Mobile Powerhouse: Addressing a common industry weakness, M2.1 significantly boosts native Android and iOS development capabilities.
  • “Not only vibe webdev, but vibe appdev.”

3. Concise, High-Efficiency Responses

Compared to the previous generation, MiniMax-M2.1 provides cleaner outputs and more streamlined Chain-of-Thought (CoT) reasoning. This reduction in “verbosity” results in a noticeably faster “feel” and near-instant response times for developer workflows.

4. Advanced Interleaved Thinking & Instruction Following

M2.1 is the first open-source model series to implement Advanced Interleaved Thinking, strengthening its systematic problem-solving ability.

  • Complex Constraints: The model doesn’t just focus on code correctness; it excels at integrating “composite instruction constraints” (as seen in OctoCodingBench).
  • Office Readiness: These improvements make the model viable for complex administrative and office automation tasks (demonstrated in our Toolathlon showcase).
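
A practical consequence of interleaved thinking is that the model's earlier reasoning should travel with the conversation: MiniMax's guidance for the M2 series is to keep reasoning content in the message history rather than stripping it between turns. The sketch below shows one way to do that over Ollama's `/api/chat`; the prompts are illustrative, and passing the returned message dict straight back into the history is our assumption about a convenient round-trip.

```python
import requests

OLLAMA_CHAT = "http://localhost:11434/api/chat"  # default local endpoint

def chat(messages):
    # One non-streaming round trip against Ollama's /api/chat.
    resp = requests.post(
        OLLAMA_CHAT,
        json={"model": "minimax-m2:cloud", "messages": messages, "stream": False},
    )
    resp.raise_for_status()
    return resp.json()["message"]

# Illustrative multi-turn loop. The assistant's full message (including any
# reasoning content it carries) is appended back into the history, so the
# model can build on its earlier thinking in later turns.
history = [{"role": "user", "content": "Plan a refactor of a legacy Java service, step by step."}]
history.append(chat(history))
history.append({"role": "user", "content": "Now carry out step 1 on the UserService class."})
print(chat(history)["content"])
```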

5. Enhanced Scaffolding & Agent Generalization

M2.1 is designed to be the “brain” behind your favorite tools. It shows exceptional performance across various programming agents and IDE extensions, including Claude Code, Droid (Factory AI), Cline, Kilo Code, and Roo Code.

  • Context Management: Seamless support for framework-specific configuration files such as CLAUDE.md, AGENTS.md, SKILL.md, and .cursorrules, plus Slash Commands.
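
These files are plain-text context that the host tool injects into the model's prompt. As a rough sketch of the general pattern (not any particular tool's implementation, and with an illustrative file list), a scaffold might fold whichever of them exist into the system message:

```python
from pathlib import Path

# Rough sketch of how an agent scaffold might fold project context files into
# the system prompt. The file list and prompt wording are illustrative; real
# tools each have their own loading rules.
CONTEXT_FILES = ["CLAUDE.md", "AGENTS.md", "SKILL.md", ".cursorrules"]

def build_system_prompt(project_root: str) -> str:
    parts = ["You are a coding agent working in this repository."]
    for name in CONTEXT_FILES:
        path = Path(project_root) / name
        if path.is_file():
            parts.append(f"--- {name} ---\n{path.read_text()}")
    return "\n\n".join(parts)

print(build_system_prompt("."))
```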

6. The Most Lightweight SOTA Model (10B Activated)

In just two months, we have delivered a major leap in capability while preserving the efficiency the M2 series is known for.

  • Efficiency Powerhouse: With only 10B activated parameters, M2.1 remains the most cost-effective SOTA-performance model in the open-source community.

7. High-Quality Dialogue & Creative Writing

M2.1 isn’t just a coding specialist—it’s a more capable all-around assistant. Compared to M2, the chat and writing experience has been significantly refined, delivering more nuanced, detailed, and contextually rich answers for non-technical queries.

Benchmarks

MiniMax-M2.1 delivers a significant leap over M2 on core software engineering leaderboards. It is particularly strong in multilingual scenarios, where it outperforms Claude Sonnet 4.5 and closely approaches Claude Opus 4.5.

Core Software Engineering Benchmarks

| Model | SWE-bench Verified | Multi-SWE-bench | SWE-bench Multilingual | Terminal-bench 2.0 |
| --- | --- | --- | --- | --- |
| MiniMax-M2.1 | 74.0 | 49.4 | 72.5 | 47.9 |
| MiniMax-M2 | 69.4 | 36.2 | 56.5 | 30.0 |
| Kimi K2 Thinking | 71.3 | 41.9 | 61.1 | 35.2 |
| DeepSeek V3.2 | 73.1 | 37.4 | 70.2 | 46.4 |
| GLM 4.6 | 68.0 | 30.0 | 53.8 | 24.5 |
| Claude Sonnet 4.5 | 77.2 | 44.3 | 68 ± 0.5 | 50.0 |
| Claude Opus 4.5 | 80.9 | 50.0 | 77.5 ± 1.5 | 57.8 |
| Gemini 3 Pro | 78.0 | 38.0 | 65.0 | 54.2 |
| GPT-5.2 (thinking) | 80.0 | x | 72.0 | 54.0 |

Framework Generalization

| Model | SWE-bench Verified (Claude Code) | SWE-bench Verified (Droid) | SWE-bench Verified (mini-swe-agent) | SWT-bench | SWE-Perf | SWE-Review | OctoCodingBench |
| --- | --- | --- | --- | --- | --- | --- |
| MiniMax-M2.1 | 74.0 | 71.3 | 67.0 | 69.3 | 3.1 | 8.9 | 26.1 |
| MiniMax-M2 | 69.4 | 68.1 | 61.0 | 32.8 | 1.4 | 3.4 | 13.3 |
| Kimi K2 Thinking | 71.3 | 64.0 | 63.4 | 38.2 | 1.0 | 5.3 | 16.8 |
| DeepSeek V3.2 | 73.1 | 67.0 | 60.0 | 62.0 | 0.9 | 6.4 | 26.0 |
| GLM 4.6 | 68.0 | 64.9 | 55.4 | 45.9 | 0.9 | 5.6 | 19.2 |
| Claude Sonnet 4.5 | 77.2 | 72.3 | 70.6 | 69.5 | 3.0 | 10.5 | 22.8 |
| Claude Opus 4.5 | 80.9 | 75.2 | 74.4 | 80.2 | 4.7 | 16.2 | 36.2 |
| Gemini 3 Pro | 78.0 | x | 71.8 | 79.7 | 6.5 | x | 22.9 |
| GPT-5.2 (thinking) | 80.0 | x | 74.2 | 80.7 | 3.6 | x | x |

VIBE Benchmark (Visual & Interactive Benchmark for Execution)

| Model | VIBE (Average) | VIBE-Web | VIBE-Simulation | VIBE-Android | VIBE-iOS | VIBE-Backend |
| --- | --- | --- | --- | --- | --- | --- |
| MiniMax-M2.1 | 88.6 | 91.5 | 87.1 | 89.7 | 88 | 86.7 |
| MiniMax-M2 | 67.5 | 80.4 | 77 | 69.2 | 39.5 | 67.8 |
| GLM 4.6 | 72.9 | 86.7 | 82.4 | 58.2 | 59.1 | 78.3 |
| Claude Sonnet 4.5 | 85.2 | 87.3 | 79.1 | 87.5 | 81.2 | 90.8 |
| Claude Opus 4.5 | 90.7 | 89.1 | 84 | 92.2 | 90 | 98 |
| Gemini 3 Pro | 82.4 | 89.5 | 89.2 | 78.7 | 75.8 | 78.7 |

Long-Horizon Tool Use & Intelligence Metrics

| Model | Toolathlon | BrowseComp | BrowseComp (context management) | AA-Index |
| --- | --- | --- | --- | --- |
| MiniMax-M2.1 | 43.5 | 47.4 | 62 | 64 |
| MiniMax-M2 | 16.7 | 44 | 56.9 | 61 |
| Kimi K2 Thinking | 17.6 | 41.5 | 60.2 | 67 |
| DeepSeek V3.2 | 35.2 | 51.4 | 67.6 | 66 |
| GLM 4.6 | 18.8 | 45.1 | 50.2 | 56 |
| Claude Sonnet 4.5 | 38.9 | 19.6 | 26.1 | 63 |
| Claude Opus 4.5 | 43.5 | 37 | 57.8 | 70 |
| Gemini 3 Pro | 36.4 | 37.8 | 59.2 | 73 |
| GPT-5.2 (thinking) | 41.7 | 65.8 | 70 | 73 |