```
ollama run minimax-m2:cloud
```
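For programmatic access, the same model can be called through Ollama's chat API. Here is a minimal sketch using the official `ollama` Python package, assuming a local Ollama install that is signed in for cloud models:

```python
# Minimal sketch: calling MiniMax-M2.1 through Ollama's chat API.
# Assumes the `ollama` package (`pip install ollama`) and a running
# Ollama instance with access to cloud models.
import ollama

response = ollama.chat(
    model="minimax-m2:cloud",
    messages=[
        {"role": "user", "content": "Write a Rust function that reverses a string, with tests."},
    ],
)

print(response["message"]["content"])
```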
1. Multilingual Coding Excellence (Beyond Python)
While many models focus primarily on Python, real-world engineering requires cross-language proficiency. M2.1 delivers significant performance gains across Rust, Java, Golang, C++, Kotlin, Objective-C, TypeScript, and JavaScript.
2. Optimized for “Vibe” AppDev & Native Mobile
We have bridged the gap between aesthetic design and technical implementation: M2.1 delivers markedly better "vibe coding" results across web, Android, and iOS development (see the VIBE benchmark below).
3. Concise, High-Efficiency Responses
Compared to the previous generation, MiniMax-M2.1 produces cleaner outputs and more streamlined Chain-of-Thought (CoT) reasoning. The reduced verbosity makes developer workflows feel noticeably faster, with near-instant responses.
4. Advanced Interleaved Thinking & Instruction Following
M2.1 is the first open-source model series to implement Advanced Interleaved Thinking, strengthening its capacity for systematic, multi-step problem solving.
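In practice, interleaved thinking means the model alternates reasoning segments with tool calls inside a single task, and that reasoning should be carried forward rather than discarded between turns. Below is a minimal sketch of an agent loop built on that idea; the `<think>...</think>` tag convention and the loop structure are assumptions for illustration, not the model's documented chat template:

```python
# Sketch of a multi-turn loop that preserves interleaved reasoning.
# Assumption: the model returns its reasoning inline (e.g. in
# <think>...</think> blocks), and keeping those blocks in the history
# lets later turns build on earlier reasoning.
import ollama

history = [{"role": "user", "content": "Investigate and fix the failing unit test."}]

for _ in range(3):  # a few illustrative agent turns
    reply = ollama.chat(model="minimax-m2:cloud", messages=history)
    content = reply["message"]["content"]

    # Append the assistant message verbatim, reasoning included,
    # instead of stripping <think> blocks before the next turn.
    history.append({"role": "assistant", "content": content})

    # A real agent would execute the requested tool here and feed
    # its output back as the next turn.
    history.append({"role": "user", "content": "Tool output: 1 test still failing."})
```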
5. Enhanced Scaffolding & Agent Generalization
M2.1 is designed to be the “brain” behind your favorite tools. It shows exceptional performance across various programming agents and IDE extensions, including Claude Code, Droid (Factory AI), Cline, Kilo Code, and Roo Code.
It also slots cleanly into project-level scaffolding such as Skill.md, Claude.md, agent.md, .cursorrules, and Slash Commands (a short illustrative sketch follows).
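For instance, a hypothetical `agent.md` might look like the sketch below (the file contents are illustrative, not an official template):

```markdown
# agent.md: project conventions for coding agents (hypothetical example)

## Conventions
- TypeScript, strict mode; run `npm test` before proposing any diff.
- Never modify files under `vendor/`.

## Slash commands
- /review: summarize the current diff and flag risky changes.
```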
6. The Most Lightweight SOTA Model (10B Activated)
In just two months, we have achieved a massive leap in capability while keeping the model's signature efficiency at just 10B activated parameters.
7. High-Quality Dialogue & Creative Writing
M2.1 isn't just a coding specialist; it's a more capable all-around assistant. Compared to M2, the chat and writing experience has been significantly refined, delivering more nuanced, detailed, and contextually rich answers to non-technical queries.
MiniMax-M2.1 delivers a significant leap over M2 on core software engineering leaderboards. It is especially strong in multilingual scenarios, where it outperforms Claude Sonnet 4.5 and closely approaches Claude Opus 4.5.
Core Software Engineering Benchmarks
| Model | SWE-bench Verified | Multi-SWE-bench | SWE-bench Multilingual | Terminal-bench 2.0 |
|---|---|---|---|---|
| MiniMax-M2.1 | 74.0 | 49.4 | 72.5 | 47.9 |
| MiniMax-M2 | 69.4 | 36.2 | 56.5 | 30.0 |
| Kimi K2 Thinking | 71.3 | 41.9 | 61.1 | 35.2 |
| DeepSeek V3.2 | 73.1 | 37.4 | 70.2 | 46.4 |
| GLM 4.6 | 68.0 | 30.0 | 53.8 | 24.5 |
| Claude Sonnet 4.5 | 77.2 | 44.3 | 68.0 ± 0.5 | 50.0 |
| Claude Opus 4.5 | 80.9 | 50.0 | 77.5 ± 1.5 | 57.8 |
| Gemini 3 Pro | 78.0 | 38.0 | 65.0 | 54.2 |
| GPT-5.2 (thinking) | 80.0 | – | 72.0 | 54.0 |

"–" marks a score that was not reported.
Framework Generalization
| Model | SWE-bench Verified (Claude Code) | SWE-bench Verified (Droid) | SWE-bench Verified (mini-swe-agent) | SWT-bench | SWE-Perf | SWE-Review | OctoCodingbench |
|---|---|---|---|---|---|---|---|
| MiniMax-M2.1 | 74.0 | 71.3 | 67.0 | 69.3 | 3.1 | 8.9 | 26.1 |
| MiniMax-M2 | 69.4 | 68.1 | 61.0 | 32.8 | 1.4 | 3.4 | 13.3 |
| Kimi K2 Thinking | 71.3 | 64.0 | 63.4 | 38.2 | 1.0 | 5.3 | 16.8 |
| DeepSeek V3.2 | 73.1 | 67.0 | 60.0 | 62.0 | 0.9 | 6.4 | 26.0 |
| GLM 4.6 | 68.0 | 64.9 | 55.4 | 45.9 | 0.9 | 5.6 | 19.2 |
| Claude Sonnet 4.5 | 77.2 | 72.3 | 70.6 | 69.5 | 3.0 | 10.5 | 22.8 |
| Claude Opus 4.5 | 80.9 | 75.2 | 74.4 | 80.2 | 4.7 | 16.2 | 36.2 |
| Gemini 3 Pro | 78.0 | – | 71.8 | 79.7 | 6.5 | – | 22.9 |
| GPT-5.2 (thinking) | 80.0 | – | 74.2 | 80.7 | 3.6 | – | – |
VIBE Benchmark (Visual & Interactive Benchmark for Execution)
| Model | VIBE (Average) | VIBE-Web | VIBE-Simulation | VIBE-Android | VIBE-iOS | VIBE-Backend |
|---|---|---|---|---|---|---|
| MiniMax-M2.1 | 88.6 | 91.5 | 87.1 | 89.7 | 88.0 | 86.7 |
| MiniMax-M2 | 67.5 | 80.4 | 77.0 | 69.2 | 39.5 | 67.8 |
| GLM 4.6 | 72.9 | 86.7 | 82.4 | 58.2 | 59.1 | 78.3 |
| Claude Sonnet 4.5 | 85.2 | 87.3 | 79.1 | 87.5 | 81.2 | 90.8 |
| Claude Opus 4.5 | 90.7 | 89.1 | 84.0 | 92.2 | 90.0 | 98.0 |
| Gemini 3 Pro | 82.4 | 89.5 | 89.2 | 78.7 | 75.8 | 78.7 |
Long-Horizon Tool Use & Intelligence Metric
| Model | Toolathlon | BrowseComp | BrowseComp (context management) | AA-Index |
|---|---|---|---|---|
| MiniMax-M2.1 | 43.5 | 47.4 | 62.0 | 64 |
| MiniMax-M2 | 16.7 | 44.0 | 56.9 | 61 |
| Kimi K2 Thinking | 17.6 | 41.5 | 60.2 | 67 |
| DeepSeek V3.2 | 35.2 | 51.4 | 67.6 | 66 |
| GLM 4.6 | 18.8 | 45.1 | 50.2 | 56 |
| Claude Sonnet 4.5 | 38.9 | 19.6 | 26.1 | 63 |
| Claude Opus 4.5 | 43.5 | 37.0 | 57.8 | 70 |
| Gemini 3 Pro | 36.4 | 37.8 | 59.2 | 73 |
| GPT-5.2 (thinking) | 41.7 | 65.8 | 70.0 | 73 |