3,698 downloads · Updated 13 hours ago
ollama run minimax-m2.5:cloud
ollama launch claude --model minimax-m2.5:cloud
ollama launch codex --model minimax-m2.5:cloud
ollama launch opencode --model minimax-m2.5:cloud
ollama launch openclaw --model minimax-m2.5:cloud
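Beyond the CLI, the same model name can be used programmatically through Ollama's API. Below is a minimal sketch using the official ollama Python package (pip install ollama); the prompt is only illustrative, and cloud-hosted models may require signing in to an Ollama account first.

    import ollama

    # Send a single chat turn to the cloud-hosted model; the model name matches the run command above.
    response = ollama.chat(
        model="minimax-m2.5:cloud",
        messages=[{"role": "user", "content": "Write a Go function that reverses a singly linked list."}],
    )
    print(response.message.content)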
MiniMax M2.5 is the fastest-improving model series for coding and agentic workflows, trained with large-scale reinforcement learning across hundreds of thousands of real-world environments.
Coding. Trained with large-scale RL across 10+ languages (Python, Go, C, C++, TypeScript, Rust, Kotlin, Java, JavaScript, PHP, Lua, Dart, Ruby) and hundreds of thousands of real environments, M2.5 develops native “spec behavior”: it plans architecture, structure, and design before writing code. It handles the full development lifecycle, from system design from scratch and environment setup to feature iteration, code review, and testing across Web, Android, iOS, Windows, and macOS.
Leading agent performance. M2.5 achieves 80.2% on SWE-Bench Verified, 51.3% on Multi-SWE-Bench, and 76.3% on BrowseComp, reaching a level comparable to the Claude Opus series on coding tasks. In search and tool-use tasks, the model demonstrates higher decision maturity, solving problems in fewer rounds with better token efficiency and using roughly 20% fewer tool-call rounds than M2.1 (see the tool-calling sketch below).
37% faster task completion. Through improved task decomposition and more efficient chain-of-thought reasoning, M2.5 completes complex agentic tasks significantly faster. On SWE-Bench Verified, end-to-end runtime dropped from 31.3 minutes (M2.1) to 22.8 minutes—on par with Claude Opus 4.6’s 22.9 minutes—while also consuming fewer tokens per task (3.52M vs 3.72M).
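As a rough illustration of the tool-use loop behind those numbers, the sketch below wires one hypothetical tool (get_weather, invented for this example) into the ollama Python client. A real agent harness would repeat the request/execute cycle until the model stops asking for tools.

    import ollama

    def get_weather(city: str) -> str:
        # Hypothetical tool used only for illustration; a real agent would query an actual service.
        return f"Sunny and 21°C in {city}"

    response = ollama.chat(
        model="minimax-m2.5:cloud",
        messages=[{"role": "user", "content": "What is the weather in Berlin right now?"}],
        tools=[get_weather],  # the Python client derives a tool schema from the function signature
    )

    # Execute whichever tool calls the model requested and print their results.
    for call in response.message.tool_calls or []:
        if call.function.name == "get_weather":
            print(get_weather(**call.function.arguments))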
Coding
M2.5 represents a significant step up from M2.1, reaching performance comparable to the Claude Opus series on core software engineering tasks.
MiniMax tested performance on the SWE-Bench Verified evaluation set using different coding agent harnesses.
On Droid: 79.7 (M2.5) > 78.9 (Opus 4.6)
On OpenCode: 76.1 (M2.5) > 75.9 (Opus 4.6)
Office work
M2.5 was trained in collaboration with domain experts in finance, law, and the social sciences to produce genuinely deliverable outputs. In evaluations on advanced office tasks, including Word documents, PowerPoint presentations, and Excel financial modeling, M2.5 achieved a 59.0% average win rate against mainstream models in pairwise comparisons under the GDPval-MM framework.
Additional internal benchmarks include MEWC (based on Microsoft Excel World Championship problems, 2021–2026) and Finance Modeling (expert-constructed financial modeling tasks scored via rubrics). Detailed scores for these benchmarks are available as charts in the original blog post.