Mistral Small 3.1: the best model in its weight class.
Building on Mistral Small 3, this new model comes with improved text performance, multimodal understanding, and an expanded context window of up to 128k tokens. The model outperforms comparable models like Gemma 3 and GPT-4o Mini, while delivering inference speeds of 150 tokens per second.
Mistral Small 3.1 is released under an Apache 2.0 license.
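For a quick local try-out, the sketch below queries the model through the Ollama Python client. This is a minimal illustration, not an official recipe: it assumes the model has been pulled under the tag `mistral-small3.1` and that `chart.png` is an image file on disk; substitute whatever tag and files you actually have.

```python
# Minimal sketch: text and image requests via the Ollama Python client.
# Assumptions: the model is available locally under the tag "mistral-small3.1"
# and "chart.png" is a local image file.
import ollama

# Plain text request.
text_reply = ollama.chat(
    model="mistral-small3.1",
    messages=[{"role": "user", "content": "Summarize the Apache 2.0 license in two sentences."}],
)
print(text_reply["message"]["content"])

# Multimodal request: attach an image to the user message.
vision_reply = ollama.chat(
    model="mistral-small3.1",
    messages=[{
        "role": "user",
        "content": "Describe the main trend in this chart.",
        "images": ["chart.png"],
    }],
)
print(vision_reply["message"]["content"])
```

The same message shape, text plus an `images` list, is what exercises the multimodal capabilities benchmarked below.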
Where available, we report numbers previously published by other model providers; otherwise, we re-evaluate the models using our own evaluation harness.

Pretrained model evaluations

| Model | MMLU (5-shot) | MMLU Pro (5-shot CoT) | TriviaQA | GPQA Main (5-shot CoT) | MMMU |
|---|---|---|---|---|---|
| Small 3.1 24B Base | 81.01% | 56.03% | 80.50% | 37.50% | 59.27% |
| Gemma 3 27B PT | 78.60% | 52.20% | 81.30% | 24.30% | 56.10% |

Instruct model text benchmarks

| Model | MMLU | MMLU Pro (5-shot CoT) | MATH | GPQA Main (5-shot CoT) | GPQA Diamond (5-shot CoT) | MBPP | HumanEval | SimpleQA (TotalAcc) |
|---|---|---|---|---|---|---|---|---|
| Small 3.1 24B Instruct | 80.62% | 66.76% | 69.30% | 44.42% | 45.96% | 74.71% | 88.41% | 10.43% |
| Gemma 3 27B IT | 76.90% | 67.50% | 89.00% | 36.83% | 42.40% | 74.40% | 87.80% | 10.00% |
| GPT-4o Mini | 82.00% | 61.70% | 70.20% | 40.20% | 39.39% | 84.82% | 87.20% | 9.50% |
| Claude 3.5 Haiku | 77.60% | 65.00% | 69.20% | 37.05% | 41.60% | 85.60% | 88.10% | 8.02% |
| Cohere Aya-Vision 32B | 72.14% | 47.16% | 41.98% | 34.38% | 33.84% | 70.43% | 62.20% | 7.65% |

Instruct model multimodal benchmarks

| Model | MMMU | MMMU Pro | MathVista | ChartQA | DocVQA | AI2D | MM MT Bench |
|---|---|---|---|---|---|---|---|
| Small 3.1 24B Instruct | 64.00% | 49.25% | 68.91% | 86.24% | 94.08% | 93.72% | 7.3 |
| Gemma 3 27B IT | 64.90% | 48.38% | 67.60% | 76.00% | 86.60% | 84.50% | 7 |
| GPT-4o Mini | 59.40% | 37.60% | 56.70% | 76.80% | 86.70% | 88.10% | 6.6 |
| Claude 3.5 Haiku | 60.50% | 45.03% | 61.60% | 87.20% | 90.00% | 92.10% | 6.5 |
| Cohere Aya-Vision 32B | 48.20% | 31.50% | 50.10% | 63.04% | 72.40% | 82.57% | 4.1 |

Multilingual performance by language group

| Model | Average | European | East Asian | Middle Eastern |
|---|---|---|---|---|
| Small 3.1 24B Instruct | 71.18% | 75.30% | 69.17% | 69.08% |
| Gemma 3 27B IT | 70.19% | 74.14% | 65.65% | 70.76% |
| GPT-4o Mini | 70.36% | 74.21% | 65.96% | 70.90% |
| Claude 3.5 Haiku | 70.16% | 73.45% | 67.05% | 70.00% |
| Cohere Aya-Vision 32B | 62.15% | 64.70% | 57.61% | 64.12% |

Long-context benchmarks

| Model | LongBench v2 | RULER 32K | RULER 128K |
|---|---|---|---|
| Small 3.1 24B Instruct | 37.18% | 93.96% | 81.20% |
| Gemma 3 27B IT | 34.59% | 91.10% | 66.00% |
| GPT-4o Mini | 29.30% | 90.20% | 65.80% |
| Claude 3.5 Haiku | 35.19% | 92.60% | 91.90% |
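
To actually use the long context reflected in the RULER 128K numbers, the serving runtime's context length usually has to be raised explicitly. Below is a minimal sketch assuming Ollama's `num_ctx` option, the same hypothetical `mistral-small3.1` tag, and a placeholder `long_report.txt` input file; the usable window is ultimately bounded by available memory.

```python
# Minimal sketch: requesting a larger context window through Ollama options.
# Assumptions: model tag "mistral-small3.1"; "long_report.txt" is a placeholder
# for a long local document.
import ollama

with open("long_report.txt", encoding="utf-8") as f:
    document = f.read()  # long input document to summarize

reply = ollama.chat(
    model="mistral-small3.1",
    messages=[{"role": "user", "content": "List the key findings of this report:\n\n" + document}],
    options={"num_ctx": 131072},  # raise the context window toward the 128k-token limit
)
print(reply["message"]["content"])
```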