We introduce two new state-of-the-art models for local intelligence, on-device computing, and at-the-edge use cases. We call them les Ministraux: Ministral 3B and Ministral 8B.
Ministral-8B-Instruct-2410 is an instruction fine-tuned language model that significantly outperforms existing models of similar size, released under the Mistral Research License.
If you are interested in using Ministral 3B or Ministral 8B commercially (both outperform Mistral 7B), reach out to us.
For more details about les Ministraux, please refer to our release blog post.
Instruct chat template
`<s>[INST]user message[/INST]assistant response</s>[INST]new user message[/INST]`
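To make the template concrete, here is a minimal Python sketch that renders a list of chat messages into the string above. The `build_prompt` helper and the message list are purely illustrative; in practice the control tokens (`<s>`, `[INST]`, `[/INST]`, `</s>`) are special tokens, so rely on the model's bundled tokenizer/chat template rather than hand-assembled strings.

```python
# Minimal sketch of the instruct template above. Illustration only: build_prompt
# and the example messages are hypothetical, and real usage should go through
# the model's tokenizer/chat template so control tokens are encoded correctly.
def build_prompt(messages):
    """Render alternating user/assistant messages into the template shown above."""
    prompt = "<s>"
    for msg in messages:
        if msg["role"] == "user":
            prompt += f"[INST]{msg['content']}[/INST]"
        elif msg["role"] == "assistant":
            prompt += f"{msg['content']}</s>"
    return prompt

messages = [
    {"role": "user", "content": "user message"},
    {"role": "assistant", "content": "assistant response"},
    {"role": "user", "content": "new user message"},
]
print(build_prompt(messages))
# -> <s>[INST]user message[/INST]assistant response</s>[INST]new user message[/INST]
```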
Ministral 8B Architecture
| Feature | Value |
|---|---|
| Architecture | Dense Transformer |
| Parameters | 8,019,808,256 |
| Layers | 36 |
| Heads | 32 |
| Dim | 4096 |
| KV Heads (GQA) | 8 |
| Hidden Dim | 12288 |
| Head Dim | 128 |
| Vocab Size | 131,072 |
| Context Length | 128k |
| Attention Pattern | Ragged (128k,32k,32k,32k) |
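The ragged attention pattern interleaves a long-context window with shorter sliding windows across layers. The sketch below is only an illustration of that idea, not Ministral's actual implementation: it assumes the (128k, 32k, 32k, 32k) pattern simply repeats every four layers, and it builds the per-layer causal sliding-window mask such a pattern implies.

```python
import numpy as np

# Hypothetical illustration of a "ragged" attention pattern: each layer attends
# causally within a sliding window whose size cycles through PATTERN.
# Assumption (not stated in the card): the pattern repeats every 4 layers
# across all 36 layers.
PATTERN = [128 * 1024, 32 * 1024, 32 * 1024, 32 * 1024]  # window sizes in tokens
NUM_LAYERS = 36

def causal_window_mask(seq_len: int, window: int) -> np.ndarray:
    """True where query position i may attend to key position j."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

# Window used by each layer under the repeating-pattern assumption.
layer_windows = [PATTERN[layer % len(PATTERN)] for layer in range(NUM_LAYERS)]
print(layer_windows[:8])  # [131072, 32768, 32768, 32768, 131072, ...]

# Tiny example mask (short sequence, small window) just to show the shape.
print(causal_window_mask(seq_len=6, window=3).astype(int))
```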
Base Models
Knowledge & Commonsense
| Model | MMLU | AGIEval | Winogrande | Arc-c | TriviaQA |
|---|---|---|---|---|---|
| Mistral 7B Base | 62.5 | 42.5 | 74.2 | 67.9 | 62.5 |
| Llama 3.1 8B Base | 64.7 | 44.4 | 74.6 | 46.0 | 60.2 |
| Ministral 8B Base | 65.0 | 48.3 | 75.3 | 71.9 | 65.5 |
| Gemma 2 2B Base | 52.4 | 33.8 | 68.7 | 42.6 | 47.8 |
| Llama 3.2 3B Base | 56.2 | 37.4 | 59.6 | 43.1 | 50.7 |
| Ministral 3B Base | 60.9 | 42.1 | 72.7 | 64.2 | 56.7 |
Code & Math
| Model | HumanEval pass@1 | GSM8K maj@8 |
|---|---|---|
| Mistral 7B Base | 26.8 | 32.0 |
| Llama 3.1 8B Base | 37.8 | 42.2 |
| Ministral 8B Base | 34.8 | 64.5 |
| Gemma 2 2B Base | 20.1 | 35.5 |
| Llama 3.2 3B Base | 14.6 | 33.5 |
| Ministral 3B Base | 34.2 | 50.9 |
Multilingual
| Model | French MMLU | German MMLU | Spanish MMLU |
|---|---|---|---|
| Mistral 7B Base | 50.6 | 49.6 | 51.4 |
| Llama 3.1 8B Base | 50.8 | 52.8 | 54.6 |
| Ministral 8B Base | 57.5 | 57.4 | 59.6 |
| Gemma 2 2B Base | 41.0 | 40.1 | 41.7 |
| Llama 3.2 3B Base | 42.3 | 42.2 | 43.1 |
| Ministral 3B Base | 49.1 | 48.3 | 49.5 |
Instruct Models
Chat/Arena (gpt-4o judge)
| Model | MTBench | Arena Hard | WildBench |
|---|---|---|---|
| Mistral 7B Instruct v0.3 | 6.7 | 44.3 | 33.1 |
| Llama 3.1 8B Instruct | 7.5 | 62.4 | 37.0 |
| Gemma 2 9B Instruct | 7.6 | 68.7 | 43.8 |
| Ministral 8B Instruct | 8.3 | 70.9 | 41.3 |
| Gemma 2 2B Instruct | 7.5 | 51.7 | 32.5 |
| Llama 3.2 3B Instruct | 7.2 | 46.0 | 27.2 |
| Ministral 3B Instruct | 8.1 | 64.3 | 36.3 |
Code & Math
| Model | MBPP pass@1 | HumanEval pass@1 | MATH maj@1 |
|---|---|---|---|
| Mistral 7B Instruct v0.3 | 50.2 | 38.4 | 13.2 |
| Gemma 2 9B Instruct | 68.5 | 67.7 | 47.4 |
| Llama 3.1 8B Instruct | 69.7 | 67.1 | 49.3 |
| Ministral 8B Instruct | 70.0 | 76.8 | 54.5 |
| Gemma 2 2B Instruct | 54.5 | 42.7 | 22.8 |
| Llama 3.2 3B Instruct | 64.6 | 61.0 | 38.4 |
| Ministral 3B Instruct | 67.7 | 77.4 | 51.7 |
Function calling
| Model | Internal bench |
|---|---|
| Mistral 7B Instruct v0.3 | 6.9 |
| Llama 3.1 8B Instruct | N/A |
| Gemma 2 9B Instruct | N/A |
| Ministral 8B Instruct | 31.6 |
| Gemma 2 2B Instruct | N/A |
| Llama 3.2 3B Instruct | N/A |
| Ministral 3B Instruct | 28.4 |
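For tool use in practice, one common route is Hugging Face transformers' chat-template API, sketched below. The repository id, the `get_weather` schema, and the assumption that the bundled chat template accepts a `tools` argument are illustrative, not taken from this card or the internal benchmark.

```python
# Hedged sketch: tool-use prompting via transformers' chat-template API.
# Assumptions not stated in the card: the repo id, that the bundled chat
# template accepts `tools=`, and the example get_weather schema.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("mistralai/Ministral-8B-Instruct-2410")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Paris?"}]

# Renders the tool schema and conversation into the model's prompt format.
prompt = tok.apply_chat_template(
    messages, tools=tools, tokenize=False, add_generation_prompt=True
)
print(prompt)
```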