We introduce two new state-of-the-art models for local intelligence, on-device computing, and at-the-edge use cases. We call them les Ministraux: Ministral 3B and Ministral 8B.
Ministral-8B-Instruct-2410 is an instruction fine-tuned language model that significantly outperforms existing models of similar size. It is released under the Mistral Research License.
If you are interested in using Ministral-3B or Ministral-8B commercially (both outperform Mistral 7B), reach out to us.
For more details about les Ministraux please refer to our release blog post.
### Chat Template

The model expects conversations in the following format:

`<s>[INST]user message[/INST]assistant response</s>[INST]new user message[/INST]`
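For illustration, here is a minimal sketch of rendering this template with the transformers tokenizer. It assumes the Hugging Face checkpoint id `mistralai/Ministral-8B-Instruct-2410` and that the bundled chat template matches the format above:

```python
# Minimal sketch: rendering a conversation with the chat template via
# transformers. Assumes the mistralai/Ministral-8B-Instruct-2410 checkpoint
# ships a chat template matching the format above; adjust the id if needed.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Ministral-8B-Instruct-2410")

messages = [
    {"role": "user", "content": "user message"},
    {"role": "assistant", "content": "assistant response"},
    {"role": "user", "content": "new user message"},
]

# Render the conversation to a single prompt string; add_generation_prompt
# leaves the template open for the assistant's next turn.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```

Passing `tokenize=True` instead returns token ids directly, which can be fed to any generation backend.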
### Model Architecture

Feature | Value |
---|---|
Architecture | Dense Transformer |
Parameters | 8,019,808,256 |
Layers | 36 |
Heads | 32 |
Dim | 4096 |
KV Heads (GQA) | 8 |
Hidden Dim | 12288 |
Head Dim | 128 |
Vocab Size | 131,072 |
Context Length | 128k |
Attention Pattern | Ragged (128k,32k,32k,32k) |
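As a sanity check, the parameter count above can be reproduced from the other rows. The sketch below assumes a Llama-style dense decoder (GQA attention, SwiGLU feed-forward, RMSNorm, untied input/output embeddings); these architectural details are assumptions, not a published spec:

```python
# Sketch: reproducing the parameter count from the table above, assuming a
# Llama-style dense decoder (GQA attention, SwiGLU MLP, RMSNorm, untied
# embeddings). The architecture details are assumptions, not a published spec.
dim, layers, heads, kv_heads = 4096, 36, 32, 8
head_dim, hidden_dim, vocab = 128, 12288, 131072

attn = dim * heads * head_dim           # Q projection
attn += 2 * dim * kv_heads * head_dim   # K and V projections (GQA: 8 KV heads)
attn += heads * head_dim * dim          # output projection

mlp = 3 * dim * hidden_dim              # SwiGLU: gate, up, and down projections
norms = 2 * dim                         # two RMSNorm weight vectors per layer

per_layer = attn + mlp + norms
# Add input embeddings, untied output head, and the final norm.
total = layers * per_layer + 2 * vocab * dim + dim

print(total)  # 8019808256, matching the table
```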
### Base Model Benchmarks

#### Knowledge & Commonsense
Model | MMLU | AGIEval | Winogrande | ARC-c | TriviaQA |
---|---|---|---|---|---|
Mistral 7B Base | 62.5 | 42.5 | 74.2 | 67.9 | 62.5 |
Llama 3.1 8B Base | 64.7 | 44.4 | 74.6 | 46.0 | 60.2 |
Ministral 8B Base | 65.0 | 48.3 | 75.3 | 71.9 | 65.5 |
Gemma 2 2B Base | 52.4 | 33.8 | 68.7 | 42.6 | 47.8 |
Llama 3.2 3B Base | 56.2 | 37.4 | 59.6 | 43.1 | 50.7 |
Ministral 3B Base | 60.9 | 42.1 | 72.7 | 64.2 | 56.7 |
#### Code & Math
Model | HumanEval pass@1 | GSM8K maj@8 |
---|---|---|
Mistral 7B Base | 26.8 | 32.0 |
Llama 3.1 8B Base | 37.8 | 42.2 |
Ministral 8B Base | 34.8 | 64.5 |
Gemma 2 2B Base | 20.1 | 35.5 |
Llama 3.2 3B Base | 14.6 | 33.5 |
Ministral 3B Base | 34.2 | 50.9 |
#### Multilingual
Model | French MMLU | German MMLU | Spanish MMLU |
---|---|---|---|
Mistral 7B Base | 50.6 | 49.6 | 51.4 |
Llama 3.1 8B Base | 50.8 | 52.8 | 54.6 |
Ministral 8B Base | 57.5 | 57.4 | 59.6 |
Gemma 2 2B Base | 41.0 | 40.1 | 41.7 |
Llama 3.2 3B Base | 42.3 | 42.2 | 43.1 |
Ministral 3B Base | 49.1 | 48.3 | 49.5 |
### Instruct Model Benchmarks

#### Chat/Arena (gpt-4o judge)
Model | MT-Bench | Arena-Hard | WildBench |
---|---|---|---|
Mistral 7B Instruct v0.3 | 6.7 | 44.3 | 33.1 |
Llama 3.1 8B Instruct | 7.5 | 62.4 | 37.0 |
Gemma 2 9B Instruct | 7.6 | 68.7 | 43.8 |
Ministral 8B Instruct | 8.3 | 70.9 | 41.3 |
Gemma 2 2B Instruct | 7.5 | 51.7 | 32.5 |
Llama 3.2 3B Instruct | 7.2 | 46.0 | 27.2 |
Ministral 3B Instruct | 8.1 | 64.3 | 36.3 |
#### Code & Math
Model | MBPP pass@1 | HumanEval pass@1 | MATH maj@1 |
---|---|---|---|
Mistral 7B Instruct v0.3 | 50.2 | 38.4 | 13.2 |
Gemma 2 9B Instruct | 68.5 | 67.7 | 47.4 |
Llama 3.1 8B Instruct | 69.7 | 67.1 | 49.3 |
Ministral 8B Instruct | 70.0 | 76.8 | 54.5 |
Gemma 2 2B Instruct | 54.5 | 42.7 | 22.8 |
Llama 3.2 3B Instruct | 64.6 | 61.0 | 38.4 |
Ministral 3B Instruct | 67.7 | 77.4 | 51.7 |
#### Function Calling
Model | Internal bench |
---|---|
Mistral 7B Instruct v0.3 | 6.9 |
Llama 3.1 8B Instruct | N/A |
Gemma 2 9B Instruct | N/A |
Ministral 8B Instruct | 31.6 |
Gemma 2 2B Instruct | N/A |
Llama 3.2 3B Instruct | N/A |
Ministral 3B Instruct | 28.4 |
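As an illustration of exercising function calling, here is a hedged sketch using the transformers chat-template API. The `tools` argument assumes a recent transformers release that accepts tool definitions, and the `get_weather` schema is a hypothetical example, not part of the model card:

```python
# Sketch of a function-calling prompt, assuming a recent transformers version
# whose apply_chat_template accepts a `tools` argument and that this
# checkpoint's template supports tool definitions. get_weather is hypothetical.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Ministral-8B-Instruct-2410")

get_weather = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

messages = [{"role": "user", "content": "What's the weather in Paris?"}]

# Render a prompt that exposes the tool schema to the model; the model is
# then expected to emit a structured tool call for the runtime to execute.
prompt = tokenizer.apply_chat_template(
    messages, tools=[get_weather], tokenize=False, add_generation_prompt=True
)
print(prompt)
```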