Ministral 8B Instruct

We introduce two new state-of-the-art models for local intelligence, on-device computing, and at-the-edge use cases. We call them les Ministraux: Ministral 3B and Ministral 8B.

The Ministral-8B-Instruct-2410 Language Model is an instruct fine-tuned model significantly outperforming existing models of similar size, released under the Mistral Research License.

If you are interested in using Ministral-3B or Ministral-8B commercially, both of which outperform Mistral-7B, reach out to us.

For more details about les Ministraux, please refer to our release blog post.

Basic Instruct Template (V3-Tekken)

<s>[INST]user message[/INST]assistant response</s>[INST]new user message[/INST]
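
For illustration only, here is a minimal sketch of assembling a multi-turn prompt in this layout by hand. Note that <s>, </s>, [INST], and [/INST] are special control tokens in the tokenizer, so production code should use an official chat template or the mistral_common tokenizer rather than string concatenation; the build_prompt helper below is hypothetical.

```python
# Hypothetical helper illustrating the V3-Tekken layout as plain text.
def build_prompt(turns: list[tuple[str, str | None]]) -> str:
    """turns: (user_message, assistant_response) pairs; pass None as the
    response of the final turn to leave the prompt open for generation."""
    prompt = "<s>"
    for user, assistant in turns:
        prompt += f"[INST]{user}[/INST]"
        if assistant is not None:
            prompt += f"{assistant}</s>"
    return prompt

print(build_prompt([("Hello!", "Hi there!"), ("Tell me a joke.", None)]))
# -> <s>[INST]Hello![/INST]Hi there!</s>[INST]Tell me a joke.[/INST]
```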

Ministral 8B Architecture

| Feature | Value |
| --- | --- |
| Architecture | Dense Transformer |
| Parameters | 8,019,808,256 |
| Layers | 36 |
| Heads | 32 |
| Dim | 4096 |
| KV Heads (GQA) | 8 |
| Hidden Dim | 12288 |
| Head Dim | 128 |
| Vocab Size | 131,072 |
| Context Length | 128k |
| Attention Pattern | Ragged (128k, 32k, 32k, 32k) |
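
To make the table easier to use programmatically, here is an illustrative config object transcribing the values above. The field names are ours, not those of any official implementation, and reading "Ragged (128k, 32k, 32k, 32k)" as an attention-window cycle repeated over consecutive layers is an assumption.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Ministral8BConfig:
    """Illustrative transcription of the architecture table; field names
    are not from an official implementation."""
    n_params: int = 8_019_808_256
    n_layers: int = 36
    n_heads: int = 32
    dim: int = 4096
    n_kv_heads: int = 8            # grouped-query attention (GQA)
    hidden_dim: int = 12288        # feed-forward inner dimension
    head_dim: int = 128            # = dim / n_heads
    vocab_size: int = 131_072
    context_length: int = 128_000  # nominal "128k" from the table
    # Assumed reading of "Ragged (128k, 32k, 32k, 32k)": attention window
    # sizes cycling across consecutive layers.
    attention_pattern: tuple[int, ...] = (128_000, 32_000, 32_000, 32_000)

cfg = Ministral8BConfig()
assert cfg.dim // cfg.n_heads == cfg.head_dim  # consistency check on the table
```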

Benchmarks

Base Models

Knowledge & Commonsense

| Model | MMLU | AGIEval | Winogrande | Arc-c | TriviaQA |
| --- | --- | --- | --- | --- | --- |
| Mistral 7B Base | 62.5 | 42.5 | 74.2 | 67.9 | 62.5 |
| Llama 3.1 8B Base | 64.7 | 44.4 | 74.6 | 46.0 | 60.2 |
| Ministral 8B Base | 65.0 | 48.3 | 75.3 | 71.9 | 65.5 |
| Gemma 2 2B Base | 52.4 | 33.8 | 68.7 | 42.6 | 47.8 |
| Llama 3.2 3B Base | 56.2 | 37.4 | 59.6 | 43.1 | 50.7 |
| Ministral 3B Base | 60.9 | 42.1 | 72.7 | 64.2 | 56.7 |

Code & Math

| Model | HumanEval pass@1 | GSM8K maj@8 |
| --- | --- | --- |
| Mistral 7B Base | 26.8 | 32.0 |
| Llama 3.1 8B Base | 37.8 | 42.2 |
| Ministral 8B Base | 34.8 | 64.5 |
| Gemma 2 2B | 20.1 | 35.5 |
| Llama 3.2 3B | 14.6 | 33.5 |
| Ministral 3B | 34.2 | 50.9 |
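
A note on the metric names: HumanEval pass@1 scores a single generated solution per problem against unit tests, while GSM8K maj@8 samples eight answers per problem and scores the majority vote (self-consistency). Below is a generic sketch of majority voting, not the harness behind these numbers.

```python
from collections import Counter

def maj_at_k(sampled_answers: list[str]) -> str:
    """Majority vote over k sampled final answers, as in GSM8K maj@8.
    Generic illustration only, not Mistral's evaluation harness."""
    return Counter(sampled_answers).most_common(1)[0][0]

# Eight hypothetical final answers to one GSM8K problem:
assert maj_at_k(["42", "42", "41", "42", "40", "42", "42", "41"]) == "42"
```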

Multilingual

| Model | French MMLU | German MMLU | Spanish MMLU |
| --- | --- | --- | --- |
| Mistral 7B Base | 50.6 | 49.6 | 51.4 |
| Llama 3.1 8B Base | 50.8 | 52.8 | 54.6 |
| Ministral 8B Base | 57.5 | 57.4 | 59.6 |
| Gemma 2 2B Base | 41.0 | 40.1 | 41.7 |
| Llama 3.2 3B Base | 42.3 | 42.2 | 43.1 |
| Ministral 3B Base | 49.1 | 48.3 | 49.5 |

Instruct Models

Chat/Arena (gpt-4o judge)

| Model | MTBench | Arena Hard | Wild bench |
| --- | --- | --- | --- |
| Mistral 7B Instruct v0.3 | 6.7 | 44.3 | 33.1 |
| Llama 3.1 8B Instruct | 7.5 | 62.4 | 37.0 |
| Gemma 2 9B Instruct | 7.6 | 68.7 | 43.8 |
| Ministral 8B Instruct | 8.3 | 70.9 | 41.3 |
| Gemma 2 2B Instruct | 7.5 | 51.7 | 32.5 |
| Llama 3.2 3B Instruct | 7.2 | 46.0 | 27.2 |
| Ministral 3B Instruct | 8.1 | 64.3 | 36.3 |

Code & Math

| Model | MBPP pass@1 | HumanEval pass@1 | Math maj@1 |
| --- | --- | --- | --- |
| Mistral 7B Instruct v0.3 | 50.2 | 38.4 | 13.2 |
| Gemma 2 9B Instruct | 68.5 | 67.7 | 47.4 |
| Llama 3.1 8B Instruct | 69.7 | 67.1 | 49.3 |
| Ministral 8B Instruct | 70.0 | 76.8 | 54.5 |
| Gemma 2 2B Instruct | 54.5 | 42.7 | 22.8 |
| Llama 3.2 3B Instruct | 64.6 | 61.0 | 38.4 |
| Ministral 3B Instruct | 67.7 | 77.4 | 51.7 |

Function calling

| Model | Internal bench |
| --- | --- |
| Mistral 7B Instruct v0.3 | 6.9 |
| Llama 3.1 8B Instruct | N/A |
| Gemma 2 9B Instruct | N/A |
| Ministral 8B Instruct | 31.6 |
| Gemma 2 2B Instruct | N/A |
| Llama 3.2 3B Instruct | N/A |
| Ministral 3B Instruct | 28.4 |
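
Les Ministraux instruct models support native function calling via the V3-Tekken tokenizer. As a sketch using the publicly available mistral_common package (the get_current_weather tool is a made-up example; adapt the schema to your own tools):

```python
# pip install mistral-common
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.protocol.instruct.tool_calls import Function, Tool
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

tokenizer = MistralTokenizer.v3(is_tekken=True)  # V3-Tekken, as used by les Ministraux

request = ChatCompletionRequest(
    tools=[
        Tool(
            function=Function(
                name="get_current_weather",  # made-up example tool
                description="Get the current weather for a location",
                parameters={
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "City and country, e.g. Paris, France",
                        }
                    },
                    "required": ["location"],
                },
            )
        )
    ],
    messages=[UserMessage(content="What's the weather like in Paris today?")],
)

tokenized = tokenizer.encode_chat_completion(request)
print(tokenized.text)      # prompt text including the tool-definition control tokens
tokens = tokenized.tokens  # token ids to feed the model
```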