1.6M Downloads Updated 7 months ago
04c315f44b8c · 15GB
Mistral Small 3 sets a new benchmark in the “small” Large Language Models category (below 70B parameters): with 24B parameters it achieves state-of-the-art capabilities comparable to much larger models.
Mistral Small 3 can be deployed locally and is exceptionally “knowledge-dense”: once quantized, it fits on a single RTX 4090 or a MacBook with 32GB of RAM. Perfect for:
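A back-of-envelope calculation shows why the quantized model fits on such hardware. This sketch is illustrative only (it ignores KV cache, activations, and runtime overhead, and the bit widths are assumed, not official figures):

```python
def weight_size_gb(params_billions: float, bits_per_param: float) -> float:
    """Approximate size of model weights in GB (decimal),
    ignoring KV cache, activations, and runtime overhead."""
    return params_billions * 1e9 * bits_per_param / 8 / 1e9

fp16 = weight_size_gb(24, 16)  # ~48 GB: too large for a single consumer GPU
q8 = weight_size_gb(24, 8)     # ~24 GB: borderline on a 24 GB RTX 4090
q4 = weight_size_gb(24, 4)     # ~12 GB: fits comfortably once quantized
```

At roughly 4 bits per parameter, the 24B weights take about 12 GB, which is why a 24 GB RTX 4090 or a 32GB MacBook can host the model with room left for the KV cache.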
We conducted side-by-side evaluations with an external third-party vendor on a set of over 1,000 proprietary coding and generalist prompts. Evaluators were asked to select their preferred response from anonymized generations produced by Mistral Small 3 and a competing model. We are aware that human-judgement benchmarks sometimes differ starkly from publicly available benchmarks, but we took extra care to verify that the evaluation was fair, and we are confident that the above benchmarks are valid.
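A preference evaluation like the one described above reduces to a win-rate computation over pairwise votes. A minimal sketch, with hypothetical vote labels and tie handling (this is not Mistral's or the vendor's actual pipeline):

```python
from collections import Counter

def win_rate(preferences: list[str], model: str = "mistral_small_3") -> float:
    """Fraction of pairwise comparisons in which evaluators preferred
    `model`; ties count as half a win. Illustrative only."""
    counts = Counter(preferences)
    return (counts[model] + 0.5 * counts["tie"]) / len(preferences)

# Hypothetical evaluator votes from anonymized side-by-side comparisons:
votes = ["mistral_small_3", "other_model", "mistral_small_3", "tie"]
print(win_rate(votes))  # 2 wins + half a tie out of 4 -> 0.625
```

Anonymizing which model produced which response, as described above, keeps this statistic free of brand bias.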
Our instruction-tuned model performs competitively with open-weight models three times its size, and with the proprietary GPT-4o-mini, across code, math, general-knowledge, and instruction-following benchmarks.
Performance accuracy on all benchmarks was obtained through the same internal evaluation pipeline, so numbers may vary slightly from previously reported results (Qwen2.5-32B-Instruct, Llama-3.3-70B-Instruct, Gemma-2-27B-IT). Judge-based evaluations such as WildBench, Arena-Hard, and MT-Bench used gpt-4o-2024-05-13 as the judge.
Customers are evaluating Mistral Small 3 across multiple industries, including: