123 1 month ago

It is finetuned from Mistral-Small-3.1, therefore it has a long context window of up to 128k tokens. As a coding agent, Devstral is text-only and before fine-tuning from Mistral-Small-3.1 the vision encoder was removed.

vision tools

Models

View all →

Readme

Devstral Small 1.1

Devstral is an agentic LLM for software engineering tasks built under a collaboration between Mistral AI and All Hands AI 🙌. Devstral excels at using tools to explore codebases, editing multiple files and power software engineering agents. The model achieves remarkable performance on SWE-bench which positions it as the #1 open source model on this benchmark.

It is finetuned from Mistral-Small-3.1, therefore it has a long context window of up to 128k tokens. As a coding agent, Devstral is text-only and before fine-tuning from Mistral-Small-3.1 the vision encoder was removed.

For enterprises requiring specialized capabilities (increased context, domain-specific knowledge, etc.), we will release commercial models beyond what Mistral AI contributes to the community.

Learn more about Devstral in our blog post.

Updates compared to Devstral Small 1.0: - Improved performance, please refer to the benchmark results. - Devstral Small 1.1 is still great when paired with OpenHands. This new version also generalizes better to other prompts and coding environments. - Supports Mistral’s function calling format.

Key Features:

  • Agentic coding: Devstral is designed to excel at agentic coding tasks, making it a great choice for software engineering agents.
  • lightweight: with its compact size of just 24 billion parameters, Devstral is light enough to run on a single RTX 4090 or a Mac with 32GB RAM, making it an appropriate model for local deployment and on-device use.
  • Apache 2.0 License: Open license allowing usage and modification for both commercial and non-commercial purposes.
  • Context Window: A 128k context window.
  • Tokenizer: Utilizes a Tekken tokenizer with a 131k vocabulary size.

Benchmark Results

SWE-Bench

Devstral Small 1.1 achieves a score of 53.6% on SWE-Bench Verified, outperforming Devstral Small 1.0 by +6,8% and the second best state of the art model by +11.4%.

Model Agentic Scaffold SWE-Bench Verified (%)
Devstral Small 1.1 OpenHands Scaffold 53.6
Devstral Small 1.0 OpenHands Scaffold 46.8
GPT-4.1-mini OpenAI Scaffold 23.6
Claude 3.5 Haiku Anthropic Scaffold 40.6
SWE-smith-LM 32B SWE-agent Scaffold 40.2
Skywork SWE OpenHands Scaffold 38.0
DeepSWE R2E-Gym Scaffold 42.2

When evaluated under the same test scaffold (OpenHands, provided by All Hands AI 🙌), Devstral exceeds far larger models such as Deepseek-V3-0324 and Qwen3 232B-A22B.

SWE Benchmark