Mistral-Small-3.1-24B-Instruct-2503 is a quantized GGUF variant of Mistral Small 3.1 24B Instruct (2503), optimized to run smoothly on GPUs with as little as 16GB of VRAM.

Tags: tools · 24b

4e994e0f85a0 · 13GB · llama architecture · 23.6B parameters · IQ4_NL quantization

License: Apache License, Version 2.0

Default parameters: { "stop": [ "<s>", "[INST]" ], "temperature": 0.15 }

Readme

Mistral-Small-3.1-24B-Instruct-2503 (GGUF)

This repository provides the “Mistral-Small-3.1-24B-Instruct-2503” model converted into the GGUF format (Q4_K_L or Q4_K_M quantized version) for easy and efficient deployment via Ollama.

The original GGUF files were made available by bartowski on Hugging Face.
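
For local testing, a minimal sketch using the official ollama Python client might look like the following; the tag mistral-small-3.1 is a placeholder for whatever name the model is pulled under, not its official identifier.

```python
# Minimal sketch: query the model through the official Ollama Python client.
# Assumes `pip install ollama`, a running Ollama server, and that the model
# has been pulled locally; "mistral-small-3.1" is a placeholder tag, not the
# official model name.
import ollama

response = ollama.chat(
    model="mistral-small-3.1",
    messages=[
        {
            "role": "user",
            "content": "Summarize the benefits of GGUF quantization in two sentences.",
        }
    ],
)
print(response["message"]["content"])
```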

Model Features

  • Base Model: Mistral-Small-3.1-24B-Instruct (variant 2503)
  • Quantization: GGUF Q4_K_L quantization, significantly reducing resource requirements while preserving high-quality inference.
  • Low VRAM Usage: Optimized to run effectively on GPUs with around 16GB of VRAM.
  • Enhanced Reliability: Adjustable sampling temperature for stable, predictable generation (see the sketch after this list).
  • Native Tool Calling Capability: Integrated support for tool calling, enabling the model to call external functions and APIs.
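
As an illustration of the adjustable sampling mentioned above, the defaults from the model's parameter block (temperature 0.15 and the <s> / [INST] stop sequences) could be overridden per request, again assuming the ollama Python client and a placeholder model tag:

```python
# Sketch: override the default sampling options for a single request.
# The "options" mapping mirrors the model's default parameters
# ({ "stop": [ "<s>", "[INST]" ], "temperature": 0.15 }); the values here
# are illustrative, and "mistral-small-3.1" is a placeholder model tag.
import ollama

response = ollama.chat(
    model="mistral-small-3.1",
    messages=[{"role": "user", "content": "List three uses of a 24B instruct model."}],
    options={
        "temperature": 0.15,        # low temperature for stable, predictable output
        "stop": ["<s>", "[INST]"],  # stop sequences from the default parameters
    },
)
print(response["message"]["content"])
```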

Ideal Use Cases

This model is well suited for:

  • Local inference on consumer-grade GPUs.
  • Integration into projects that require advanced instruction- and command-following.
  • Environments where controlled inference quality and predictable responses are essential.
  • Leveraging tool calling for greater interactivity and functionality (a sketch follows this list).
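
The tool calling mentioned above could be exercised roughly as follows; this is a sketch only, in which the get_weather helper, its schema, and the model tag are hypothetical, and a reasonably recent ollama client is assumed:

```python
# Rough sketch of native tool calling via the Ollama chat API.
# The get_weather() helper, its JSON schema, and the model tag
# "mistral-small-3.1" are illustrative placeholders.
import ollama

MODEL = "mistral-small-3.1"  # placeholder tag

def get_weather(city: str) -> str:
    """Stand-in for a real weather API call."""
    return f"Sunny and 21 °C in {city}"

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What is the weather in Paris right now?"}]
first = ollama.chat(model=MODEL, messages=messages, tools=tools)
tool_calls = first["message"].get("tool_calls") or []

if tool_calls:
    # Keep the assistant's tool-call message in the history, run each tool,
    # and append its result before asking the model for a final answer.
    messages.append(first["message"])
    for call in tool_calls:
        if call["function"]["name"] == "get_weather":
            result = get_weather(**call["function"]["arguments"])
            messages.append({"role": "tool", "content": result, "name": "get_weather"})
    final = ollama.chat(model=MODEL, messages=messages)
    print(final["message"]["content"])
else:
    print(first["message"]["content"])
```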

Acknowledgements

Special thanks to Mistral AI for providing open access to high-quality foundation models, which makes derivatives such as this one possible. We are also grateful to bartowski (on Hugging Face) for the initial GGUF quantization and distribution.