871 1 year ago

Alternative quantization levels, no fine-tuning

tools
ollama run freerainboxbox/mistral-small:24b-instruct-2501-q3_K_S

Applications

Claude Code
ollama launch claude --model freerainboxbox/mistral-small:24b-instruct-2501-q3_K_S
OpenClaw
ollama launch openclaw --model freerainboxbox/mistral-small:24b-instruct-2501-q3_K_S
Hermes Agent
ollama launch hermes --model freerainboxbox/mistral-small:24b-instruct-2501-q3_K_S
Codex
ollama launch codex --model freerainboxbox/mistral-small:24b-instruct-2501-q3_K_S
OpenCode
ollama launch opencode --model freerainboxbox/mistral-small:24b-instruct-2501-q3_K_S

Models

11 models

mistral-small:24b-instruct-2501-q3_K_S

10GB · 32K context window · Text · 1 year ago

mistral-small:24b-instruct-2501-q3_K_M

11GB · 32K context window · Text · 1 year ago

mistral-small:24b-instruct-2501-q3_K_L

12GB · 32K context window · Text · 1 year ago

mistral-small:24b-instruct-2501-q4_0

13GB · 32K context window · Text · 1 year ago

mistral-small:24b-instruct-2501-q4_1

15GB · 32K context window · Text · 1 year ago

mistral-small:24b-instruct-2501-q4_K_S

14GB · 32K context window · Text · 1 year ago

mistral-small:24b-instruct-2501-q5_0

16GB · 32K context window · Text · 1 year ago

mistral-small:24b-instruct-2501-q5_1

18GB · 32K context window · Text · 1 year ago

mistral-small:24b-instruct-2501-q5_K_S

16GB · 32K context window · Text · 1 year ago

mistral-small:24b-instruct-2501-q5_K_M

17GB · 32K context window · Text · 1 year ago

mistral-small:24b-instruct-2501-q6_K

19GB · 32K context window · Text · 1 year ago

Readme

These are alternative quantization levels of Mistral Small 3, Mistral’s new 24B model. No fine-tuning has been done; the weights are only quantized.

Benchmarks on M1 Max (64GB), rounded to two decimal places:

Quant     Tok/sec
Q8_0      13.40
Q6_K      12.20
Q5_K_M    13.35
Q5_K_S    13.91
Q5_1      15.16
Q5_0      15.23
Q4_K_M    17.99
Q4_K_S    20.05
Q4_1      20.50
Q4_0      22.09
Q3_K_L    14.35
Q3_K_M    16.18
Q3_K_S    14.96
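
The README does not say how these figures were measured. One common way to get a generation rate from Ollama is to divide `eval_count` (generated tokens) by `eval_duration` (reported in nanoseconds) from the final `/api/generate` response. A minimal sketch, assuming that method; the `sample` values are made up for illustration and are not one of the runs above:

```python
def tokens_per_second(response: dict) -> float:
    """Generation rate from the final Ollama /api/generate response.

    eval_count is the number of generated tokens; eval_duration is the
    time spent generating them, in nanoseconds.
    """
    eval_count = response["eval_count"]
    eval_duration_ns = response["eval_duration"]
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    # Convert nanoseconds to seconds while dividing.
    return eval_count * 1e9 / eval_duration_ns


# Hypothetical response fields, not taken from the benchmarks above.
sample = {"eval_count": 300, "eval_duration": 15_000_000_000}
print(f"{tokens_per_second(sample):.2f} tok/sec")
```

Averaging this rate over several prompts of similar length would likely give steadier numbers than a single run.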

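For easy prompts that tolerate occasional mistakes, use Q4_0, the fastest quant here at about 22 tok/sec. For a balance of quality and speed, use Q4_K_M (about 18 tok/sec). Avoid Q6_K: it was the slowest in these benchmarks at about 12 tok/sec, slower even than Q8_0.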
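
As a rough way to apply these numbers, the sketch below picks the fastest quant whose download size fits a given budget. It combines the sizes from the Models list with the M1 Max speeds above (rounded). Q8_0 and Q4_K_M are left out because their sizes are not listed on this page. The function name `fastest_within` and the budget values are hypothetical, and file size is only a lower bound on real memory use, since the context window adds overhead.

```python
from typing import Optional

# name: (size in GB from the Models list, tok/sec on M1 Max, rounded)
QUANTS = {
    "q3_K_S": (10, 14.96),
    "q3_K_M": (11, 16.18),
    "q3_K_L": (12, 14.35),
    "q4_0": (13, 22.09),
    "q4_1": (15, 20.50),
    "q4_K_S": (14, 20.05),
    "q5_0": (16, 15.23),
    "q5_1": (18, 15.16),
    "q5_K_S": (16, 13.91),
    "q5_K_M": (17, 13.35),
    "q6_K": (19, 12.20),
}


def fastest_within(budget_gb: float) -> Optional[str]:
    """Return the fastest quant whose file size fits the budget, or None."""
    speeds = {
        name: speed
        for name, (size_gb, speed) in QUANTS.items()
        if size_gb <= budget_gb
    }
    if not speeds:
        return None
    return max(speeds, key=speeds.get)


for budget in (8, 12, 16):
    print(budget, "GB ->", fastest_within(budget))
```

This ranks by speed alone; it says nothing about output quality, which generally drops as the quantization gets more aggressive.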