Alternative quantization levels, no fine-tuning

Updated 9 months ago

Models

11 models

mistral-small:24b-instruct-2501-q3_K_S · 10GB · 32K context window · Text
mistral-small:24b-instruct-2501-q3_K_M · 11GB · 32K context window · Text
mistral-small:24b-instruct-2501-q3_K_L · 12GB · 32K context window · Text
mistral-small:24b-instruct-2501-q4_0 · 13GB · 32K context window · Text
mistral-small:24b-instruct-2501-q4_1 · 15GB · 32K context window · Text
mistral-small:24b-instruct-2501-q4_K_S · 14GB · 32K context window · Text
mistral-small:24b-instruct-2501-q5_0 · 16GB · 32K context window · Text
mistral-small:24b-instruct-2501-q5_1 · 18GB · 32K context window · Text
mistral-small:24b-instruct-2501-q5_K_S · 16GB · 32K context window · Text
mistral-small:24b-instruct-2501-q5_K_M · 17GB · 32K context window · Text
mistral-small:24b-instruct-2501-q6_K · 19GB · 32K context window · Text

Readme

These are alternative quantization levels of Mistral's new 24B Mistral Small 3 model. No fine-tuning has been done; these are purely quantized.

Benchmarks on M1 Max (64GB):

Quant    Tok/sec
Q8_0     13.40
Q6_K     12.20
Q5_K_M   13.35
Q5_K_S   13.91
Q5_1     15.16
Q5_0     15.23
Q4_K_M   17.99
Q4_K_S   20.05
Q4_1     20.50
Q4_0     22.09
Q3_K_L   14.35
Q3_K_M   16.18
Q3_K_S   14.96
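Figures like the ones above can be derived from the timing fields that ollama reports for a non-streaming generate call: `eval_count` (output tokens generated) and `eval_duration` (time spent generating, in nanoseconds). A minimal sketch of the calculation, using made-up numbers for illustration:

```python
def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Generation throughput: output tokens divided by generation time.

    eval_count:       number of tokens the model produced
    eval_duration_ns: time spent producing them, in nanoseconds
    """
    return eval_count / (eval_duration_ns / 1e9)

# Hypothetical run: 512 tokens in 23.17 s gives roughly the Q4_0
# figure from the table above (~22.1 tok/sec).
print(round(tokens_per_second(512, 23_170_000_000), 2))
```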

For easy prompts that tolerate occasional mistakes, Q4_0 is the fastest choice. For balanced quality at decent speed, use Q4_K_M. Avoid Q6_K, the slowest quant in this benchmark.