761 Downloads Updated 9 months ago
537af93c11da · 15GB ·
These are alternative quantization levels of Mistral’s new 24B model, Mistral Small 3. No fine-tuning has been done; the weights are purely quantized.
Benchmarks on M1 Max (64GB):
| Quant | Tokens/sec |
|---|---|
| Q8_0 | 13.40 |
| Q6_K | 12.20 |
| Q5_K_M | 13.35 |
| Q5_K_S | 13.91 |
| Q5_1 | 15.16 |
| Q5_0 | 15.23 |
| Q4_K_M | 17.99 |
| Q4_K_S | 20.05 |
| Q4_1 | 20.50 |
| Q4_0 | 22.09 |
| Q3_K_L | 14.35 |
| Q3_K_M | 16.18 |
| Q3_K_S | 14.96 |
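
To reproduce numbers like these on your own hardware, one option is to read the timing fields that Ollama returns from its `/api/generate` endpoint. The sketch below is an assumption about how such a measurement could be done, not a record of how the figures above were collected. The endpoint URL is the default local one, and the model tag in the usage comment is a placeholder.

```python
import json
import urllib.request

# Default local Ollama endpoint; adjust if your server runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(response: dict) -> float:
    """Generation speed from a non-streaming /api/generate response.

    eval_count is the number of generated tokens and eval_duration is
    the time spent generating them, in nanoseconds.
    """
    return response["eval_count"] / response["eval_duration"] * 1e9


def benchmark(model: str, prompt: str) -> float:
    """Run one prompt against a local Ollama server and return tokens/sec."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return tokens_per_second(json.load(reply))


# Usage (placeholder tag, requires a running Ollama server):
#   print(benchmark("<model>:<quant-tag>", "Summarize the rules of chess."))
```

A single run is noisy, so averaging several runs with the same prompt gives a fairer comparison between quants.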
For easy prompts that tolerate occasional mistakes, use Q4_0, the fastest quant in these benchmarks. For balanced quality with decent speed, use Q4_K_M. Avoid Q6_K: it was the slowest of all, slower even than Q8_0.
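
As a quick illustration of the reasoning behind these recommendations, the following sketch ranks the quants by the benchmark figures above (rounded to two decimals) and compares each against Q8_0. The helper names are hypothetical, and the numbers only reflect speed on one M1 Max; they say nothing about output quality.

```python
# Tokens/sec from the M1 Max (64GB) benchmarks above, rounded to two decimals.
TOKENS_PER_SEC = {
    "Q8_0": 13.40, "Q6_K": 12.20, "Q5_K_M": 13.35, "Q5_K_S": 13.91,
    "Q5_1": 15.16, "Q5_0": 15.23, "Q4_K_M": 17.99, "Q4_K_S": 20.05,
    "Q4_1": 20.50, "Q4_0": 22.09, "Q3_K_L": 14.35, "Q3_K_M": 16.18,
    "Q3_K_S": 14.96,
}


def fastest(results: dict) -> str:
    """Name of the quant with the highest tokens/sec."""
    return max(results, key=results.get)


def slowest(results: dict) -> str:
    """Name of the quant with the lowest tokens/sec."""
    return min(results, key=results.get)


def speedup(results: dict, quant: str, baseline: str = "Q8_0") -> float:
    """How many times faster `quant` is than `baseline`."""
    return results[quant] / results[baseline]


# Print every quant from fastest to slowest with its speedup over Q8_0.
for name in sorted(TOKENS_PER_SEC, key=TOKENS_PER_SEC.get, reverse=True):
    print(f"{name:7s} {TOKENS_PER_SEC[name]:6.2f} tok/s  "
          f"{speedup(TOKENS_PER_SEC, name):.2f}x vs Q8_0")
```

On these figures Q4_K_M runs roughly a third faster than Q8_0, while Q6_K is slower than Q8_0 despite being smaller.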