freerainboxbox/mistral-small:24b-instruct-2501-q4_1

761 pulls · updated 9 months ago

Alternative quantization levels, no fine-tuning

Capabilities: tools

537af93c11da · 15GB · llama · 23.6B · Q4_1
Parameters: { "temperature": 0.15 }
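
The shipped parameters set only a low sampling temperature. To override it locally, a minimal Modelfile sketch (using Ollama's `FROM`/`PARAMETER` Modelfile syntax; the value shown is just the model's shipped default, adjust to taste):

```
# Hypothetical Modelfile: derive from this quant and set the temperature.
FROM freerainboxbox/mistral-small:24b-instruct-2501-q4_1
PARAMETER temperature 0.15
```

Build it with `ollama create my-mistral-small -f Modelfile` and run it as `my-mistral-small`.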

Readme

These are alternative quantization levels of Mistral’s new 24B Mistral Small 3. No fine-tuning has been done; these are purely quantized versions of the original weights.
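
To illustrate what a quant level like Q4_1 means, here is a rough Python sketch of block quantization in the Q4_1 style (assumption: llama.cpp stores a per-block scale and minimum over 32-element blocks; this is illustrative, not the actual packed kernel format):

```python
# Q4_1-style block quantization sketch: each block of floats is mapped to
# 4-bit codes q in [0, 15] plus a per-block (scale, minimum), so that
# x ~ scale * q + minimum.
def quantize_q4_1(xs, block_size=32):
    blocks = []
    for i in range(0, len(xs), block_size):
        block = xs[i:i + block_size]
        lo, hi = min(block), max(block)
        scale = (hi - lo) / 15 or 1.0   # 4 bits -> 16 levels; guard all-equal blocks
        qs = [round((x - lo) / scale) for x in block]
        blocks.append((scale, lo, qs))
    return blocks

def dequantize_q4_1(blocks):
    # Reconstruct approximate floats from (scale, minimum, codes) triples.
    return [scale * q + lo for scale, lo, qs in blocks for q in qs]

weights = [0.0, 0.1, -0.3, 0.7, 0.25]
restored = dequantize_q4_1(quantize_q4_1(weights))
```

The per-element error is bounded by half the block scale, which is why lower-bit quants trade quality for the speed and size gains benchmarked below.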

Benchmarks on M1 Max (64GB):

Quant    Tok/sec
Q8_0       13.40
Q6_K       12.20
Q5_K_M     13.35
Q5_K_S     13.91
Q5_1       15.16
Q5_0       15.23
Q4_K_M     17.99
Q4_K_S     20.05
Q4_1       20.50
Q4_0       22.09
Q3_K_L     14.35
Q3_K_M     16.18
Q3_K_S     14.96
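
To make the throughput numbers easier to compare, a small Python snippet (values transcribed from the table above, rounded to two decimals) ranks the quants fastest-first and reports each as a fraction of the fastest:

```python
# Benchmark rows from the table above, as (quant, tokens/sec) pairs.
bench = {
    "Q8_0": 13.40, "Q6_K": 12.20, "Q5_K_M": 13.35, "Q5_K_S": 13.91,
    "Q5_1": 15.16, "Q5_0": 15.23, "Q4_K_M": 17.99, "Q4_K_S": 20.05,
    "Q4_1": 20.50, "Q4_0": 22.09, "Q3_K_L": 14.35, "Q3_K_M": 16.18,
    "Q3_K_S": 14.96,
}

# Rank fastest-first and show speed relative to the fastest quant.
fastest = max(bench.values())
for quant, tps in sorted(bench.items(), key=lambda kv: -kv[1]):
    print(f"{quant:7s} {tps:6.2f} tok/s  ({tps / fastest:.0%} of fastest)")
```

Notably, the Q3 quants are slower than Q4 on this hardware despite being smaller, so there is little reason to pick them here.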

For easy prompts that can tolerate occasional mistakes, Q4_0 is the fastest choice. For balanced quality at decent speed, use Q4_K_M. Avoid Q6_K, which benchmarks slower here than even Q8_0.