713 Downloads Updated 1 year ago
6 models:
fietje-2b-chat:Q3_K_M · 1.4GB · 2K context window · Text · 1 year ago
fietje-2b-chat:Q4_K_M · 1.7GB · 2K context window · Text · 1 year ago
fietje-2b-chat:Q5_K_M · 2.0GB · 2K context window · Text · 1 year ago
fietje-2b-chat:Q6_K · 2.3GB · 2K context window · Text · 1 year ago
fietje-2b-chat:Q8_0 · 3.0GB · 2K context window · Text · 1 year ago
fietje-2b-chat:f16 · 5.6GB · 2K context window · Text · 1 year ago
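The tag after the colon selects which quantization Ollama downloads and runs. A minimal usage sketch, assuming Ollama is installed and the model is published under the repository name shown above (the exact registry path may include a user namespace):

```shell
# Pull and run the Q4_K_M variant; the tag picks the quant level.
ollama pull fietje-2b-chat:Q4_K_M
ollama run fietje-2b-chat:Q4_K_M "Wat is de hoofdstad van Friesland?"
```

Smaller quants (Q3_K_M) trade answer quality for memory; larger ones (Q6_K, Q8_0) are closer to f16 quality at the cost of disk and RAM.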
This repository contains quantized versions of BramVanroy/fietje-2b-chat.
Available quantization types and their expected quality loss relative to the f16 base, measured as added perplexity (higher perplexity = worse). These are llama.cpp's reference measurements on LLaMA-v1-7B, so the file sizes below describe that 7B reference model, not the smaller files of this 2B model listed above:
Q3_K_M : 3.07G, +0.2496 ppl @ LLaMA-v1-7B
Q4_K_M : 3.80G, +0.0532 ppl @ LLaMA-v1-7B
Q5_K_M : 4.45G, +0.0122 ppl @ LLaMA-v1-7B
Q6_K : 5.15G, +0.0008 ppl @ LLaMA-v1-7B
Q8_0 : 6.70G, +0.0004 ppl @ LLaMA-v1-7B
F16 : 13.00G @ 7B
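The effective bits per weight of each variant can be estimated from the file sizes listed at the top of this page. A small sketch, assuming the f16 file (5.6GB at 16 bits/weight) implies roughly 2.8B parameters; the listed sizes are rounded, so these figures are approximate:

```python
# File sizes (GB) as listed for this model's tags.
sizes_gb = {
    "Q3_K_M": 1.4,
    "Q4_K_M": 1.7,
    "Q5_K_M": 2.0,
    "Q6_K": 2.3,
    "Q8_0": 3.0,
    "f16": 5.6,
}

# Parameter count (in billions) implied by the f16 file at 16 bits/weight.
params_b = sizes_gb["f16"] * 8 / 16  # ~2.8B

for name, gb in sizes_gb.items():
    bpw = gb * 8 / params_b  # approximate bits per weight
    print(f"{name}: ~{bpw:.1f} bits/weight")
```

This shows why Q6_K and Q8_0 sit so close to f16 in perplexity: at roughly 6.6 and 8.6 bits per weight they retain most of the precision, while Q3_K_M compresses to about 4 bits per weight.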
Quants were made with release b2777 of llama.cpp.
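For reference, quants like these are typically produced with llama.cpp's conversion and quantization tools. A sketch under stated assumptions: the paths and output filenames are illustrative, the commands are run from a llama.cpp checkout built at the release noted above (where the quantization binary was named `quantize`), and `./fietje-2b-chat/` is a local copy of the BramVanroy/fietje-2b-chat checkpoint:

```shell
# Convert the Hugging Face checkpoint to GGUF at f16, then quantize it.
python convert-hf-to-gguf.py ./fietje-2b-chat --outtype f16 \
    --outfile fietje-2b-chat-f16.gguf

# Produce one of the quantized variants (repeat per quant type).
./quantize fietje-2b-chat-f16.gguf fietje-2b-chat-Q4_K_M.gguf Q4_K_M
```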