zendar79/qwen3:4b-q4_0

767 downloads · updated 7 months ago

I started from the fp16 weights of Qwen/Qwen3-4B-Instruct-2507 and quantised them to Q4_0.

tools
ollama run zendar79/qwen3:4b-q4_0

Details

ad253fce0c56 · 2.4GB · qwen3 · 4.02B · Q4_0
{{ if .Messages }} {{- if or .System .Tools }}<|im_start|>system {{ .System }} {{- if .Tools }} # To
You are Qwen, created by Alibaba Cloud. You are a helpful assistant.
{ "repeat_penalty": 1.05, "temperature": 0.7, "top_k": 20, "top_p": 0.8 }

Readme

These are the steps I followed

Download the model with the Hugging Face CLI

pip install -U huggingface_hub
huggingface-cli download Qwen/Qwen3-4B-Instruct-2507 \
        --local-dir ./Qwen3-4B-Instruct-2507 \
        --exclude "*.git*" "README.md" ".gitattributes"

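Get the official llama.cpp repo

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
pip install -r requirements.txt

cmake -B build
cmake --build build --config Release

Alternatively, skip the build and download a prebuilt release for your platform to get the llama-quantize binary.

Produce an unquantised f16 GGUF

convert_hf_to_gguf.py ships with the llama.cpp repo, so run this from the llama.cpp directory. The paths assume the downloaded model folder is in the current directory; adjust them if you downloaded it elsewhere.

python convert_hf_to_gguf.py ./Qwen3-4B-Instruct-2507 \
        --outfile ./qwen3-4b-f16.gguf \
        --outtype f16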

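Quantise the model

If you built with CMake as above, the llama-quantize binary is in ./build/bin/, so use ./build/bin/llama-quantize instead of ./llama-quantize in the commands below.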

./llama-quantize ./qwen3-4b-f16.gguf ./qwen3-4b-q4_k_m.gguf q4_k_m

./llama-quantize ./qwen3-4b-f16.gguf ./qwen3-4b-q4_0.gguf q4_0
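
If you want several quantisation types from the same f16 file, the two commands above can be wrapped in a small loop. This is only a sketch: the QUANTIZE variable and the quant_out helper are names made up here for illustration, and the binary path is an assumption that depends on how you installed llama.cpp.

```shell
#!/bin/sh
# Path to the llama-quantize binary (assumption: adjust for your install,
# e.g. ./build/bin/llama-quantize after a CMake build).
QUANTIZE="${QUANTIZE:-./llama-quantize}"
SRC="./qwen3-4b-f16.gguf"

# Hypothetical helper: derive the output file name for a quantisation type.
quant_out() {
    printf './qwen3-4b-%s.gguf\n' "$1"
}

for q in q4_k_m q4_0; do
    out="$(quant_out "$q")"
    if [ -x "$QUANTIZE" ] && [ -f "$SRC" ]; then
        # Same call as above: input file, output file, quantisation type.
        "$QUANTIZE" "$SRC" "$out" "$q"
    else
        echo "skipping $q: $QUANTIZE or $SRC not found" >&2
    fi
done
```

The loop skips a type instead of failing when the binary or the f16 file is missing, so it is safe to run before everything is in place.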

Use with ollama

See this page for the template format and how to import the model into Ollama.
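
As a rough sketch of the import step, assuming the Q4_0 file from the quantisation step is in the current directory: the system prompt and parameter values below are the ones listed in the details on this page, the model name my-qwen3:4b-q4_0 is a placeholder, and no TEMPLATE is set because the template shown on this page is truncated. Add a TEMPLATE block yourself if the default does not work for you.

```shell
#!/bin/sh
# Write a minimal Modelfile next to the quantised GGUF.
# SYSTEM and PARAMETER values are copied from this page's details;
# TEMPLATE is intentionally left out (assumption: add your own if needed).
cat > Modelfile <<'EOF'
FROM ./qwen3-4b-q4_0.gguf
SYSTEM """You are Qwen, created by Alibaba Cloud. You are a helpful assistant."""
PARAMETER temperature 0.7
PARAMETER top_k 20
PARAMETER top_p 0.8
PARAMETER repeat_penalty 1.05
EOF

# Placeholder local model name; pick any name you like.
MODEL_NAME="my-qwen3:4b-q4_0"

# Only try the import when ollama is installed and the GGUF file exists.
if command -v ollama >/dev/null 2>&1 && [ -f ./qwen3-4b-q4_0.gguf ]; then
    ollama create "$MODEL_NAME" -f Modelfile
fi
```

After the import, ollama run my-qwen3:4b-q4_0 should start the model the same way as the published one.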