I started from the fp16 weights of Qwen/Qwen3-4B-Instruct-2507 and quantised them to q4_k_m and q4_0 with llama.cpp.

tools
ollama run zendar79/qwen3:4b-q4km

Applications

Claude Code: ollama launch claude --model zendar79/qwen3:4b-q4km
Codex App: ollama launch codex-app --model zendar79/qwen3:4b-q4km
OpenClaw: ollama launch openclaw --model zendar79/qwen3:4b-q4km
Hermes Agent: ollama launch hermes --model zendar79/qwen3:4b-q4km
Codex: ollama launch codex --model zendar79/qwen3:4b-q4km
OpenCode: ollama launch opencode --model zendar79/qwen3:4b-q4km

Readme

These are the steps I followed:

Download with the Hugging Face CLI

pip install -U huggingface_hub
huggingface-cli download Qwen/Qwen3-4B-Instruct-2507 \
        --local-dir ./Qwen3-4B-Instruct-2507 \
        --exclude "*.git*" "README.md" ".gitattributes"
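
Before converting, it can help to confirm that the download is complete. The following is a minimal sketch, not part of the original steps: check_model_dir is a hypothetical helper name, and it only assumes that the checkpoint directory should contain a config.json and at least one .safetensors weight file.

```shell
# check_model_dir DIR
# Hypothetical helper: succeeds only if DIR looks like a usable checkpoint.
check_model_dir() {
    dir=$1
    # config.json describes the architecture; the converter reads it.
    if [ ! -f "$dir/config.json" ]; then
        echo "missing config.json in $dir" >&2
        return 1
    fi
    # At least one weight shard must be present.
    for f in "$dir"/*.safetensors; do
        if [ -f "$f" ]; then
            return 0
        fi
    done
    echo "no .safetensors files in $dir" >&2
    return 1
}
```

A typical call would be check_model_dir ./Qwen3-4B-Instruct-2507 && echo "download looks complete".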

Produce a full-precision GGUF (convert_hf_to_gguf.py ships with the llama.cpp repo, obtained in the next step)

python convert_hf_to_gguf.py ./Qwen3-4B-Instruct-2507 \
        --outfile ./qwen3-4b-f16.gguf \
        --outtype f16
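
A quick sanity check on the converted file is to look at its first four bytes, which in the GGUF format are the ASCII magic "GGUF". This is only a sketch: is_gguf is a hypothetical helper name, and passing the check does not prove that the rest of the file is valid.

```shell
# is_gguf FILE
# Hypothetical helper: succeeds if FILE starts with the 4-byte GGUF magic.
is_gguf() {
    # Read exactly the first four bytes and compare them with the magic.
    [ "$(head -c 4 "$1" 2>/dev/null)" = "GGUF" ]
}
```

For example: is_gguf ./qwen3-4b-f16.gguf && echo "header ok".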

Get the official llama.cpp repo

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
pip install -r requirements.txt

cmake -B build
cmake --build build --config Release

Alternatively, skip this step and download the matching prebuilt llama.cpp release for your platform to get the tools.
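
Where llama-quantize ends up depends on how llama.cpp was obtained: a CMake build like the one above places it under build/bin, while an unpacked release may have it in the current directory. The sketch below looks in both places and then falls back to PATH; find_quantize is a hypothetical helper name, and the candidate list is an assumption you may need to adjust.

```shell
# find_quantize
# Hypothetical helper: prints the first llama-quantize binary it can find.
find_quantize() {
    for candidate in ./build/bin/llama-quantize ./llama-quantize; do
        if [ -x "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    # Fall back to whatever is on PATH; fails if nothing is found.
    command -v llama-quantize
}
```

It can be used as QUANTIZE=$(find_quantize) followed by "$QUANTIZE" in place of a hard-coded path.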

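Do quantisation (llama-quantize is built in the previous step; with the CMake commands above it ends up in build/bin/)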

./llama-quantize ./qwen3-4b-f16.gguf ./qwen3-4b-q4_k_m.gguf q4_k_m

./llama-quantize ./qwen3-4b-f16.gguf ./qwen3-4b-q4_0.gguf q4_0
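
As a rough guide to the expected file sizes: f16 stores 2 bytes per weight, and q4_0 packs each block of 32 weights into 18 bytes (one fp16 scale plus 32 four-bit values), which is 4.5 bits per weight. The sketch below turns that into arithmetic. estimate_bytes is a hypothetical helper, the 4,000,000,000 parameter count is a round-number assumption rather than the exact model size, and real files differ because of metadata and because some tensors are kept at a different precision. q4_k_m mixes several block types, so it has no equally simple formula and is left out.

```shell
# estimate_bytes PARAMS BYTES_PER_BLOCK WEIGHTS_PER_BLOCK
# Hypothetical helper: approximate size in bytes of the weight data.
estimate_bytes() {
    echo $(( $1 * $2 / $3 ))
}

PARAMS=4000000000   # assumption: round "4B" figure, not an exact count

estimate_bytes "$PARAMS" 2 1     # f16: 2 bytes per weight, prints 8000000000
estimate_bytes "$PARAMS" 18 32   # q4_0: 18 bytes per 32 weights, prints 2250000000
```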

Use with Ollama

See this page for the template format and for how to import the model into Ollama.
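
Importing a local GGUF into Ollama goes through a Modelfile whose FROM line points at the file. The sketch below writes only that line and leaves the template out, since the template this model was published with is not reproduced here. write_modelfile is a hypothetical helper, and my-qwen3:4b-q4km in the usage comment is a placeholder model name.

```shell
# write_modelfile GGUF_PATH [OUTPUT_FILE]
# Hypothetical helper: writes a minimal Modelfile that imports GGUF_PATH.
write_modelfile() {
    gguf=$1
    out=${2:-Modelfile}
    cat > "$out" <<EOF
# FROM points Ollama at the local GGUF file to import.
FROM $gguf
# TEMPLATE and PARAMETER lines (chat template, stop tokens) would go here.
EOF
}

# Usage (placeholder model name):
#   write_modelfile ./qwen3-4b-q4_k_m.gguf Modelfile
#   ollama create my-qwen3:4b-q4km -f Modelfile
#   ollama run my-qwen3:4b-q4km
```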