I started from the fp16 weights of Qwen/Qwen3-4B-Instruct-2507 and quantised them to Q4_K_M and Q4_0 with llama.cpp.

tools

ollama run zendar79/qwen3:4b-q4km

curl http://localhost:11434/api/chat \
  -d '{
    "model": "zendar79/qwen3:4b-q4km",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

from ollama import chat

response = chat(
    model='zendar79/qwen3:4b-q4km',
    messages=[{'role': 'user', 'content': 'Hello!'}],
)
print(response.message.content)

import ollama from 'ollama'

const response = await ollama.chat({
  model: 'zendar79/qwen3:4b-q4km',
  messages: [{role: 'user', content: 'Hello!'}],
})
console.log(response.message.content)
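
By default the /api/chat endpoint streams its reply as newline-delimited JSON, so the curl call above prints a series of small objects rather than one message. Here is a minimal sketch of stitching those chunks back together with only the standard library. The helper name join_stream and the sample lines are illustrative and not part of Ollama's API.

```python
import json


def join_stream(lines):
    """Concatenate message.content from newline-delimited /api/chat chunks."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        chunk = json.loads(line)
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            break  # the final chunk carries "done": true
    return "".join(parts)


# Hand-written sample in the shape of a streamed reply (not real model output).
sample = [
    '{"message": {"role": "assistant", "content": "Hel"}, "done": false}',
    '{"message": {"role": "assistant", "content": "lo!"}, "done": false}',
    '{"message": {"role": "assistant", "content": ""}, "done": true}',
]
print(join_stream(sample))
```

If a single JSON object is easier to handle, adding "stream": false to the request body should turn streaming off.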

Applications

Claude Code: ollama launch claude --model zendar79/qwen3:4b-q4km

Codex App: ollama launch codex-app --model zendar79/qwen3:4b-q4km

OpenClaw: ollama launch openclaw --model zendar79/qwen3:4b-q4km

Hermes Agent: ollama launch hermes --model zendar79/qwen3:4b-q4km

Codex: ollama launch codex --model zendar79/qwen3:4b-q4km

OpenCode: ollama launch opencode --model zendar79/qwen3:4b-q4km

Models

Name             Size     Context   Input   Updated
qwen3:4b-q4km    2.5GB    256K      Text    9 months ago
qwen3:4b-q4_0    2.4GB    256K      Text    9 months ago

Readme

These are the steps I followed.

Download the model with the Hugging Face CLI

pip install -U huggingface_hub
huggingface-cli download Qwen/Qwen3-4B-Instruct-2507 \
        --local-dir ./Qwen3-4B-Instruct-2507 \
        --exclude "*.git*" "README.md" ".gitattributes"

Get the official llama.cpp repo

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
pip install -r requirements.txt

cmake -B build
cmake --build build --config Release

Alternatively, skip the build and download the appropriate release for your platform instead.

Produce a full-precision GGUF

Run the conversion script that ships with llama.cpp, adjusting the model path to wherever you downloaded it:

python convert_hf_to_gguf.py ./Qwen3-4B-Instruct-2507 \
        --outfile ./qwen3-4b-f16.gguf \
        --outtype f16

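Quantise the model

If you built with CMake as above, the binary is at build/bin/llama-quantize, so adjust the path accordingly.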

./llama-quantize ./qwen3-4b-f16.gguf ./qwen3-4b-q4_k_m.gguf q4_k_m

./llama-quantize ./qwen3-4b-f16.gguf ./qwen3-4b-q4_0.gguf q4_0
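
Before importing the quantised files anywhere, it can be worth checking that each one is a well-formed GGUF. This is a minimal sketch that reads the fixed-size header, assuming the GGUF v2 or later layout (magic bytes, then a 32-bit version and two 64-bit counts, all little-endian). The function name read_gguf_header is my own and not part of llama.cpp.

```python
import struct


def read_gguf_header(path):
    """Return (version, tensor_count, metadata_kv_count) from a GGUF file."""
    with open(path, "rb") as f:
        header = f.read(24)
    if len(header) < 24 or header[:4] != b"GGUF":
        raise ValueError(f"{path} does not look like a GGUF file")
    # Little-endian: uint32 version, uint64 tensor count, uint64 metadata KV count.
    version, tensor_count, kv_count = struct.unpack("<IQQ", header[4:24])
    return version, tensor_count, kv_count


# Example call (path taken from the quantisation step above):
# print(read_gguf_header("./qwen3-4b-q4_k_m.gguf"))
```

The f16 file and both quantised files should report the same tensor count, since quantisation changes how tensors are stored rather than how many there are.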

Use with Ollama

See this page for the template format and how to import the model into Ollama.
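
Importing a local GGUF into Ollama goes through a Modelfile. This is a rough sketch that generates a minimal one. It deliberately leaves out the TEMPLATE block, which should come from the page referenced above. The helper name build_modelfile and the temperature value are illustrative assumptions, not settings I used.

```python
def build_modelfile(gguf_path, parameters=None):
    """Return Modelfile text that points Ollama at a local GGUF file."""
    lines = [f"FROM {gguf_path}"]
    for name, value in (parameters or {}).items():
        lines.append(f"PARAMETER {name} {value}")
    return "\n".join(lines) + "\n"


# The temperature here is an arbitrary example, not a recommended setting.
text = build_modelfile("./qwen3-4b-q4_k_m.gguf", {"temperature": 0.7})
print(text)
```

Save the output as Modelfile next to the GGUF, then run ollama create my-qwen3 -f Modelfile (my-qwen3 is a placeholder name) followed by ollama run my-qwen3.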