qwen3-4b-reasoning is a 4B-parameter Qwen3-based reasoning “backfill” fine-tune (joeyzero/Qwen3-4B-Reasoning-Backfill-v0.1), converted to GGUF for llama.cpp/Ollama with a ~40K-token context window and published as Q4_K_M (recommended) and IQ4_XS (smaller).

Qwen3-4B-Reasoning: GGUF quantizations for Ollama

Overview

Qwen3-4B-Reasoning is a GGUF conversion of joeyzero/Qwen3-4B-Reasoning-Backfill-v0.1 for llama.cpp / Ollama. Upstream: https://huggingface.co/joeyzero/Qwen3-4B-Reasoning-Backfill-v0.1

Notes

The alias matches existing local artifacts; adjust it if your local naming differs.

Key Details

  • Prompt format: ChatML
  • Architecture: qwen3
  • Size label: 4.0B
  • Context length: 40960
  • License (from GGUF metadata): apache-2.0
  • Base model: Qwen — Qwen3 4B — https://huggingface.co/Qwen/Qwen3-4B
  • Suggested sampling (from GGUF metadata): top_k=20, top_p=0.95, temp=0.6 (see the Modelfile sketch below)
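
To bake these settings into a local Ollama model, a minimal Modelfile sketch (the file path and template wording are placeholders; only the parameter values and ChatML format come from the metadata above):

# Assumes the Q4_K_M GGUF sits in the current directory.
FROM ./Qwen3-4B-Reasoning-Q4_K_M.gguf

# ChatML prompt format, per the Key Details above.
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""

# Sampling defaults from the GGUF metadata.
PARAMETER temperature 0.6
PARAMETER top_k 20
PARAMETER top_p 0.95
PARAMETER num_ctx 40960

Build and run it with ollama create qwen3-4b-reasoning-local -f Modelfile, then ollama run qwen3-4b-reasoning-local (the local name is hypothetical).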

Status

  • Local GGUFs: present

Available Versions

Tag     GGUF                            Size      RAM (est.)  Notes
IQ4_XS  Qwen3-4B-Reasoning-IQ4_XS.gguf  2.13 GiB  4 GiB
Q4_K_M  Qwen3-4B-Reasoning-Q4_K_M.gguf  2.33 GiB  4 GiB       Recommended
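
The GGUF files above also run directly under llama.cpp. A sketch, assuming llama-cli from a recent llama.cpp build is on your PATH and the Q4_K_M file has been downloaded locally:

# Single prompt with the suggested sampling settings and full context window.
llama-cli -m Qwen3-4B-Reasoning-Q4_K_M.gguf \
  -p "Hello!" \
  -c 40960 --temp 0.6 --top-k 20 --top-p 0.95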

Quick Start

ollama run richardyoung/qwen3-4b-reasoning:q4_k_m "Hello!"
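
Once pulled, the model can also be called through Ollama's HTTP API. A sketch, assuming the default server on localhost:11434:

# Non-streaming generation request against the local Ollama server.
curl http://localhost:11434/api/generate -d '{
  "model": "richardyoung/qwen3-4b-reasoning:q4_k_m",
  "prompt": "Hello!",
  "stream": false
}'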

Available Commands

  • ollama run richardyoung/qwen3-4b-reasoning:iq4_xs
  • ollama run richardyoung/qwen3-4b-reasoning:q4_k_m

License

See the upstream repo for license/terms: https://huggingface.co/joeyzero/Qwen3-4B-Reasoning-Backfill-v0.1

Acknowledgments

  • Quantized with llama.cpp (llama-quantize).
  • GGUF conversion via llama.cpp (convert_hf_to_gguf.py); a representative command sequence is sketched below.
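
For reference, that pipeline generally looks like the following (the upstream path and output file names are illustrative; run from a llama.cpp checkout with its Python requirements installed):

# Convert the upstream Hugging Face repo to an f16 GGUF.
python convert_hf_to_gguf.py /path/to/Qwen3-4B-Reasoning-Backfill-v0.1 \
  --outfile Qwen3-4B-Reasoning-f16.gguf --outtype f16

# Quantize the f16 GGUF to the two published variants.
llama-quantize Qwen3-4B-Reasoning-f16.gguf Qwen3-4B-Reasoning-Q4_K_M.gguf Q4_K_M
llama-quantize Qwen3-4B-Reasoning-f16.gguf Qwen3-4B-Reasoning-IQ4_XS.gguf IQ4_XS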