qwen3-4b-reasoning is a 4B-parameter Qwen3-based reasoning “backfill” fine-tune (joeyzero/Qwen3-4B-Reasoning-Backfill-v0.1), converted to GGUF for llama.cpp/Ollama with a ~40K-token context window and published as Q4_K_M (recommended) and IQ4_XS (smaller).

Qwen3-4B-Reasoning: GGUF quantizations for Ollama

Overview

Qwen3-4B-Reasoning is a GGUF conversion of joeyzero/Qwen3-4B-Reasoning-Backfill-v0.1 for llama.cpp / Ollama. Upstream: https://huggingface.co/joeyzero/Qwen3-4B-Reasoning-Backfill-v0.1

Notes

The alias matches existing local artifacts; adjust it if your local naming differs.

Key Details

  • Prompt format: ChatML
  • Architecture: qwen3
  • Size label: 4.0B
  • Context length: 40960
  • License (from GGUF metadata): apache-2.0
  • Base model: Qwen — Qwen3 4B — https://huggingface.co/Qwen/Qwen3-4B
  • Suggested sampling (from GGUF metadata): top_k=20, top_p=0.95, temp=0.6 (see the Modelfile sketch below)
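
To bake these settings into a local Ollama model, a minimal Modelfile sketch (the file path and template wording are placeholders; only the parameter values and ChatML format come from the metadata above):

# Assumes the Q4_K_M GGUF sits in the current directory.
FROM ./Qwen3-4B-Reasoning-Q4_K_M.gguf

# ChatML prompt format, per the Key Details above.
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""

# Sampling defaults from the GGUF metadata.
PARAMETER temperature 0.6
PARAMETER top_k 20
PARAMETER top_p 0.95
PARAMETER num_ctx 40960

Build and run it with ollama create qwen3-4b-reasoning-local -f Modelfile, then ollama run qwen3-4b-reasoning-local (the local name is hypothetical).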

Status

  • Local GGUFs: present

Available Versions

Tag     GGUF                            Size      RAM (est.)  Notes
IQ4_XS  Qwen3-4B-Reasoning-IQ4_XS.gguf  2.13 GiB  4 GiB
Q4_K_M  Qwen3-4B-Reasoning-Q4_K_M.gguf  2.33 GiB  4 GiB       Recommended
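
The GGUF files above also run directly under llama.cpp. A sketch, assuming llama-cli from a recent llama.cpp build is on your PATH and the Q4_K_M file has been downloaded locally:

# Single prompt with the suggested sampling settings and full context window.
llama-cli -m Qwen3-4B-Reasoning-Q4_K_M.gguf \
  -p "Hello!" \
  -c 40960 --temp 0.6 --top-k 20 --top-p 0.95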

Quick Start

ollama run richardyoung/qwen3-4b-reasoning:q4_k_m "Hello!"
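
Once pulled, the model can also be called through Ollama's HTTP API. A sketch, assuming the default server on localhost:11434:

# Non-streaming generation request against the local Ollama server.
curl http://localhost:11434/api/generate -d '{
  "model": "richardyoung/qwen3-4b-reasoning:q4_k_m",
  "prompt": "Hello!",
  "stream": false
}'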

Available Commands

  • ollama run richardyoung/qwen3-4b-reasoning:iq4_xs
  • ollama run richardyoung/qwen3-4b-reasoning:q4_k_m

License

See the upstream repo for license/terms: https://huggingface.co/joeyzero/Qwen3-4B-Reasoning-Backfill-v0.1

Acknowledgments

  • Quantized with llama.cpp (llama-quantize).
  • GGUF conversion via llama.cpp (convert_hf_to_gguf.py); a representative command sequence is sketched below.
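
For reference, that pipeline generally looks like the following (the upstream path and output file names are illustrative; run from a llama.cpp checkout with its Python requirements installed):

# Convert the upstream Hugging Face repo to an f16 GGUF.
python convert_hf_to_gguf.py /path/to/Qwen3-4B-Reasoning-Backfill-v0.1 \
  --outfile Qwen3-4B-Reasoning-f16.gguf --outtype f16

# Quantize the f16 GGUF to the two published variants.
llama-quantize Qwen3-4B-Reasoning-f16.gguf Qwen3-4B-Reasoning-Q4_K_M.gguf Q4_K_M
llama-quantize Qwen3-4B-Reasoning-f16.gguf Qwen3-4B-Reasoning-IQ4_XS.gguf IQ4_XS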