Qwen3.5 Claude Coder: local coding agents

Details

Updated: 1 month ago
Digest: a952b588bed5
Size: 3.4GB
Architecture: qwen35
Parameters: 4.66B
Quantization: Q4_K_M
License (11kB, preview truncated): Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR US
Template (1.3kB, preview truncated): {{- if or .System .Tools }}<|im_start|>system {{ if .System }}{{ .System }} {{ end }}{{- if .Tools }
System (1.5kB, preview truncated): You are an autonomous coding and sysadmin agent in a real terminal (Claude Code) on macOS, with tool
Params (139B, preview truncated): { "num_ctx": 65536, "presence_penalty": 1.5, "stop": [ "<|im_start|>", "

A family of custom models built on Qwen3.5, tuned to act as autonomous coding and administration agents. They are served through an Anthropic-compatible API, so they can drive Claude Code fully locally: your code never leaves your machine and there are no cloud token costs.

Each model ships with a system prompt focused on real work in a terminal: use tools instead of guessing, write files instead of pasting code, ground every answer in real tool output, and stay terse. Thinking is suppressed so the model acts immediately instead of monologuing. Context is set to 64K to match Claude Code’s recommended minimum.

Models in the family

Model	Base	Context	Purpose
qwen35-claude-coder:4b	Qwen3.5 4B (GGUF)	64K	Fast everyday coding agent. Lightest on memory, runs on 16GB Apple Silicon.
qwen35-claude-coder:9b	Qwen3.5 9B (GGUF)	64K	Stronger coding agent — production-quality code, better reasoning and tool use. Best on 32GB+.

MLX builds (*-mlx) are published on Hugging Face because the ollama.com registry currently does not accept MLX-format manifests.
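
Going by the table above, choosing a tag mostly comes down to installed memory. The following is a minimal sketch of that rule of thumb, not an official selector: the 32GB threshold mirrors the "Best on 32GB+" note, the `rafw007/` namespace is taken from the quick-start commands, and the helper names `pick_tag` and `detect_mem_gb` are hypothetical.

```shell
#!/bin/sh
# Sketch: choose a model tag from installed RAM, following the table above.
# The 32GB threshold is a rule of thumb taken from the "Best on 32GB+" note.

pick_tag() {
    # $1 = installed memory in whole GB
    if [ "$1" -ge 32 ]; then
        echo "rafw007/qwen35-claude-coder:9b"
    else
        echo "rafw007/qwen35-claude-coder:4b"
    fi
}

detect_mem_gb() {
    # macOS reports physical memory in bytes via sysctl.
    # Anywhere that key is missing, fall back to 16 (an assumption, so the 4b tag is chosen).
    bytes=$(sysctl -n hw.memsize 2>/dev/null) || bytes=""
    if [ -n "$bytes" ]; then
        echo $((bytes / 1024 / 1024 / 1024))
    else
        echo 16
    fi
}

echo "Suggested model: $(pick_tag "$(detect_mem_gb)")"
```

The suggested tag can then be passed to `ollama run` or `ollama launch claude --model`.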

What it’s for

Driving Claude Code locally (ollama launch claude --model <name>).
Agentic code writing and editing with native function calling / tool use.
Sysadmin / devops tasks in a real terminal (disk, network, scripts).
Full privacy and offline operation — no code sent to the cloud.

Quick start

ollama run rafw007/qwen35-claude-coder:9b

To drive Claude Code with it:

ollama launch claude --model rafw007/qwen35-claude-coder:9b
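
Where the `ollama launch` subcommand is not available, a manual setup along these lines may work instead. This is a sketch under assumptions: that Ollama is listening on its default address `http://localhost:11434`, that the `claude` CLI reads `ANTHROPIC_BASE_URL` and `ANTHROPIC_AUTH_TOKEN`, and that the local server accepts a dummy token. The helper name `launch_local_claude` is hypothetical.

```shell
#!/bin/sh
# Sketch: point Claude Code at a local Ollama server by hand.
# Assumptions: default Ollama address; the token is a placeholder value.
export ANTHROPIC_BASE_URL="http://localhost:11434"
export ANTHROPIC_AUTH_TOKEN="ollama"

MODEL="rafw007/qwen35-claude-coder:9b"

launch_local_claude() {
    # Pull the model first so the first turn does not stall on a download,
    # then start Claude Code against it.
    ollama pull "$MODEL" && claude --model "$MODEL"
}

echo "Claude Code will use $ANTHROPIC_BASE_URL with model $MODEL"
```

Call `launch_local_claude` from an interactive shell to start the session; the function is only defined above, not run.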

Behavior tuning (the hard-won part)

No thinking. The system prompt + sampling kill the monologue; the model runs a tool and answers instead of reasoning out loud.
No invented values. It is prompted to report only values literally present in tool output: no made-up hostnames, hardware, or numbers.
Acts, never asks. When asked to inspect, scan, check, or measure something, it runs the command instead of asking for confirmation; the command output is the answer.
Terse, one language. No preamble, no recap, matches the user’s language, never drifts into Chinese.
macOS-aware. Uses arp -a, nmap -sn, system_profiler rather than Linux-only commands.
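
To make the macOS-aware point concrete, the sketch below maps a few read-only inspection tasks to the commands named above. The Linux entries are common counterparts added for contrast, the subnet `192.168.1.0/24` is a placeholder, and the helper `inspect_cmd` is hypothetical; none of this is taken from the shipped system prompt.

```shell
#!/bin/sh
# Sketch: map an inspection task to a read-only command for a given OS.
# macOS commands are the ones named in the list above; Linux ones are for contrast only.
# 192.168.1.0/24 is a placeholder subnet.

inspect_cmd() {
    # $1 = task (neighbors | hosts | hardware)
    # $2 = OS name as printed by `uname -s`
    case "$2:$1" in
        Darwin:neighbors) echo "arp -a" ;;
        Darwin:hosts)     echo "nmap -sn 192.168.1.0/24" ;;
        Darwin:hardware)  echo "system_profiler SPHardwareDataType" ;;
        Linux:neighbors)  echo "ip neigh" ;;
        Linux:hosts)      echo "nmap -sn 192.168.1.0/24" ;;
        Linux:hardware)   echo "lscpu" ;;
        *) echo "unsupported: $2 $1" >&2; return 1 ;;
    esac
}

# Print the neighbor-table command for the current machine, if it is a known OS.
inspect_cmd neighbors "$(uname -s)" || true
```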

Sampling / context

temperature 0.2, top_p 0.9, top_k 20, repeat_penalty 1.05, num_ctx 65536.
The Qwen3.5 GGUF declares a 262K native context, so num_ctx can be raised beyond 65536 on hardware with more memory.
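
One way to raise the context is to derive a variant from the published model with a Modelfile. The sketch below writes such a file into a temporary directory and prints the build command. The name `qwen35-claude-coder-128k` and the value 131072 are illustrative choices, and whether that much context fits depends on available memory.

```shell
#!/bin/sh
# Sketch: derive a larger-context variant of the published model.
# NEW_MODEL and NUM_CTX are illustrative, not published settings.
BASE_MODEL="rafw007/qwen35-claude-coder:9b"
NEW_MODEL="qwen35-claude-coder-128k"
NUM_CTX=131072

WORKDIR=$(mktemp -d)
MODELFILE="$WORKDIR/Modelfile"

# FROM an existing model should carry over its template, system prompt and
# other parameters; only num_ctx is overridden here.
cat > "$MODELFILE" <<EOF
FROM $BASE_MODEL
PARAMETER num_ctx $NUM_CTX
EOF

echo "Wrote $MODELFILE. Build it with:"
echo "  ollama create $NEW_MODEL -f $MODELFILE"
```

Writing the file and building the model are kept as separate steps so the Modelfile can be reviewed before anything is created.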

Test hardware

Mac Mini (Apple Silicon, M4), 16GB and 32GB RAM, macOS
Ollama 0.24, GPU (Metal) inference

Measured behavior

Model	Placement	Verdict
qwen35-claude-coder:4b (GGUF)	16GB, fits	Fast, correct algorithms; rougher edge cases. Good light agent.
qwen35-claude-coder:9b (GGUF)	32GB, comfortable	Production-quality code, defensive, non-mutating. Zero emoji, zero hallucination in disk/network agent tests.

Both pass end-to-end through Claude Code: real turns with tool calls and correct responses.
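
A smaller smoke test can be run without Claude Code by sending a single request to the local server. This sketch assumes Ollama exposes an Anthropic-style Messages endpoint at `/v1/messages` on its default port and accepts a dummy API key; the prompt, the `max_tokens` value, and the helper names `build_body` and `smoke_test` are arbitrary.

```shell
#!/bin/sh
# Sketch: one Messages-style request against a local Ollama server.
# Assumptions: endpoint path /v1/messages on localhost:11434; dummy API key.
OLLAMA_URL="http://localhost:11434"

build_body() {
    # $1 = model, $2 = prompt
    # No JSON escaping is done, so the prompt must not contain double quotes or backslashes.
    printf '{"model":"%s","max_tokens":256,"messages":[{"role":"user","content":"%s"}]}' "$1" "$2"
}

smoke_test() {
    # Prints the raw JSON response; exits non-zero if the server is unreachable.
    curl -sS --max-time 120 "$OLLAMA_URL/v1/messages" \
        -H "content-type: application/json" \
        -H "x-api-key: ollama" \
        -H "anthropic-version: 2023-06-01" \
        -d "$(build_body "$1" "$2")"
}

# Show the request body that would be sent; nothing is sent until smoke_test is called.
build_body "rafw007/qwen35-claude-coder:4b" "List the files in the current directory."
echo
```

With the server running, `smoke_test rafw007/qwen35-claude-coder:4b "Say hello."` sends the request.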

How they were made

These models were designed, built and tested with the help of Claude Opus, on the idea that a leading coding model should be able to create smaller models in its own image. Their system prompts, parameters and context configuration come straight from that work.

License

Apache 2.0 (inherited from the base Qwen3.5).