Qwen3.5 Claude Coder: local coding agents

A family of custom models built on Qwen3.5, tuned to act as autonomous coding and administration agents.

Capabilities: vision, tools, thinking. Sizes: 4b, 9b.
ollama run rafw007/qwen35-claude-coder:4b

Details

Digest a952b588bed5 · 3.4GB · architecture qwen35 · 4.66B parameters · Q4_K_M quantization
License: Apache License, Version 2.0, January 2004, http://www.apache.org/licenses/ (preview truncated)
Template: {{- if or .System .Tools }}<|im_start|>system {{ if .System }}{{ .System }} {{ end }}{{- if .Tools }…
System: You are an autonomous coding and sysadmin agent in a real terminal (Claude Code) on macOS, with tool…
Params: { "num_ctx": 65536, "presence_penalty": 1.5, "stop": [ "<|im_start|>", "…

Readme


A family of custom models built on Qwen3.5, tuned to act as autonomous coding and administration agents. Served by Ollama, they speak an Anthropic-compatible API, so they can drive Claude Code fully locally: your code never leaves your machine and cloud token cost drops to zero.
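To make "Anthropic-compatible" concrete, here is a minimal sketch of a Messages-style request aimed at a local server. The request body and headers follow the public Anthropic Messages format; the base URL, the `/v1/messages` route on Ollama, and the placeholder key are assumptions that are not stated in this README, so check them against your Ollama version.

```python
import json
import urllib.request

# Assumptions (not from the README): Ollama listens on its default port and
# exposes an Anthropic-style Messages route at /v1/messages.
BASE_URL = "http://localhost:11434"
MODEL = "rafw007/qwen35-claude-coder:9b"


def build_messages_request(prompt: str, max_tokens: int = 512) -> urllib.request.Request:
    """Build (but do not send) an Anthropic Messages-style POST request."""
    body = {
        "model": MODEL,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        BASE_URL + "/v1/messages",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "content-type": "application/json",
            "anthropic-version": "2023-06-01",
            # Placeholder value; assumed to be ignored by a local server.
            "x-api-key": "ollama",
        },
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the request and join the text blocks of the reply."""
    with urllib.request.urlopen(build_messages_request(prompt)) as resp:
        reply = json.load(resp)
    return "".join(b["text"] for b in reply["content"] if b.get("type") == "text")
```

Calling `ask("List the files in the current directory")` needs a running local server with the model pulled; `build_messages_request` can be inspected offline.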

Each model ships with a system prompt focused on real work in a terminal: use tools instead of guessing, write files instead of pasting code, ground every answer in real tool output, and stay terse. Thinking is suppressed so the model acts immediately instead of monologuing. Context is set to 64K tokens (num_ctx 65536) to match Claude Code’s recommended minimum.

Models in the family

| Model | Base | Context | Purpose |
|---|---|---|---|
| qwen35-claude-coder:4b | Qwen3.5 4B (GGUF) | 64K | Fast everyday coding agent. Lightest on memory, runs on 16GB Apple Silicon. |
| qwen35-claude-coder:9b | Qwen3.5 9B (GGUF) | 64K | Stronger coding agent: production-quality code, better reasoning and tool use. Best on 32GB+. |

MLX builds (*-mlx) are published on Hugging Face because the ollama.com registry currently does not accept MLX-format manifests.

What it’s for

  • Driving Claude Code locally (ollama launch claude --model <name>).
  • Agentic code writing and editing with native function calling / tool use.
  • Sysadmin / devops tasks in a real terminal (disk, network, scripts).
  • Full privacy and offline operation: no code is sent to the cloud.
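As an illustration of the function calling mentioned above, the sketch below offers one tool to the model through the `tools` field of Ollama's `/api/chat` request and executes a tool call taken from the reply. The `disk_usage` tool, its schema, and the `HANDLERS` table are hypothetical; inside Claude Code the tool set is supplied by Claude Code itself.

```python
import json
import shutil

# Hypothetical tool for illustration; the README does not list the tools
# that Claude Code exposes to the model.
DISK_USAGE_TOOL = {
    "type": "function",
    "function": {
        "name": "disk_usage",
        "description": "Report total, used and free bytes for a path.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}


def disk_usage(path: str) -> dict:
    """Read real numbers from the OS so the answer can be grounded in them."""
    usage = shutil.disk_usage(path)
    return {"total": usage.total, "used": usage.used, "free": usage.free}


HANDLERS = {"disk_usage": disk_usage}


def chat_payload(model: str, prompt: str) -> dict:
    """Body for POST /api/chat that offers the tool to the model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [DISK_USAGE_TOOL],
        "stream": False,
    }


def run_tool_call(call: dict) -> dict:
    """Execute one entry of message.tool_calls and build the 'tool' reply message."""
    fn = call["function"]
    args = fn["arguments"]
    if isinstance(args, str):  # tolerate arguments delivered as a JSON string
        args = json.loads(args)
    result = HANDLERS[fn["name"]](**args)
    return {"role": "tool", "content": json.dumps(result)}
```

The message returned by `run_tool_call` is appended to the conversation and sent back, so the model's final answer can quote the measured values rather than guessed ones.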

Quick start

ollama run rafw007/qwen35-claude-coder:9b

In Claude Code:

ollama launch claude --model rafw007/qwen35-claude-coder:9b

Behavior tuning (the hard-won part)

  • No thinking. The system prompt + sampling kill the monologue; the model runs a tool and answers instead of reasoning out loud.
  • No hallucination. It reports only values literally present in tool output — no invented hostnames, hardware, or numbers.
  • Acts, never asks. Asked to inspect, scan, check, or measure something, it runs the command; running it is the answer.
  • Terse, one language. No preamble, no recap, matches the user’s language, never drifts into Chinese.
  • macOS-aware. Uses arp -a, nmap -sn, system_profiler rather than Linux-only commands.
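The "no hallucination" rule can be spot-checked mechanically. The sketch below is one possible check, not the method used to test these models: it pulls number-like tokens (integers, dotted values such as IPv4 addresses, MAC addresses) out of an answer and reports any that do not appear in the tool output. The function name and the regular expression are illustrative assumptions.

```python
import re

# Matches MAC addresses first, then integers and dotted numbers (versions, IPv4).
VALUE_PATTERN = re.compile(
    r"\b[0-9a-fA-F]{2}(?::[0-9a-fA-F]{2}){5}\b|\b\d+(?:\.\d+)*\b"
)


def ungrounded_values(answer: str, tool_output: str) -> list[str]:
    """Return value-like tokens in the answer that never appear in the tool output."""
    seen = set(VALUE_PATTERN.findall(tool_output))
    return [tok for tok in VALUE_PATTERN.findall(answer) if tok not in seen]


# Example input shaped like one line of `arp -a` output.
tool_output = "? (192.168.1.1) at a4:5e:60:c2:11:0f on en0 ifscope [ethernet]"
grounded = "The router is 192.168.1.1 (a4:5e:60:c2:11:0f)."
invented = "The router is 192.168.1.1 and the NAS is 192.168.1.50 with 4 disks."

print(ungrounded_values(grounded, tool_output))  # []
print(ungrounded_values(invented, tool_output))  # ['192.168.1.50', '4']
```

The check is deliberately strict: a token counts as grounded only if it appears as a whole token in the tool output, so values the model derives (sums, percentages) would also be flagged and need a separate rule.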

Sampling / context

  • temperature 0.2, top_p 0.9, top_k 20, repeat_penalty 1.05, num_ctx 65536.
  • Qwen3.5 GGUF carries a 262K native context, so context can be raised further on stronger hardware.
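The values above map directly onto the `options` object that Ollama accepts per request, which is one way to raise the context without rebuilding the model. A minimal sketch follows; reading "262K" as 262,144 tokens is an assumption, and the helper names are illustrative.

```python
# Values quoted in the "Sampling / context" list above.
README_OPTIONS = {
    "temperature": 0.2,
    "top_p": 0.9,
    "top_k": 20,
    "repeat_penalty": 1.05,
    "num_ctx": 65536,
}

# Assumption: "262K" is read as 262,144 tokens (256 * 1024).
NATIVE_CONTEXT = 262_144


def options_with_context(num_ctx: int) -> dict:
    """Copy the README options with a different context window."""
    if not 1 <= num_ctx <= NATIVE_CONTEXT:
        raise ValueError(f"num_ctx must be between 1 and {NATIVE_CONTEXT}")
    return {**README_OPTIONS, "num_ctx": num_ctx}


def generate_payload(model: str, prompt: str, num_ctx: int = 65536) -> dict:
    """Body for POST /api/generate with per-request option overrides."""
    return {
        "model": model,
        "prompt": prompt,
        "options": options_with_context(num_ctx),
        "stream": False,
    }
```

For example, `generate_payload("rafw007/qwen35-claude-coder:9b", "hello", num_ctx=131072)` doubles the context for that request. A larger window also enlarges the KV cache, so memory use grows with `num_ctx`.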

Test hardware

  • Mac Mini (Apple Silicon, M4), 16GB and 32GB RAM, macOS
  • Ollama 0.24, GPU (Metal) inference

Measured behavior

| Model | Placement | Verdict |
|---|---|---|
| qwen35-claude-coder:4b (GGUF) | 16GB, fits | Fast, correct algorithms; rougher edge cases. Good light agent. |
| qwen35-claude-coder:9b (GGUF) | 32GB, comfortable | Production-quality code, defensive, non-mutating. Zero emoji, zero hallucination in disk/network agent tests. |

Both pass end-to-end through Claude Code: real turns with tool calls and correct responses.

How they were made

These models were designed, built and tested with the help of Claude Opus. The idea is that the best coding model in the world should be able to create smaller models in its own image, so their system prompts, parameters and context configuration come straight from that work: local models prepared to take over right on your desk.

License

Apache 2.0 (inherited from the base Qwen3.5).