Qwen3.6-35B-A3B MoE coding agent for Claude Code / Codex / opencode, 64K context, native tool-calling, honest tool use, safety guardrails intact

Applications

Claude Code ollama launch claude --model rafw007/qwen36-a3b-claude-coder

Codex App ollama launch codex-app --model rafw007/qwen36-a3b-claude-coder

OpenClaw ollama launch openclaw --model rafw007/qwen36-a3b-claude-coder

Hermes Agent ollama launch hermes --model rafw007/qwen36-a3b-claude-coder

Codex ollama launch codex --model rafw007/qwen36-a3b-claude-coder

OpenCode ollama launch opencode --model rafw007/qwen36-a3b-claude-coder

Qwen3.6 Claude Coder — local MoE coding agent

A custom model built on Qwen3.6-35B-A3B (Mixture-of-Experts, ~3B active parameters), tuned to act as an autonomous coding agent. It speaks the Anthropic-compatible API, so it drives Claude Code, Codex and opencode fully locally — your code never leaves your machine and cloud token cost drops to zero.

Safety guardrails are intact. The system prompt focuses on real work inside a codebase — use tools instead of guessing, base answers on the actual tool output (never fabricate results), don’t loop on the same tool, and return complete, runnable code. No-think mode is wired into the system prompt for fast, direct answers.

Models in the family

Model	Base	Context	Purpose
qwen36-a3b-claude-coder	Qwen3.6-35B-A3B (MoE, ~3B active)	64K	Honest agentic coding agent — real tool-calling, no result hallucination, guardrails intact.

What it’s for

Driving Claude Code / Codex / opencode locally (ollama launch claude --model rafw007/qwen36-a3b-claude-coder).
Agentic code writing and editing with native function calling / tool use.
Full privacy and offline operation — no code sent to the cloud.

Tested harnesses

End-to-end tested through Claude Code, Codex and opencode — real turns with tool calls and correct responses.

Measured behavior (June 2026 tests)

Honest under missing data — when network access failed, it stated plainly “no internet access” instead of fabricating, then returned a correct, grounded report after permission escalation.
Tool-calling without hallucination — in a roundtrip test it called df -h /, received the real system output and reported the exact value, without re-calling the tool in a loop.
Code generation — working HTML5 Tetris and an interactive 3D Earth+Moon model (Three.js, real NASA textures, OrbitControls); JS passes syntax validation.

Context
64K tokens — matching Claude Code’s recommendation (64K minimum). Base Qwen3.6 natively supports 262K, so context can be raised on stronger hardware.

Test hardware

Mac Studio M2 (Apple Silicon), macOS — Ollama 0.30, GPU (Metal) inference
Quantization: nvfp4 (~21 GB weights)

Measured performance

Placement	Speed (`think:false`)	Tool calling
100% GPU, 64K ctx	~69 tok/s	✅ native, real `message.tool_calls`

On a Mac Mini M4 (32 GB) throughput is lower — a memory-bandwidth limit, not the model.

No-think mode

The whole Qwen3.6 family has thinking baked into the weights. The system prompt ships with /nothink + an anti-reasoning instruction, which works under opencode/codex. Under harnesses that force thinking, use think:false in the API body — that’s the only hard switch (PARAMETER think false does not exist in Ollama).

How it was made

Designed, built and tested with the help of Claude Opus 4.8 — the best coding model in the world. Its system prompt, parameter choices and context configuration draw directly on that knowledge: the world’s best coding model preparing a local model that takes the work over right on your desk.

License

Apache 2.0 (inherited from the base Qwen3.6).