Qwen3.6-35B-A3B MoE coding agent for Claude Code / Codex / opencode, 64K context, native tool-calling, honest tool use, safety guardrails intact

Details

Updated 1 month ago

1 month ago

36853d5c1fed · 24GB ·

model

archqwen35moe

parameters36B

quantizationQ4_K_M

24GB

license

Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR US

11kB

system

/nothink Jestes Qwen3.6-35B-A3B Claude Coder - agentowy asystent kodowania pod Claude Code, Codex i

461B

params

{ "min_p": 0, "num_ctx": 65536, "presence_penalty": 0, "repeat_penalty": 1, "tem

109B

template

13B

Qwen3.6 Claude Coder — local MoE coding agent

A custom model built on Qwen3.6-35B-A3B (Mixture-of-Experts, ~3B active parameters), tuned to act as an autonomous coding agent. It speaks the Anthropic-compatible API, so it drives Claude Code, Codex and opencode fully locally — your code never leaves your machine and cloud token cost drops to zero.

Safety guardrails are intact. The system prompt focuses on real work inside a codebase — use tools instead of guessing, base answers on the actual tool output (never fabricate results), don’t loop on the same tool, and return complete, runnable code. No-think mode is wired into the system prompt for fast, direct answers.

Models in the family

Model	Base	Context	Purpose
qwen36-a3b-claude-coder	Qwen3.6-35B-A3B (MoE, ~3B active)	64K	Honest agentic coding agent — real tool-calling, no result hallucination, guardrails intact.

What it’s for

Driving Claude Code / Codex / opencode locally (ollama launch claude --model rafw007/qwen36-a3b-claude-coder).
Agentic code writing and editing with native function calling / tool use.
Full privacy and offline operation — no code sent to the cloud.

Tested harnesses

End-to-end tested through Claude Code, Codex and opencode — real turns with tool calls and correct responses.

Measured behavior (June 2026 tests)

Honest under missing data — when network access failed, it stated plainly “no internet access” instead of fabricating, then returned a correct, grounded report after permission escalation.
Tool-calling without hallucination — in a roundtrip test it called df -h /, received the real system output and reported the exact value, without re-calling the tool in a loop.
Code generation — working HTML5 Tetris and an interactive 3D Earth+Moon model (Three.js, real NASA textures, OrbitControls); JS passes syntax validation.

Context
64K tokens — matching Claude Code’s recommendation (64K minimum). Base Qwen3.6 natively supports 262K, so context can be raised on stronger hardware.

Test hardware

Mac Studio M2 (Apple Silicon), macOS — Ollama 0.30, GPU (Metal) inference
Quantization: nvfp4 (~21 GB weights)

Measured performance

Placement	Speed (`think:false`)	Tool calling
100% GPU, 64K ctx	~69 tok/s	✅ native, real `message.tool_calls`

On a Mac Mini M4 (32 GB) throughput is lower — a memory-bandwidth limit, not the model.

No-think mode

The whole Qwen3.6 family has thinking baked into the weights. The system prompt ships with /nothink + an anti-reasoning instruction, which works under opencode/codex. Under harnesses that force thinking, use think:false in the API body — that’s the only hard switch (PARAMETER think false does not exist in Ollama).

How it was made

Designed, built and tested with the help of Claude Opus 4.8 — the best coding model in the world. Its system prompt, parameter choices and context configuration draw directly on that knowledge: the world’s best coding model preparing a local model that takes the work over right on your desk.

License

Apache 2.0 (inherited from the base Qwen3.6).