521 3 days ago

Qwen3.6-35B-A3B MoE coding agent for Claude Code / Codex / opencode, 64K context, native tool-calling, honest tool use, safety guardrails intact

vision tools thinking
ollama run rafw007/qwen36-a3b-claude-coder:q4_K_M

Details

4 days ago

36853d5c1fed · 24GB ·

qwen35moe
·
36B
·
Q4_K_M
Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR US
{{ .Prompt }}
/nothink Jestes Qwen3.6-35B-A3B Claude Coder - agentowy asystent kodowania pod Claude Code, Codex i
{ "min_p": 0, "num_ctx": 65536, "presence_penalty": 0, "repeat_penalty": 1, "tem

Readme

Qwen3.6 Claude Coder — local MoE coding agent

A custom model built on Qwen3.6-35B-A3B (Mixture-of-Experts, ~3B active parameters), tuned to act as an autonomous coding agent. It speaks the Anthropic-compatible API, so it drives Claude Code, Codex and opencode fully locally — your code never leaves your machine and cloud token cost drops to zero.

Safety guardrails are intact. The system prompt focuses on real work inside a codebase — use tools instead of guessing, base answers on the actual tool output (never fabricate results), don’t loop on the same tool, and return complete, runnable code. No-think mode is wired into the system prompt for fast, direct answers.

Models in the family

Model Base Context Purpose
qwen36-a3b-claude-coder Qwen3.6-35B-A3B (MoE, ~3B active) 64K Honest agentic coding agent — real tool-calling, no result hallucination, guardrails intact.

What it’s for

  • Driving Claude Code / Codex / opencode locally (ollama launch claude --model rafw007/qwen36-a3b-claude-coder).
  • Agentic code writing and editing with native function calling / tool use.
  • Full privacy and offline operation — no code sent to the cloud.

Tested harnesses

End-to-end tested through Claude Code, Codex and opencode — real turns with tool calls and correct responses.

Measured behavior (June 2026 tests)

  • Honest under missing data — when network access failed, it stated plainly “no internet access” instead of fabricating, then returned a correct, grounded report after permission escalation.

  • Tool-calling without hallucination — in a roundtrip test it called df -h /, received the real system output and reported the exact value, without re-calling the tool in a loop.

  • Code generation — working HTML5 Tetris and an interactive 3D Earth+Moon model (Three.js, real NASA textures, OrbitControls); JS passes syntax validation.

    Context

  • 64K tokens — matching Claude Code’s recommendation (64K minimum). Base Qwen3.6 natively supports 262K, so context can be raised on stronger hardware.

Test hardware

  • Mac Studio M2 (Apple Silicon), macOSOllama 0.30, GPU (Metal) inference
  • Quantization: nvfp4 (~21 GB weights)

Measured performance

Placement Speed (think:false) Tool calling
100% GPU, 64K ctx ~69 tok/s ✅ native, real message.tool_calls

On a Mac Mini M4 (32 GB) throughput is lower — a memory-bandwidth limit, not the model.

No-think mode

The whole Qwen3.6 family has thinking baked into the weights. The system prompt ships with /nothink + an anti-reasoning instruction, which works under opencode/codex. Under harnesses that force thinking, use think:false in the API body — that’s the only hard switch (PARAMETER think false does not exist in Ollama).

How it was made

Designed, built and tested with the help of Claude Opus 4.8 — the best coding model in the world. Its system prompt, parameter choices and context configuration draw directly on that knowledge: the world’s best coding model preparing a local model that takes the work over right on your desk.

License

Apache 2.0 (inherited from the base Qwen3.6).