519 Downloads Updated 3 days ago
ollama run rafw007/qwen36-a3b-claude-coder
ollama launch claude --model rafw007/qwen36-a3b-claude-coder
ollama launch codex-app --model rafw007/qwen36-a3b-claude-coder
ollama launch openclaw --model rafw007/qwen36-a3b-claude-coder
ollama launch hermes --model rafw007/qwen36-a3b-claude-coder
ollama launch codex --model rafw007/qwen36-a3b-claude-coder
ollama launch opencode --model rafw007/qwen36-a3b-claude-coder
A custom model built on Qwen3.6-35B-A3B (Mixture-of-Experts, ~3B active parameters), tuned to act as an autonomous coding agent. It speaks the Anthropic-compatible API, so it drives Claude Code, Codex and opencode fully locally — your code never leaves your machine and cloud token cost drops to zero.
Safety guardrails are intact. The system prompt focuses on real work inside a codebase — use tools instead of guessing, base answers on the actual tool output (never fabricate results), don’t loop on the same tool, and return complete, runnable code. No-think mode is wired into the system prompt for fast, direct answers.
| Model | Base | Context | Purpose |
|---|---|---|---|
| qwen36-a3b-claude-coder | Qwen3.6-35B-A3B (MoE, ~3B active) | 64K | Honest agentic coding agent — real tool-calling, no result hallucination, guardrails intact. |
ollama launch claude --model rafw007/qwen36-a3b-claude-coder).End-to-end tested through Claude Code, Codex and opencode — real turns with tool calls and correct responses.
Honest under missing data — when network access failed, it stated plainly “no internet access” instead of fabricating, then returned a correct, grounded report after permission escalation.
Tool-calling without hallucination — in a roundtrip test it called df -h /, received the real system output and reported the exact value, without re-calling the tool in a loop.
Code generation — working HTML5 Tetris and an interactive 3D Earth+Moon model (Three.js, real NASA textures, OrbitControls); JS passes syntax validation.
64K tokens — matching Claude Code’s recommendation (64K minimum). Base Qwen3.6 natively supports 262K, so context can be raised on stronger hardware.
| Placement | Speed (think:false) |
Tool calling |
|---|---|---|
| 100% GPU, 64K ctx | ~69 tok/s | ✅ native, real message.tool_calls |
On a Mac Mini M4 (32 GB) throughput is lower — a memory-bandwidth limit, not the model.
The whole Qwen3.6 family has thinking baked into the weights. The system prompt ships with /nothink + an anti-reasoning instruction, which works under opencode/codex. Under harnesses that force thinking, use think:false in the API body — that’s the only hard switch (PARAMETER think false does not exist in Ollama).
Designed, built and tested with the help of Claude Opus 4.8 — the best coding model in the world. Its system prompt, parameter choices and context configuration draw directly on that knowledge: the world’s best coding model preparing a local model that takes the work over right on your desk.
Apache 2.0 (inherited from the base Qwen3.6).