47 Downloads Updated yesterday
ollama run rafw007/gemma4-26b-claude-coder
Updated yesterday
yesterday
2cc446f3396b · 21GB ·
Gemma 4 26B coding agent for Claude Code / Codex / opencode, 64K+ context, native tool-calling, Q5_K_M GGUF, runs fully local on 32GB Apple Silicon.
A custom model built on Gemma 4 26B ( ~25.8B params), tuned to act as an autonomous coding and administration agent. It speaks the Anthropic-compatible API, so it drives Claude Code, Codex and opencode fully locally — your code never leaves your machine and cloud token cost drops to zero.
This is the 25 GB-class big sibling of the Gemma 4 Claude Coder family (E2B / E4B). It ships on a Q5_K_M GGUF quantization, deliberately chosen over Q4_K_M: the smaller Q4_K_M build injected token corruption into long code generations (broken tags, glued digit-letter tokens), and Q5_K_M fixes it — long files come out clean. The system prompt focuses on real work inside a codebase: use tools instead of guessing, write files instead of pasting, ground every answer in real tool output (never fabricate results), stay in one language, and always finish the file you start. No-think mode is wired into the system prompt for fast, direct answers.
| Model | Base | Context | Purpose |
|---|---|---|---|
| gemma4-26b-claude-coder | Gemma 4 26B (~25.8B, Q5_K_M) | 64K (native 256K) | Strongest member — heavier reasoning and clean long-code generation on 32 GB hardware. |
| gemma4-e4b-claude-coder | Gemma 4 E4B (eff. 4B / 8B w/ embeddings) | 64K | Stronger 16 GB coder — reasoning and tool use on larger tasks. |
| gemma4-e2b-claude-coder | Gemma 4 E2B (eff. 2B / 5.1B w/ embeddings) | 64K | Fast everyday 16 GB coder — edits, autocomplete, short agent loops. |
ollama launch claude --model rafw007/gemma4-26b-claude-coder).nmap, df, du with no hallucinated output.message.tool_calls, and admin tasks (df/du,
full /24 nmap scans with host tables) report the actual output rather than inventing it.The model was built and tested on:
| Placement | Hardware | Speed | Tool calling |
|---|---|---|---|
| 100% GPU, native ctx, CONTEXT 65536 | Mac Studio M2 | ~52-56 tok/s | native, real message.tool_calls |
The model loads entirely on the GPU with no CPU spill (verified via ollama ps: 100% GPU,
CONTEXT 65536). The only real cost is a one-time cold load of the ~21 GB weights, not a per-turn
cost; warm generation runs ~52-56 tok/s on the Studio. The Mac Mini M4 (32 GB) is the same 32 GB
target class — bounded by memory bandwidth rather than the model.
The whole Gemma 4 family has thinking baked into the weights. The system prompt ships with
/nothink + an anti-reasoning instruction, which works on the direct API path and under
opencode/codex. Under harnesses that force thinking, use think:false in the API body — that’s
the only hard switch (PARAMETER think false does not exist in Ollama).
Q5_K_M removes the bulk corruption seen on the smaller Q4_K_M build — in testing, long single-pass
generations came out clean (zero .->- glitches, zero language drift). If you generate files for
production, a quick corruption scan before use is still good practice, but the Q5_K_M build tested
clean on the long-code task that previously failed.
Designed, built and tested with the help of Claude Opus 4.8 — the best coding model in the world. Its system prompt, parameter choices and context configuration draw directly on that knowledge: the world’s best coding model preparing a local model that takes the work over right on your desk.
| File | Quant | Size | Notes |
|---|---|---|---|
gemma4-26b-claude-coder-Q5_K_M.gguf |
Q5_K_M | ~21 GB | Recommended balance of quality/size; fits 32 GB with full 64K ctx. |
Both are derived from the same google/gemma-4-26B-A4B-it base and carry the identical Claude Coder
system prompt and parameters (see Modelfile).
Apache 2.0 (inherited from the base Gemma 4).