76 2 days ago

Stronger Gemma 4 E4B coding agent for Claude Code, 64K context, native tool-calling, runs fully local on 16GB Apple Silicon.

vision tools thinking audio
ollama run rafw007/gemma4-e4b-claude-coder

Applications

Claude Code
Claude Code ollama launch claude --model rafw007/gemma4-e4b-claude-coder
Codex App
Codex App ollama launch codex-app --model rafw007/gemma4-e4b-claude-coder
OpenClaw
OpenClaw ollama launch openclaw --model rafw007/gemma4-e4b-claude-coder
Hermes Agent
Hermes Agent ollama launch hermes --model rafw007/gemma4-e4b-claude-coder
Codex
Codex ollama launch codex --model rafw007/gemma4-e4b-claude-coder
OpenCode
OpenCode ollama launch opencode --model rafw007/gemma4-e4b-claude-coder

Models

View all →

Readme

Gemma 4 Claude Coder — local model family

A family of custom models built on Gemma 4 (edge variants E2B and E4B), tuned to act as autonomous coding and administration agents. The models speak the Anthropic-compatible API, so they drive Claude Code fully locally — your code never leaves your machine and cloud token cost drops to zero.

Each model ships with a system prompt focused on real work inside a codebase: use tools instead of guessing, make minimal and precise code changes, return complete and runnable output, and verify after acting. Sampling follows Google’s official Gemma 4 recommendation (temperature 1.0, top_k 64, top_p 0.95), with thinking mode enabled for better planning before a tool call.

Models in the family

Model Base Context Purpose
gemma4-e2b-claude-coder Gemma 4 E2B (eff. 2B / 5.1B with embeddings) 64K Fast everyday coding agent — edits, autocomplete, short agent loops. Lightest on memory.
gemma4-e4b-claude-coder Gemma 4 E4B (eff. 4B / 8B with embeddings) 64K Stronger coding agent — better reasoning and tool use on larger tasks.
gemma4-e4b-claude-coder-admin Gemma 4 E4B 32K Administration and system tasks (scripts, shell, devops). Smaller context fits 100% in GPU for higher, stable throughput.

What it’s for

  • Driving Claude Code locally (ollama launch claude --model <name>).
  • Agentic code writing and editing with native function calling / tool use.
  • Administration and devops tasks on a server (the admin variant).
  • Full privacy and offline operation — no code sent to the cloud.

Context

  • Coders (E2B / E4B): 64K tokens — matching Claude Code’s recommendation (64K minimum).
  • Admin (E4B): 32K tokens — a deliberate trade-off for 16 GB hardware that keeps the model entirely on the GPU.
  • Base Gemma 4 E2B/E4B natively supports up to 128K, so context can be raised on stronger hardware.

Test hardware

The models were built and tested on:

  • Mac Mini (Apple Silicon, M-series), 16 GB RAM, macOS 15.6
  • Ollama 0.24, GPU (Metal) inference

Measured performance (16 GB RAM)

Model Placement Speed Tool calling
gemma4-e2b-claude-coder 100% GPU ~55 tok/s ✅ valid JSON
gemma4-e4b-claude-coder (64K) 39% GPU / 61% CPU ~27 tok/s (drops under load)
gemma4-e4b-claude-coder-admin (32K) 100% GPU ~30 tok/s (stable)

All three passed an end-to-end test through Claude Code: real turns with tool calls and correct responses (HTTP 200 on /v1/messages).

How they were made

These models were designed, built and tested with the help of Claude Opus 4.8 — the best coding model in the world. Their system prompts, parameter choices and context configuration draw directly on its knowledge. In other words: the world’s best coding model prepared local models that take that work over right on your desk.

License

Apache 2.0 (inherited from the base Gemma 4).