Applications

  • Claude Code: ollama launch claude --model batiai/gemma4-e2b:q4
  • Codex: ollama launch codex --model batiai/gemma4-e2b:q4
  • OpenCode: ollama launch opencode --model batiai/gemma4-e2b:q4
  • OpenClaw: ollama launch openclaw --model batiai/gemma4-e2b:q4

Gemma 4 E2B — Quantized by BatiAI

Quantized from official Google weights. Verified on real Mac hardware.

Models

| Tag | Size | VRAM | Mac mini M4 (16GB) | M4 Max (128GB) | Use Case |
|-----|------|------|--------------------|----------------|----------|
| q4 (latest) | 3.2GB | 7.1GB | 107.8 t/s | 132.5 t/s | 16GB Mac recommended |
| q6 | 3.6GB | 7.5GB | 45.5 t/s ✅ | 117.5 t/s | Higher quality, fits 16GB |
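
To pull or run a specific quant, append the tag to the model name:

ollama pull batiai/gemma4-e2b:q6
ollama run batiai/gemma4-e2b:q6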

Quick Start

ollama run batiai/gemma4-e2b
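
Ollama also serves a local HTTP API on port 11434, so with a default install a minimal chat request against this model looks like:

curl http://localhost:11434/api/chat -d '{
  "model": "batiai/gemma4-e2b:q4",
  "messages": [{ "role": "user", "content": "Explain quantization in one sentence." }],
  "stream": false
}'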

Why Gemma 4 E2B?

  • 5.1B total params, 2.3B effective — PLE (Per-Layer Embeddings) for on-device efficiency
  • Vision support included (mmproj) — describe images in chat (see the image example after this list)
  • Audio: supported in the original model, but not yet in the llama.cpp/Ollama ecosystem
  • 128K context window (see the num_ctx example after this list)
  • 3.2GB Q4 fits comfortably on a 16GB Mac — plenty of room for the KV cache
  • 107 t/s on a Mac mini M4 — instant responses
  • Gemma license (free for most uses)
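
Two of the bullets above are easy to try. For vision, the Ollama CLI picks up an image path included in the prompt (./photo.png is a hypothetical file):

ollama run batiai/gemma4-e2b:q4 "Describe this image: ./photo.png"

For long context, Ollama loads models with a shorter default window, so the full 128K has to be requested explicitly via the num_ctx option:

curl http://localhost:11434/api/generate -d '{
  "model": "batiai/gemma4-e2b:q4",
  "prompt": "Summarize the following document: ...",
  "options": { "num_ctx": 131072 },
  "stream": false
}'

Keep in mind that filling a 128K window grows the KV cache, so expect VRAM use above the figures in the table above.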

16GB Mac — The Lightest Option

| Model | Size | VRAM | Mac mini M4 (16GB) |
|-------|------|------|--------------------|
| batiai/gemma4-e2b:q4 | 3.2GB | 7.1GB | 107.8 t/s |
| batiai/gemma4-e4b:q4 | 5.0GB | 10GB | 57.1 t/s |
| batiai/qwen3.5-9b:q4 | 5.6GB | | 12.5 t/s |

Gemma 4 E2B is the smallest and fastest model we ship — ideal for quick responses and low memory usage. For better tool calling accuracy, use E4B.
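
For tool calling, here is a minimal sketch through Ollama's /api/chat endpoint; the get_weather function is a hypothetical example, not something the model ships with:

curl http://localhost:11434/api/chat -d '{
  "model": "batiai/gemma4-e2b:q4",
  "messages": [{ "role": "user", "content": "What is the weather in Seoul right now?" }],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get the current weather for a city",
      "parameters": {
        "type": "object",
        "properties": { "city": { "type": "string" } },
        "required": ["city"]
      }
    }
  }],
  "stream": false
}'

If the model decides to call the tool, the reply's message.tool_calls field carries the function name and arguments; your code runs the function and sends the result back as a "tool" role message.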

Why BatiAI?

  • Quantized directly from official Google weights (not third-party)
  • Q4_K_M and Q6_K — higher quality quant methods than default
  • Verified on Mac mini M4 (16GB) + MacBook Pro M4 Max (128GB)
  • Korean language and tool calling tested on real hardware
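
A one-line smoke test of the Korean support (any short Korean prompt works; this one asks for a one-sentence explanation of quantization):

ollama run batiai/gemma4-e2b:q4 "양자화를 한 문장으로 설명해 주세요."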

Built for BatiFlow

Free, on-device AI automation for Mac: a 5MB app, 100% local, unlimited usage.

https://flow.bati.ai