2,770 1 month ago

tools thinking
ollama run batiai/gemma4-e2b:q4

Applications

Claude Code
Claude Code ollama launch claude --model batiai/gemma4-e2b:q4
Codex App
Codex App ollama launch codex-app --model batiai/gemma4-e2b:q4
OpenClaw
OpenClaw ollama launch openclaw --model batiai/gemma4-e2b:q4
Hermes Agent
Hermes Agent ollama launch hermes --model batiai/gemma4-e2b:q4
Codex
Codex ollama launch codex --model batiai/gemma4-e2b:q4
OpenCode
OpenCode ollama launch opencode --model batiai/gemma4-e2b:q4

Models

View all →

Readme

Gemma 4 E2B-it — Quantized by BatiAI

Quantized directly from official Google BF16 weights. Edge variant — Google’s tiniest fully-multimodal Gemma 4 (text + image + audio). Text-only here on Ollama; image + audio mmproj via HF + llama.cpp (see bottom).

Models

Tag Size VRAM 16GB Mac mini M4 M4 Max (128GB) Use Case
q4 (latest) 3.2GB 7.1GB 107.8 t/s 132.5 t/s 16GB Mac recommended
q6 3.6GB 7.5GB 45.5 t/s ✅ 117.5 t/s Higher quality, fits 16GB

Quick Start

ollama run batiai/gemma4-e2b

Why Gemma 4 E2B?

  • 5.1B total params, 2.3B effective — PLE (Per-Layer Embeddings) for on-device efficiency
  • Vision: mmproj available on HuggingFace (Ollama vision pending ecosystem fix)
  • 128K context window
  • 128K context window
  • 3.2GB Q4 fits comfortably in 16GB Mac — plenty of room for KV cache
  • 107 t/s on Mac mini M4 — instant responses
  • Gemma license (free for most uses)

16GB Mac — The Lightest Option

Model Size VRAM 16GB Mac mini M4
batiai/gemma4-e2b:q4 3.2GB 7.1GB 107.8 t/s
batiai/gemma4-e4b:q4 5.0GB 10GB 57.1 t/s
batiai/qwen3.5-9b:q4 5.6GB 12.5 t/s

Gemma 4 E2B is the smallest and fastest model we ship — ideal for quick responses and low memory usage. For better tool calling accuracy, use E4B.

Why BatiAI?

  • Quantized directly from official Google weights (not third-party)
  • Q4_K_M and Q6_K — higher quality quant methods than default
  • Verified on Mac mini M4 (16GB) + MacBook Pro M4 Max (128GB)
  • Korean language and tool calling tested on real hardware

Built for BatiFlow

Free, on-device AI automation for Mac. 5MB app, 100% local, unlimited.

https://flow.bati.ai

Multimodal mode — image + audio (opt-in, HF + llama.cpp)

Unique to E series: audio support (not just vision). The mmproj on HF holds both vision and audio encoders together in a single 1411-tensor projector.

wget https://huggingface.co/batiai/Gemma-4-E2B-it-GGUF/resolve/main/google-gemma-4-E2B-it-Q4_K_M.gguf
wget https://huggingface.co/batiai/Gemma-4-E2B-it-GGUF/resolve/main/mmproj-BF16.gguf

# Image input
llama-mtmd-cli -m google-gemma-4-E2B-it-Q4_K_M.gguf --mmproj mmproj-BF16.gguf \
  --image ~/Desktop/photo.jpg -p "describe this image"

# Audio input (transcribe / understand speech)
llama-mtmd-cli -m google-gemma-4-E2B-it-Q4_K_M.gguf --mmproj mmproj-BF16.gguf \
  --audio ~/Downloads/voice.wav -p "transcribe and summarize"

Note: only BF16 mmproj is available for E series — the combined vision+audio projector tensors don’t satisfy K-quant block alignment, so Q6_K aborts. Applies to every quantizer.