Applications

Application	Command
Claude Code	ollama launch claude --model batiai/gemma4-e2b:q4
Codex App	ollama launch codex-app --model batiai/gemma4-e2b:q4
OpenClaw	ollama launch openclaw --model batiai/gemma4-e2b:q4
Hermes Agent	ollama launch hermes --model batiai/gemma4-e2b:q4
Codex	ollama launch codex --model batiai/gemma4-e2b:q4
OpenCode	ollama launch opencode --model batiai/gemma4-e2b:q4

Gemma 4 E2B-it — Quantized by BatiAI

Quantized directly from the official Google BF16 weights. This is the Edge variant: Google's smallest fully multimodal Gemma 4 (text + image + audio). The Ollama build is text-only; image and audio input need the mmproj file from HF together with llama.cpp (see the multimodal section at the bottom).

Models

Tag	Size	VRAM	16GB Mac mini M4	M4 Max (128GB)	Use Case
q4 (latest)	3.2GB	7.1GB	107.8 t/s ✅	132.5 t/s	16GB Mac recommended
q6	3.6GB	7.5GB	45.5 t/s ✅	117.5 t/s	Higher quality, fits 16GB

Quick Start

ollama run batiai/gemma4-e2b
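
Besides the interactive CLI, the model can also be queried through Ollama's local HTTP API. The following is a minimal sketch, assuming the Ollama server is running on its default port (11434) and the model has already been pulled. The helper names build_payload and ask are illustrative and not part of Ollama.

```shell
#!/bin/sh
# Hypothetical helpers for calling the local Ollama API.
MODEL="batiai/gemma4-e2b:q4"
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# Build a non-streaming /api/generate request body.
# The prompt is inserted verbatim, so it must not contain double
# quotes or backslashes (use a JSON tool such as jq for arbitrary text).
build_payload() {
  printf '{"model":"%s","prompt":"%s","stream":false}' "$MODEL" "$1"
}

# Send the prompt to the server and print the raw JSON reply.
ask() {
  curl -s "$OLLAMA_URL/api/generate" -d "$(build_payload "$1")"
}
```

With the server running, ask "Say hello in Korean" prints a JSON object whose response field holds the generated text.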

Why Gemma 4 E2B?

5.1B total params, 2.3B effective — PLE (Per-Layer Embeddings) for on-device efficiency
Vision: mmproj available on HuggingFace (Ollama vision pending ecosystem fix)
128K context window
3.2GB Q4 fits comfortably in 16GB Mac — plenty of room for KV cache
107.8 t/s (q4) on a 16GB Mac mini M4, fast enough for interactive use
Gemma license (free for most uses)

16GB Mac — The Lightest Option

Model	Size	VRAM	16GB Mac mini M4
batiai/gemma4-e2b:q4	3.2GB	7.1GB	107.8 t/s
batiai/gemma4-e4b:q4	5.0GB	10GB	57.1 t/s
batiai/qwen3.5-9b:q4	5.6GB	—	12.5 t/s

Gemma 4 E2B is the smallest and fastest model we ship — ideal for quick responses and low memory usage. For better tool calling accuracy, use E4B.
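
To make the trade-off concrete, the table figures can be turned into rough memory headroom and reply time. This is only a back-of-the-envelope sketch: the 500-token reply length is an assumed value, the headroom ignores memory used by macOS and other apps, and the time covers generation only (no prompt processing). The qwen3.5-9b row is left out because its VRAM figure is not listed.

```shell
#!/bin/sh
# Input columns: tag, VRAM in GB, tokens per second on a 16GB Mac mini M4.
# total_gb and reply_tokens are illustrative assumptions.
summarize() {
  awk -v total_gb=16 -v reply_tokens=500 '{
    printf "%s: %.1f GB left of %d GB, %.1f s per %d tokens\n",
           $1, total_gb - $2, total_gb, reply_tokens / $3, reply_tokens
  }'
}

# Figures copied from the comparison table.
summarize <<'EOF'
gemma4-e2b:q4 7.1 107.8
gemma4-e4b:q4 10 57.1
EOF
# gemma4-e2b:q4: 8.9 GB left of 16 GB, 4.6 s per 500 tokens
# gemma4-e4b:q4: 6.0 GB left of 16 GB, 8.8 s per 500 tokens
```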

Why BatiAI?

Quantized directly from official Google weights (not third-party)
Q4_K_M and Q6_K — higher quality quant methods than default
Verified on Mac mini M4 (16GB) + MacBook Pro M4 Max (128GB)
Korean language and tool calling tested on real hardware

Built for BatiFlow

Free, on-device AI automation for Mac. 5MB app, 100% local, unlimited.

https://flow.bati.ai

Multimodal mode — image + audio (opt-in, HF + llama.cpp)

Unique to E series: audio support (not just vision). The mmproj on HF holds both vision and audio encoders together in a single 1411-tensor projector.

wget https://huggingface.co/batiai/Gemma-4-E2B-it-GGUF/resolve/main/google-gemma-4-E2B-it-Q4_K_M.gguf
wget https://huggingface.co/batiai/Gemma-4-E2B-it-GGUF/resolve/main/mmproj-BF16.gguf

# Image input
llama-mtmd-cli -m google-gemma-4-E2B-it-Q4_K_M.gguf --mmproj mmproj-BF16.gguf \
  --image ~/Desktop/photo.jpg -p "describe this image"

# Audio input (transcribe / understand speech)
llama-mtmd-cli -m google-gemma-4-E2B-it-Q4_K_M.gguf --mmproj mmproj-BF16.gguf \
  --audio ~/Downloads/voice.wav -p "transcribe and summarize"

Note: only a BF16 mmproj is available for the E series. The combined vision+audio projector tensors don't satisfy K-quant block alignment, so quantizing the projector to Q6_K aborts. This applies to every quantizer.
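
The image command above can be wrapped in a loop to caption a whole folder. This is a sketch only, reusing the flags and file names shown above; the function name describe_dir is hypothetical, and it assumes llama-mtmd-cli writes the generated text to standard output (depending on the build, log lines may be mixed in).

```shell
#!/bin/sh
# File names match the two downloads shown earlier in this section.
MODEL_FILE="google-gemma-4-E2B-it-Q4_K_M.gguf"
MMPROJ_FILE="mmproj-BF16.gguf"

# Describe every .jpg in a directory, writing one .txt file per image.
describe_dir() {
  dir="$1"
  for img in "$dir"/*.jpg; do
    [ -e "$img" ] || continue   # skip when the glob matched nothing
    llama-mtmd-cli -m "$MODEL_FILE" --mmproj "$MMPROJ_FILE" \
      --image "$img" -p "describe this image" > "${img%.jpg}.txt"
  done
}
```

For example, describe_dir ~/Desktop writes ~/Desktop/photo.txt next to ~/Desktop/photo.jpg.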
