# Gemma 4 E4B — Quantized by BatiAI

Quantized from official Google weights. Verified on real Mac hardware.
## Models

| Tag | Size | VRAM | 16GB Mac mini M4 | M4 Max (128GB) | Use Case |
|---|---|---|---|---|---|
| q4 (latest) | 5.0GB | 10GB | 57.1 t/s ✅ | 84.0 t/s | 16GB Mac recommended |
| q6 | 5.8GB | 11GB | 45.0 t/s ✅ | 77.4 t/s | 16GB Mac, higher quality |
## Quick Start

```shell
ollama run batiai/gemma4-e4b
```
## Why Gemma 4 E4B?
- 8B total params, 4.5B effective — PLE (Per-Layer Embeddings) for on-device efficiency
- Vision support included (mmproj) — describe images in chat
- Audio: supported in original model, not yet in llama.cpp/Ollama ecosystem
- 128K context window
- Smarter than E2B — passes tool calling on 16GB Mac
- 57 t/s on Mac mini M4 — fast enough for real-time use
- Gemma license (free for most uses)
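Ollama exposes tool calling through its local `/api/chat` endpoint using OpenAI-style function schemas. As a minimal sketch of what a tool-calling request body looks like (the `get_weather` tool and its parameters are hypothetical examples, not part of this model card):

```python
import json

def build_tool_chat_request(model: str, prompt: str, tools: list) -> dict:
    """Build a request body for Ollama's /api/chat endpoint with tools."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": tools,   # OpenAI-style function schemas
        "stream": False,  # ask for a single JSON response, not a stream
    }

# Hypothetical example tool: a weather lookup.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

body = build_tool_chat_request(
    "batiai/gemma4-e4b", "What's the weather in Seoul?", [weather_tool]
)
print(json.dumps(body, indent=2))
# POST this to http://localhost:11434/api/chat with Ollama running locally.
```

If the model decides to use a tool, the response message carries a `tool_calls` list you can dispatch on; you then append the tool's result as a `tool`-role message and call the endpoint again.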
## Model Comparison

| Model | Size | VRAM | 16GB Mac mini M4 | Tool Call |
|---|---|---|---|---|
| batiai/gemma4-e2b:q4 | 3.2GB | 7.1GB | 107.8 t/s | ⚠️ |
| batiai/gemma4-e4b:q4 | 5.0GB | 10GB | 57.1 t/s | ✅ |
| batiai/qwen3.5-9b:q4 | 5.6GB | — | 12.5 t/s | ✅ |
| gemma4:e4b (official) | 9.6GB | — | 27.7 t/s | ✅ |
BatiAI E4B Q4 is about half the size of the official gemma4:e4b, roughly 2x faster on a 16GB Mac mini M4, and keeps tool calling support.
## 16GB Mac Users
- Q4 recommended — 10GB VRAM, 57 t/s, best balance
- Q6 also works — 11GB VRAM, 45 t/s, higher quality, tested and verified
## Why BatiAI?
- Quantized directly from official Google weights (not third-party)
- Q4_K_M and Q6_K — higher-quality quantization methods than the default
- Verified on Mac mini M4 (16GB) + MacBook Pro M4 Max (128GB)
- Korean language and tool calling tested on real hardware
## Built for BatiFlow
Free, on-device AI automation for Mac. 5MB app, 100% local, unlimited.
https://flow.bati.ai