ollama run batiai/gemma4-e2b:q6
Quantized directly from the official Google BF16 weights. This is the Edge variant: Google’s smallest fully multimodal Gemma 4 (text + image + audio). The Ollama build is text-only; image and audio input need the mmproj file from Hugging Face together with llama.cpp (see the bottom of this page).
| Tag | Size | VRAM | 16GB Mac mini M4 | M4 Max (128GB) | Use Case |
|---|---|---|---|---|---|
| q4 (latest) | 3.2GB | 7.1GB | 107.8 t/s ✅ | 132.5 t/s | 16GB Mac recommended |
| q6 | 3.6GB | 7.5GB | 45.5 t/s ✅ | 117.5 t/s | Higher quality, fits 16GB |
ollama run batiai/gemma4-e2b
| Model | Size | VRAM | 16GB Mac mini M4 |
|---|---|---|---|
| batiai/gemma4-e2b:q4 | 3.2GB | 7.1GB | 107.8 t/s |
| batiai/gemma4-e4b:q4 | 5.0GB | 10GB | 57.1 t/s |
| batiai/qwen3.5-9b:q4 | 5.6GB | — | 12.5 t/s |
Gemma 4 E2B is the smallest and fastest model we ship, which makes it a good fit for quick responses and low memory usage. For more accurate tool calling, use E4B.
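To put the throughput column in practical terms, the sketch below converts tokens per second into an approximate wait for one reply. The 500-token reply length and the `gen_seconds` helper are assumptions for illustration; the rates are the 16GB Mac mini M4 figures from the table, and the estimate ignores model load and prompt processing time.

```shell
# Estimated seconds to generate N tokens at a given decode rate (tokens/s).
# gen_seconds is a hypothetical helper, not part of Ollama.
gen_seconds() {
  awk -v n="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", n / rate }'
}

gen_seconds 500 107.8   # batiai/gemma4-e2b:q4  -> 4.6
gen_seconds 500 57.1    # batiai/gemma4-e4b:q4  -> 8.8
gen_seconds 500 12.5    # batiai/qwen3.5-9b:q4  -> 40.0
```

On these numbers, a reply that takes about 40 seconds on the 9B model arrives in under 5 seconds on E2B.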
Free, on-device AI automation for Mac. 5MB app, 100% local, unlimited.
Unique to the E series: audio support, not just vision. The mmproj on Hugging Face holds the vision and audio encoders together in a single 1411-tensor projector.
wget https://huggingface.co/batiai/Gemma-4-E2B-it-GGUF/resolve/main/google-gemma-4-E2B-it-Q4_K_M.gguf
wget https://huggingface.co/batiai/Gemma-4-E2B-it-GGUF/resolve/main/mmproj-BF16.gguf
# Image input
llama-mtmd-cli -m google-gemma-4-E2B-it-Q4_K_M.gguf --mmproj mmproj-BF16.gguf \
--image ~/Desktop/photo.jpg -p "describe this image"
# Audio input (transcribe / understand speech)
llama-mtmd-cli -m google-gemma-4-E2B-it-Q4_K_M.gguf --mmproj mmproj-BF16.gguf \
--audio ~/Downloads/voice.wav -p "transcribe and summarize"
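Because the two invocations above differ only in the media flag, they can be wrapped in one small function. This is a minimal sketch: `media_flag` and `run_mtmd` are hypothetical helper names, the extension list is an assumption (only `.jpg` and `.wav` appear in the commands above), and matching is case-sensitive. It uses only the `llama-mtmd-cli` flags already shown.

```shell
# File names as downloaded with the wget commands above.
MODEL="google-gemma-4-E2B-it-Q4_K_M.gguf"
MMPROJ="mmproj-BF16.gguf"

# Hypothetical helper: pick the media flag from the file extension.
media_flag() {
  case "$1" in
    *.jpg|*.jpeg|*.png) echo "--image" ;;
    *.wav|*.mp3)        echo "--audio" ;;
    *) echo "unsupported media type: $1" >&2; return 1 ;;
  esac
}

# Hypothetical wrapper: run_mtmd <media-file> <prompt>
run_mtmd() {
  file="$1"
  prompt="$2"
  flag=$(media_flag "$file") || return 1
  llama-mtmd-cli -m "$MODEL" --mmproj "$MMPROJ" "$flag" "$file" -p "$prompt"
}
```

For example, `run_mtmd ~/Desktop/photo.jpg "describe this image"` expands to the image command shown earlier, and an unrecognized extension fails before the model is loaded.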
Note: only a BF16 mmproj is available for the E series. The combined vision+audio projector tensors don’t satisfy K-quant block alignment, so quantizing to Q6_K aborts. This applies to every quantizer.
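To make the alignment constraint concrete: llama.cpp's K-quant formats pack weights into super-blocks of 256 values, so a tensor row can only be K-quantized when its length is a multiple of 256. The sketch below checks that condition. The `kquant_aligned` helper and the sample row lengths are illustrative assumptions, not values read from the actual mmproj.

```shell
# K-quants (Q4_K, Q6_K, ...) in llama.cpp use super-blocks of 256 values.
QK_K=256

# Hypothetical helper: succeeds only if a row length fits whole super-blocks.
kquant_aligned() {
  [ $(( $1 % QK_K )) -eq 0 ]
}

for cols in 2048 1000; do   # illustrative lengths, not the real tensor shapes
  if kquant_aligned "$cols"; then
    echo "$cols: aligned"
  else
    echo "$cols: not aligned"
  fi
done
```

A tensor whose row length fails this check cannot be stored in a K-quant layout, which is why the projector is shipped in BF16 only.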