ollama run batiai/gemma4-e4b:q4
Quantized directly from the official Google BF16 weights. E4B is the Edge variant that is slightly larger than E2B, with the same multimodal kit (text + image + audio). The Ollama build is text-only; image and audio input require the mmproj file from Hugging Face together with llama.cpp (see the bottom of this page).
| Tag | Size | VRAM | 16GB Mac mini M4 | M4 Max (128GB) | Use Case |
|---|---|---|---|---|---|
| q4 (latest) | 5.0GB | 10GB | 57.1 t/s ✅ | 84.0 t/s | 16GB Mac recommended |
| q6 | 5.8GB | 11GB | 45.0 t/s ✅ | 77.4 t/s | 16GB Mac, higher quality |
ollama run batiai/gemma4-e4b
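Besides the interactive `ollama run` session, the model can be queried over Ollama's local REST API. The following is a minimal sketch, assuming the Ollama server is running on its default port (11434) and the `q4` tag has already been pulled; the prompt text is only an illustration.

```shell
# Assumption: the Ollama server is listening on localhost:11434 (its default)
# and batiai/gemma4-e4b:q4 has already been pulled.
payload='{"model": "batiai/gemma4-e4b:q4", "prompt": "Explain what a GGUF file is in one sentence.", "stream": false}'

# With "stream": false the server returns a single JSON object;
# the generated text is in its "response" field.
curl -s http://localhost:11434/api/generate -d "$payload" \
  || echo "Ollama server not reachable on localhost:11434"
```

Because this build is text-only on Ollama, the request carries a plain text prompt and no image data.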
| Model | Size | VRAM | 16GB Mac mini M4 | Tool Call |
|---|---|---|---|---|
| batiai/gemma4-e2b:q4 | 3.2GB | 7.1GB | 107.8 t/s | ⚠️ |
| batiai/gemma4-e4b:q4 | 5.0GB | 10GB | 57.1 t/s | ✅ |
| batiai/qwen3.5-9b:q4 | 5.6GB | — | 12.5 t/s | ✅ |
| gemma4:e4b (official) | 9.6GB | — | 27.7 t/s | ✅ |
BatiAI E4B Q4 is roughly half the size of the official gemma4:e4b and roughly 2x faster on the 16GB Mac mini M4, with tool calling support.
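The size and speed comparison can be checked directly from the table figures. This small sketch only redoes the arithmetic with `awk`; the variable names are illustrative.

```shell
# Figures copied from the comparison table above.
bati_size=5.0;  official_size=9.6    # GB
bati_tps=57.1;  official_tps=27.7    # tokens/s on the 16GB Mac mini M4

size_ratio=$(awk -v a="$bati_size" -v b="$official_size" 'BEGIN { printf "%.2f", a / b }')
speedup=$(awk -v a="$bati_tps" -v b="$official_tps" 'BEGIN { printf "%.2f", a / b }')

echo "size ratio: ${size_ratio}x, speedup: ${speedup}x"
# prints: size ratio: 0.52x, speedup: 2.06x
```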
Free, on-device AI automation for Mac. 5MB app, 100% local, unlimited.
E4B is the larger Edge variant — same multimodal capabilities as E2B (text + image + audio) with more reasoning headroom.
wget https://huggingface.co/batiai/Gemma-4-E4B-it-GGUF/resolve/main/google-gemma-4-E4B-it-Q4_K_M.gguf
wget https://huggingface.co/batiai/Gemma-4-E4B-it-GGUF/resolve/main/mmproj-BF16.gguf
# Image input
llama-mtmd-cli -m google-gemma-4-E4B-it-Q4_K_M.gguf --mmproj mmproj-BF16.gguf \
--image ~/Desktop/photo.jpg -p "describe this image"
# Audio input (transcribe / understand speech)
llama-mtmd-cli -m google-gemma-4-E4B-it-Q4_K_M.gguf --mmproj mmproj-BF16.gguf \
--audio ~/Downloads/voice.wav -p "transcribe and summarize"
The mmproj holds both the vision and audio encoders in a single 1411-tensor projector. Only a BF16 mmproj is available: the combined vision+audio tensors don’t satisfy K-quant alignment, so quantizing to Q6_K aborts. This limitation applies to every quantizer of this model.
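The image command above can be wrapped in a loop to describe several files in one go. This is a sketch under assumptions: `describe_images`, `IMAGE_DIR`, and `LLAMA_CLI` are hypothetical names introduced here, and the only `llama-mtmd-cli` flags used are the ones shown in the commands above.

```shell
MODEL=google-gemma-4-E4B-it-Q4_K_M.gguf
MMPROJ=mmproj-BF16.gguf
IMAGE_DIR="${IMAGE_DIR:-$HOME/Desktop}"   # hypothetical folder of .jpg files

# Run the image prompt once per .jpg file in IMAGE_DIR.
describe_images() {
  for img in "$IMAGE_DIR"/*.jpg; do
    [ -e "$img" ] || continue             # skip when the glob matches nothing
    echo "== $img"
    # LLAMA_CLI lets you point at a differently named or located binary.
    "${LLAMA_CLI:-llama-mtmd-cli}" -m "$MODEL" --mmproj "$MMPROJ" \
      --image "$img" -p "describe this image"
  done
}
```

Call `describe_images` from the directory that holds the two downloaded GGUF files, or set `IMAGE_DIR=/path/to/photos` first to use a different folder.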