ollama run batiai/gemma4-e4b:q4

Capabilities: tools, thinking

Details: d682bf87e3a3 · 5.3GB · gemma4 · 7.52B params · Q4_K_M · updated yesterday

System prompt: You are a helpful AI assistant.

Parameters: { "num_ctx": 131072, "stop": [ "<turn|>" ], "temperature": 0.7 }

Gemma 4 E4B — Quantized by BatiAI

Quantized from official Google weights. Verified on real Mac hardware.

Models

| Tag | Size | VRAM | Mac mini M4 (16GB) | M4 Max (128GB) | Use Case |
|-----|------|------|--------------------|----------------|----------|
| q4 (latest) | 5.0GB | 10GB | 57.1 t/s | 84.0 t/s | Recommended for 16GB Macs |
| q6 | 5.8GB | 11GB | 45.0 t/s | 77.4 t/s | 16GB Mac, higher quality |

Quick Start

ollama run batiai/gemma4-e4b
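The CLI command above can also be driven programmatically through Ollama's standard REST API. A minimal sketch (assumes `ollama serve` is running on the default port 11434; the helper function names are ours, not part of Ollama):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint


def build_generate_request(model: str, prompt: str, num_ctx: int = 8192) -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,                  # one JSON object instead of a token stream
        "options": {"num_ctx": num_ctx},  # override the Modelfile default (131072)
    }


def generate(model: str, prompt: str) -> str:
    """POST the request to a locally running Ollama server and return the text."""
    body = json.dumps(build_generate_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


# Usage (requires a running Ollama server):
# print(generate("batiai/gemma4-e4b:q4", "Say hello in one sentence."))
```

Setting `num_ctx` per request keeps memory use down when you don't need the full 128K window.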

Why Gemma 4 E4B?

  • 8B total params, 4.5B effective — PLE (Per-Layer Embeddings) for on-device efficiency
  • Vision support included (mmproj) — describe images in chat
  • Audio: supported in original model, not yet in llama.cpp/Ollama ecosystem
  • 128K context window
  • Smarter than E2B — passes tool calling on 16GB Mac
  • 57 t/s on Mac mini M4 — fast enough for real-time use
  • Gemma license (free for most uses)
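Since the model passes tool calling, requests can attach tool schemas via Ollama's /api/chat endpoint. A sketch of the request body only (the `get_weather` tool is a made-up example; actually sending it requires a running server):

```python
def build_tool_chat_request(model: str, user_message: str) -> dict:
    """Build an /api/chat body with one hypothetical tool attached."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "stream": False,
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool, for illustration only
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }],
    }


request = build_tool_chat_request("batiai/gemma4-e4b:q4", "What's the weather in Seoul?")
```

If the model decides to call the tool, the response message carries a `tool_calls` field instead of plain text.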

Model Comparison

| Model | Size | VRAM | Mac mini M4 (16GB) | Tool Call |
|-------|------|------|--------------------|-----------|
| batiai/gemma4-e2b:q4 | 3.2GB | 7.1GB | 107.8 t/s | ⚠️ |
| batiai/gemma4-e4b:q4 | 5.0GB | 10GB | 57.1 t/s | ✅ |
| batiai/qwen3.5-9b:q4 | 5.6GB | — | 12.5 t/s | — |
| gemma4:e4b (official) | 9.6GB | — | 27.7 t/s | — |

BatiAI E4B Q4 is about half the size of the official gemma4:e4b, roughly 2x faster on a 16GB Mac mini M4, and supports tool calling.

16GB Mac Users

  • Q4 recommended — 10GB VRAM, 57 t/s, best balance
  • Q6 also works — 11GB VRAM, 45 t/s, higher quality, tested and verified
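If 10–11GB of unified memory is tight alongside other apps, the 128K default context can be trimmed with a standard Ollama Modelfile (the values and tag name below are just an example; a smaller num_ctx shrinks the KV-cache footprint):

```
FROM batiai/gemma4-e4b:q4
PARAMETER num_ctx 32768
```

Create it with `ollama create gemma4-e4b-32k -f Modelfile` and run the new tag as usual.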

Why BatiAI?

  • Quantized directly from official Google weights (not third-party)
  • Q4_K_M and Q6_K — higher quality quant methods than default
  • Verified on Mac mini M4 (16GB) + MacBook Pro M4 Max (128GB)
  • Korean language and tool calling tested on real hardware

Built for BatiFlow

Free, on-device AI automation for Mac. 5MB app, 100% local, unlimited.

https://flow.bati.ai