ollama run batiai/gemma4-e4b:q4

Capabilities: tools, thinking

Details: d682bf87e3a3 · 5.3GB · gemma4 · 7.52B params · Q4_K_M · updated yesterday

System prompt: You are a helpful AI assistant.

Parameters: { "num_ctx": 131072, "stop": [ "<turn|>" ], "temperature": 0.7 }

Gemma 4 E4B — Quantized by BatiAI

Quantized from official Google weights. Verified on real Mac hardware.

Models

| Tag | Size | VRAM | Mac mini M4 (16GB) | M4 Max (128GB) | Use Case |
|-----|------|------|--------------------|----------------|----------|
| q4 (latest) | 5.0GB | 10GB | 57.1 t/s | 84.0 t/s | Recommended for 16GB Macs |
| q6 | 5.8GB | 11GB | 45.0 t/s | 77.4 t/s | 16GB Mac, higher quality |

Quick Start

ollama run batiai/gemma4-e4b
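The CLI command above can also be driven programmatically through Ollama's standard REST API. A minimal sketch (assumes `ollama serve` is running on the default port 11434; the helper function names are ours, not part of Ollama):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint


def build_generate_request(model: str, prompt: str, num_ctx: int = 8192) -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,                  # one JSON object instead of a token stream
        "options": {"num_ctx": num_ctx},  # override the Modelfile default (131072)
    }


def generate(model: str, prompt: str) -> str:
    """POST the request to a locally running Ollama server and return the text."""
    body = json.dumps(build_generate_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


# Usage (requires a running Ollama server):
# print(generate("batiai/gemma4-e4b:q4", "Say hello in one sentence."))
```

Setting `num_ctx` per request keeps memory use down when you don't need the full 128K window.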

Why Gemma 4 E4B?

  • 8B total params, 4.5B effective — PLE (Per-Layer Embeddings) for on-device efficiency
  • Vision support included (mmproj) — describe images in chat
  • Audio: supported in original model, not yet in llama.cpp/Ollama ecosystem
  • 128K context window
  • Smarter than E2B — passes tool calling on 16GB Mac
  • 57 t/s on Mac mini M4 — fast enough for real-time use
  • Gemma license (free for most uses)
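Since the model passes tool calling, requests can attach tool schemas via Ollama's /api/chat endpoint. A sketch of the request body only (the `get_weather` tool is a made-up example; actually sending it requires a running server):

```python
def build_tool_chat_request(model: str, user_message: str) -> dict:
    """Build an /api/chat body with one hypothetical tool attached."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "stream": False,
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool, for illustration only
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }],
    }


request = build_tool_chat_request("batiai/gemma4-e4b:q4", "What's the weather in Seoul?")
```

If the model decides to call the tool, the response message carries a `tool_calls` field instead of plain text.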

Model Comparison

| Model | Size | VRAM | Mac mini M4 (16GB) | Tool Call |
|-------|------|------|--------------------|-----------|
| batiai/gemma4-e2b:q4 | 3.2GB | 7.1GB | 107.8 t/s | ⚠️ |
| batiai/gemma4-e4b:q4 | 5.0GB | 10GB | 57.1 t/s | ✅ |
| batiai/qwen3.5-9b:q4 | 5.6GB | — | 12.5 t/s | — |
| gemma4:e4b (official) | 9.6GB | — | 27.7 t/s | — |

BatiAI E4B Q4 is about half the size of the official gemma4:e4b, roughly 2x faster on a 16GB Mac mini M4, and supports tool calling.

16GB Mac Users

  • Q4 recommended — 10GB VRAM, 57 t/s, best balance
  • Q6 also works — 11GB VRAM, 45 t/s, higher quality, tested and verified
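If 10–11GB of unified memory is tight alongside other apps, the 128K default context can be trimmed with a standard Ollama Modelfile (the values and tag name below are just an example; a smaller num_ctx shrinks the KV-cache footprint):

```
FROM batiai/gemma4-e4b:q4
PARAMETER num_ctx 32768
```

Create it with `ollama create gemma4-e4b-32k -f Modelfile` and run the new tag as usual.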

Why BatiAI?

  • Quantized directly from official Google weights (not third-party)
  • Q4_K_M and Q6_K — higher quality quant methods than default
  • Verified on Mac mini M4 (16GB) + MacBook Pro M4 Max (128GB)
  • Korean language and tool calling tested on real hardware

Built for BatiFlow

Free, on-device AI automation for Mac. 5MB app, 100% local, unlimited.

https://flow.bati.ai