2,693 1 month ago

tools thinking
ollama run batiai/gemma4-e4b:q4

Applications

Claude Code
Claude Code ollama launch claude --model batiai/gemma4-e4b:q4
Codex App
Codex App ollama launch codex-app --model batiai/gemma4-e4b:q4
OpenClaw
OpenClaw ollama launch openclaw --model batiai/gemma4-e4b:q4
Hermes Agent
Hermes Agent ollama launch hermes --model batiai/gemma4-e4b:q4
Codex
Codex ollama launch codex --model batiai/gemma4-e4b:q4
OpenCode
OpenCode ollama launch opencode --model batiai/gemma4-e4b:q4

Models

View all →

Readme

Gemma 4 E4B-it — Quantized by BatiAI

Quantized directly from official Google BF16 weights. Edge variant — slightly larger than E2B with the same multimodal kit (text + image + audio). Text-only here on Ollama; image + audio mmproj via HF + llama.cpp (see bottom).

Models

Tag Size VRAM 16GB Mac mini M4 M4 Max (128GB) Use Case
q4 (latest) 5.0GB 10GB 57.1 t/s 84.0 t/s 16GB Mac recommended
q6 5.8GB 11GB 45.0 t/s ✅ 77.4 t/s 16GB Mac, higher quality

Quick Start

ollama run batiai/gemma4-e4b

Why Gemma 4 E4B?

  • 8B total params, 4.5B effective — PLE (Per-Layer Embeddings) for on-device efficiency
  • Vision: mmproj available on HuggingFace (Ollama vision pending ecosystem fix)
  • 128K context window
  • 128K context window
  • Smarter than E2B — passes tool calling on 16GB Mac
  • 57 t/s on Mac mini M4 — fast enough for real-time use
  • Gemma license (free for most uses)

Model Comparison

Model Size VRAM 16GB Mac mini M4 Tool Call
batiai/gemma4-e2b:q4 3.2GB 7.1GB 107.8 t/s ⚠️
batiai/gemma4-e4b:q4 5.0GB 10GB 57.1 t/s
batiai/qwen3.5-9b:q4 5.6GB 12.5 t/s
gemma4:e4b (official) 9.6GB 27.7 t/s

BatiAI E4B Q4 is half the size of official gemma4:e4b, 2x faster, with tool calling support.

16GB Mac Users

  • Q4 recommended — 10GB VRAM, 57 t/s, best balance
  • Q6 also works — 11GB VRAM, 45 t/s, higher quality, tested and verified

Why BatiAI?

  • Quantized directly from official Google weights (not third-party)
  • Q4_K_M and Q6_K — higher quality quant methods than default
  • Verified on Mac mini M4 (16GB) + MacBook Pro M4 Max (128GB)
  • Korean language and tool calling tested on real hardware

Built for BatiFlow

Free, on-device AI automation for Mac. 5MB app, 100% local, unlimited.

https://flow.bati.ai

Multimodal mode — image + audio (opt-in, HF + llama.cpp)

E4B is the larger Edge variant — same multimodal capabilities as E2B (text + image + audio) with more reasoning headroom.

wget https://huggingface.co/batiai/Gemma-4-E4B-it-GGUF/resolve/main/google-gemma-4-E4B-it-Q4_K_M.gguf
wget https://huggingface.co/batiai/Gemma-4-E4B-it-GGUF/resolve/main/mmproj-BF16.gguf

# Image input
llama-mtmd-cli -m google-gemma-4-E4B-it-Q4_K_M.gguf --mmproj mmproj-BF16.gguf \
  --image ~/Desktop/photo.jpg -p "describe this image"

# Audio input (transcribe / understand speech)
llama-mtmd-cli -m google-gemma-4-E4B-it-Q4_K_M.gguf --mmproj mmproj-BF16.gguf \
  --audio ~/Downloads/voice.wav -p "transcribe and summarize"

The mmproj holds both vision and audio encoders in a single 1411-tensor projector. Only BF16 mmproj available — combined vision+audio tensors don’t satisfy K-quant alignment, so Q6_K aborts (applies to every quantizer of this model).