
Capabilities: tools, thinking
ollama run batiai/gemma4-31b:iq3

Details

862b9a92c983 · 14GB · gemma4 · 30.7B

System prompt: You are a helpful AI assistant.

Parameters: { "num_ctx": 131072, "stop": [ "<turn|>" ], "temperature": 0.7 }

Readme

Gemma 4 31B — Quantized by BatiAI

Quantized from official Google weights. Verified on real Mac hardware.

Models

| Tag | Size | VRAM | M4 Pro (48GB) | M4 Max (128GB) | Use Case |
|-----|------|------|---------------|----------------|----------|
| iq4 (recommended) | 16GB | 26GB | 13.5 t/s | 22.8 t/s | 48GB+ Mac, best speed+quality |
| iq3 | 13GB | ~24GB | 12.2 t/s | 20.7 t/s | 48GB+ Mac, slightly smaller |
| q4 | 17GB | ~27GB | ⚠️ tight | 19.1 t/s | 48GB+ Mac, standard |
| q6 | 23GB | ~32GB | ❌ tight | 6.6 t/s | 64GB+ Mac only |

Quick Start

ollama run batiai/gemma4-31b:iq4
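
Once pulled, the model is also reachable through Ollama's local HTTP API on port 11434. A minimal sketch; the prompt is just an example:

curl http://localhost:11434/api/generate -d '{
  "model": "batiai/gemma4-31b:iq4",
  "prompt": "Explain the difference between IQ4_XS and Q4_K_M in two sentences.",
  "stream": false
}'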

Why IQ4_XS is Best

As with the 26B model, imatrix optimization makes IQ4 both smaller and faster than Q4_K_M:

|         | IQ4_XS | Q4_K_M |
|---------|--------|--------|
| Size    | 16GB | 17GB |
| VRAM    | 41GB | 43GB |
| Speed   | 22.8 t/s | 19.1 t/s |
| Quality | 4-bit imatrix | 4-bit standard |
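
You can check these throughput numbers on your own machine: ollama run with the --verbose flag prints timing statistics after each reply, including the generation eval rate in tokens per second. A quick sketch:

ollama run batiai/gemma4-31b:iq4 --verbose
# ask anything; compare the reported eval rate against the table above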

RAM Requirements — Honest

| Your Mac RAM | IQ3 (13GB) | IQ4 (16GB) | Q4 (17GB) | Q6 (23GB) |
|--------------|------------|------------|-----------|-----------|
| 16GB  | ❌ | ❌ | ❌ | ❌ |
| 32GB  | ❌ swap | ❌ swap | ❌ swap | ❌ |
| 48GB  | 12.2 t/s | 13.5 t/s | ⚠️ tight | ❌ |
| 64GB  | ✅ Fast | ✅ Fast | ✅ Fast | ⚠️ Tight |
| 128GB | 20.7 t/s | 22.8 t/s | 19.1 t/s | 6.6 t/s* |

*Q6_K on 128GB Mac runs slow due to memory bandwidth limits, not VRAM.
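
To see what a loaded tag actually occupies on your Mac, run a prompt and then check Ollama's process list. A minimal sketch:

ollama run batiai/gemma4-31b:iq4 "hello"
ollama ps
# lists the loaded model, its in-memory size, and whether it sits on GPU or CPU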

31B vs 26B on M4 Pro 48GB — Real Numbers

We measured both models on the same 48GB Mac:

| Metric | 31B IQ4 | 26B IQ4 (MoE) |
|--------|---------|----------------|
| Speed | 13.5 t/s | 58–63 t/s (4x faster) |
| VRAM | 26.1 GB (37% free) | 15.1 GB (58% free) |
| Cold start | 40 seconds | 1.7 seconds |
| Simple response | 1.5s | 0.4s |
| Coding task | 28.5s | 6.8s |

26B MoE wins on every axis for 48GB Mac. Use 31B only if you specifically need its higher quality on complex reasoning tasks (and have 64GB+ for comfortable headroom).
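
To reproduce the cold-start comparison, force the model to unload first, then time a single non-interactive run. A rough sketch; ollama stop requires a recent Ollama release:

ollama stop batiai/gemma4-31b:iq4    # unload so the next run is a cold start
time ollama run batiai/gemma4-31b:iq4 "Say hello in one word."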

Smaller Macs — Use These Instead

# 16GB Mac
ollama run batiai/gemma4-e4b:q4     # 57.1 t/s, 10GB VRAM

# 24-48GB Mac (recommended for most users)
ollama run batiai/gemma4-26b:iq4    # 58-63 t/s on 48GB Mac, MoE architecture

Why BatiAI?

  • Quantized directly from official Google weights (not third-party)
  • imatrix optimized (IQ3, IQ4) for best quality at each size
  • Third-party GGUFs (unsloth) fail on Ollama 0.20+ — ours work
  • Verified on MacBook Pro M4 Max (128GB)
  • Korean, tool calling, and JSON generation all tested (see the JSON example below)
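
As an example of the structured-output claim above, Ollama's standard chat endpoint accepts a "format": "json" field that constrains the reply to valid JSON. A minimal sketch; the prompt is illustrative, not part of this model's configuration:

curl http://localhost:11434/api/chat -d '{
  "model": "batiai/gemma4-31b:iq4",
  "messages": [
    { "role": "user", "content": "List three Korean cities as a JSON object with a \"cities\" array." }
  ],
  "format": "json",
  "stream": false
}'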

Built for BatiFlow

Free, on-device AI automation for Mac. 5MB app, 100% local, unlimited.

https://flow.bati.ai