```shell
ollama run batiai/gemma4-31b:iq3
```
Quantized from official Google weights. Verified on real Mac hardware.
| Tag | Size | VRAM | M4 Pro (48GB) | M4 Max (128GB) | Use Case |
|---|---|---|---|---|---|
| iq4 (recommended) | 16GB | 26GB | 13.5 t/s | 22.8 t/s | 48GB+ Mac, best speed+quality |
| iq3 | 13GB | ~24GB | 12.2 t/s | 20.7 t/s | 48GB+ Mac, slightly smaller |
| q4 | 17GB | ~27GB | — | 19.1 t/s | 48GB+ Mac, standard |
| q6 | 23GB | ~32GB | ❌ too tight | 6.6 t/s | 64GB+ Mac only |
```shell
ollama run batiai/gemma4-31b:iq4
```
As with the 26B, imatrix optimization makes IQ4 both smaller and faster than Q4_K_M:
| Metric | IQ4_XS | Q4_K_M |
|---|---|---|
| Size | 16GB | 17GB |
| VRAM | 41GB | 43GB |
| Speed | 22.8 t/s | 19.1 t/s |
| Quality | 4-bit imatrix | 4-bit standard |
| Your Mac RAM | IQ3 (13GB) | IQ4 (16GB) | Q4 (17GB) | Q6 (23GB) |
|---|---|---|---|---|
| 16GB | ❌ | ❌ | ❌ | ❌ |
| 32GB | ❌ swap | ❌ swap | ❌ swap | ❌ |
| 48GB | 12.2 t/s | 13.5 t/s ✅ | ⚠️ tight | ❌ |
| 64GB | ✅ Fast | ✅ Fast | ✅ Fast | ⚠️ Tight |
| 128GB | 20.7 t/s | 22.8 t/s | 19.1 t/s | 6.6 t/s* |
*Q6_K on 128GB Mac runs slow due to memory bandwidth limits, not VRAM.
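The compatibility table above can be turned into a quick pre-flight check. A minimal sketch, assuming macOS (`sysctl hw.memsize` reports total unified memory); the 48GB cutoff and the iq4 recommendation come straight from the table:

```shell
#!/bin/sh
# Sketch: suggest a 31B tag based on installed unified memory (macOS).

recommend_tag() {
  # $1 = total RAM in GB; prints the suggested tag, or "none" if the 31B
  # model will not fit without swapping
  if [ "$1" -ge 48 ]; then
    echo "iq4"    # fastest tag at every tested RAM size; q6 needs 64GB+ anyway
  else
    echo "none"
  fi
}

ram_bytes=$(sysctl -n hw.memsize 2>/dev/null || echo 0)
ram_gb=$(( ram_bytes / 1073741824 ))
tag=$(recommend_tag "$ram_gb")

if [ "$tag" = "none" ]; then
  echo "Not enough RAM for 31B; try batiai/gemma4-26b or gemma4-e4b instead."
else
  echo "ollama run batiai/gemma4-31b:$tag"
fi
```

On a non-Mac (or if `sysctl` fails) the script falls back to the "not enough RAM" path rather than erroring out.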
We measured both models on the same 48GB Mac:
| Metric | 31B IQ4 | 26B IQ4 (MoE) |
|---|---|---|
| Speed | 13.5 t/s | 58–63 t/s (4x faster) |
| VRAM | 26.1 GB (37% free) | 15.1 GB (58% free) |
| Cold start | 40 seconds | 1.7 seconds |
| Simple response | 1.5s | 0.4s |
| Coding task | 28.5s | 6.8s |
The 26B MoE wins on every axis on a 48GB Mac. Use the 31B only if you specifically need its higher quality on complex reasoning tasks (and have 64GB+ for comfortable headroom).
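The t/s figures above can be reproduced with `ollama run --verbose`, which prints timing stats after each response. A minimal sketch, assuming the stats include an `eval rate: NN.NN tokens/s` line (the exact format may vary across Ollama versions):

```shell
#!/bin/sh
# Sketch: pull the decode (generation) tokens/s out of ollama's verbose stats.

extract_eval_rate() {
  # Reads verbose output on stdin; prints the decode tokens/s.
  # Drops the "prompt eval rate" line so only the generation rate remains.
  grep 'eval rate' | grep -v 'prompt eval' | awk '{print $(NF-1)}'
}

# Usage (uncomment on a machine with the model pulled):
# ollama run batiai/gemma4-31b:iq4 "Write a haiku about RAM." --verbose 2>&1 \
#   | extract_eval_rate
```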
```shell
# 16GB Mac
ollama run batiai/gemma4-e4b:q4    # 57.1 t/s, 10GB VRAM

# 24–48GB Mac (recommended for most users)
ollama run batiai/gemma4-26b:iq4   # 58–63 t/s on 48GB Mac, MoE architecture
```
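The same models can also be called through Ollama's local HTTP API on its default port 11434 (`/api/generate` with `model`, `prompt`, and `stream` fields). A minimal sketch; the prompt text is only an example, and `ollama serve` (or the desktop app) must be running:

```shell
#!/bin/sh
# Sketch: build a /api/generate request body and send it with curl.

build_request() {
  # $1 = model tag, $2 = prompt; prints the JSON body for /api/generate
  printf '{"model":"%s","prompt":"%s","stream":false}' "$1" "$2"
}

# Uncomment with Ollama running:
# curl -s http://localhost:11434/api/generate \
#   -d "$(build_request batiai/gemma4-26b:iq4 'Explain MoE in one sentence.')"
```

With `"stream": false` the server returns a single JSON object instead of a stream of partial chunks, which is easier to handle in scripts.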
Free, on-device AI automation for Mac. 5MB app, 100% local, unlimited.