3,341 Downloads Updated 1 month ago
ollama run batiai/gemma4-26b:iq3
Quantized directly from official Google BF16 weights. MoE design: 26 B total parameters, ~3.8 B active per token. Text-only here on Ollama; multimodal (vision: image + video) opt-in via HF + llama.cpp (see bottom).
| Tag | Size | VRAM | M4 Pro (48GB) | M4 Max (128GB) | Use Case |
|---|---|---|---|---|---|
| iq4 | 13GB | 15GB | 58–63 t/s ✅ | 85.8 t/s | 24GB+ Mac, recommended |
| iq3 | 12GB | 14GB | — | 77 t/s | 24GB Mac, slightly smaller |
| q3 | 13GB | 15GB | — | 70.7 t/s | 24GB Mac, standard |
| q4 | 16GB | 18GB | — | 74.9 t/s | 32GB+ Mac |
| q6 | 21GB | 24GB | 48–50 t/s | 74.8 t/s | 36GB+ Mac, highest quality |
ollama run batiai/gemma4-26b:iq4
IQ4 uses importance-matrix quantization: calibration data tells which weights matter most, compressing aggressively where it doesn’t matter.
| IQ4_XS (BatiAI) | Q4_K_M (standard) | |
|---|---|---|
| Size | 13GB | 16GB |
| Speed (M4 Pro 48GB) | 58–63 t/s | — |
| Speed (M4 Max 128GB) | 85.8 t/s | 74.9 t/s |
| Quality | 4-bit imatrix | 4-bit standard |
Same 4-bit quality, 3GB smaller file. Verified with translation, tool calling, and math reasoning — identical output quality.
Measured on real Mac hardware (M4 Pro, 48GB unified memory):
| Model | Size | VRAM | Speed | Cold start | System free |
|---|---|---|---|---|---|
| BatiAI 26B IQ4 | 13GB | 15.1GB | 58–63 t/s | 1.7s | 58% |
| BatiAI 26B Q6 | 21GB | 23.9GB | 48–50 t/s | 5.8s | 40% |
| Ollama 26B (official) | 14GB | 19.3GB | 56 t/s | 3.4s | 50% |
| 31B IQ4 (Dense) | 16GB | 26.1GB | 13.5 t/s | 40s | 37% |
Key findings on 48GB Mac: - BatiAI IQ4 is faster than Ollama’s official 26B (58-63 vs 56 t/s) - 4x faster than 31B Dense with similar quality - Fastest cold start (1.7s) — imatrix 4-bit loads cleanest on Apple Silicon - Most system memory free (58%) — best for multitasking
Counter-intuitively, IQ4 (13GB) is faster than IQ3 (12GB) on M-series chips:
Smaller file ≠ faster on Apple Silicon when it comes to 3-bit vs 4-bit.
| Your Mac RAM | IQ3 (12GB) | IQ4 (13GB) | Q3 (13GB) | Q4 (16GB) | Q6 (21GB) |
|---|---|---|---|---|---|
| 16GB | ❌ swap | ❌ swap | ❌ swap | ❌ Won’t fit | ❌ Won’t fit |
| 24GB | ✅ Fast | ✅ Fits | ⚠️ Tight | ❌ Barely | ❌ No |
| 32GB | ✅ Fast | ✅ Fast | ✅ Fast | ✅ OK | ❌ No |
| 36GB+ | ✅ Fast | ✅ Fast | ✅ Fast | ✅ Fast | ✅ Fits |
| 128GB | 77 t/s | 85.8 t/s | 70.7 t/s | 74.9 t/s | 74.8 t/s |
26B models don’t work on 16GB Mac. Use these instead:
ollama run batiai/gemma4-e4b # 57.1 t/s on 16GB Mac ✅
ollama run batiai/qwen3.5-9b # 12.5 t/s on 16GB Mac ✅
Free, on-device AI automation for Mac. 5MB app, 100% local, unlimited.
This Ollama tag is text-only — Ollama’s mmproj integration is still rough today. For image / video understanding, grab the main GGUF + the vision projector from HF and run with llama.cpp:
# Main model + vision projector
wget https://huggingface.co/batiai/Gemma-4-26B-A4B-it-GGUF/resolve/main/google-gemma-4-26B-A4B-it-IQ4_XS.gguf
wget https://huggingface.co/batiai/Gemma-4-26B-A4B-it-GGUF/resolve/main/mmproj-Q6_K.gguf
llama-server -m google-gemma-4-26B-A4B-it-IQ4_XS.gguf \
--mmproj mmproj-Q6_K.gguf -c 32768 --port 8080
Audio is NOT supported in 26B/31B (vision only). For audio, use batiai/gemma4-e2b or batiai/gemma4-e4b.