Details

Updated 3 months ago

3 months ago

2226bf6ca3ca · 23GB ·

model

archgemma4

parameters25.2B

quantizationQ6_K

23GB

system

You are a helpful AI assistant.

31B

params

{ "num_ctx": 131072, "stop": [ "<turn|>" ], "temperature": 0.7 }

66B

Gemma 4 26B-A4B-it — Quantized by BatiAI

Quantized directly from official Google BF16 weights. MoE design: 26 B total parameters, ~3.8 B active per token. Text-only here on Ollama; multimodal (vision: image + video) opt-in via HF + llama.cpp (see bottom).

Models

Tag	Size	VRAM	M4 Pro (48GB)	M4 Max (128GB)	Use Case
iq4	13GB	15GB	58–63 t/s ✅	85.8 t/s	24GB+ Mac, recommended
iq3	12GB	14GB	—	77 t/s	24GB Mac, slightly smaller
q3	13GB	15GB	—	70.7 t/s	24GB Mac, standard
q4	16GB	18GB	—	74.9 t/s	32GB+ Mac
q6	21GB	24GB	48–50 t/s	74.8 t/s	36GB+ Mac, highest quality

Quick Start

ollama run batiai/gemma4-26b:iq4

Why IQ4? — Fastest AND Smartest

IQ4 uses importance-matrix quantization: calibration data tells which weights matter most, compressing aggressively where it doesn’t matter.

	IQ4_XS (BatiAI)	Q4_K_M (standard)
Size	13GB	16GB
Speed (M4 Pro 48GB)	58–63 t/s	—
Speed (M4 Max 128GB)	85.8 t/s	74.9 t/s
Quality	4-bit imatrix	4-bit standard

Same 4-bit quality, 3GB smaller file. Verified with translation, tool calling, and math reasoning — identical output quality.

M4 Pro 48GB — Real User Benchmark

Measured on real Mac hardware (M4 Pro, 48GB unified memory):

Model	Size	VRAM	Speed	Cold start	System free
BatiAI 26B IQ4	13GB	15.1GB	58–63 t/s	1.7s	58%
BatiAI 26B Q6	21GB	23.9GB	48–50 t/s	5.8s	40%
Ollama 26B (official)	14GB	19.3GB	56 t/s	3.4s	50%
31B IQ4 (Dense)	16GB	26.1GB	13.5 t/s	40s	37%

Key findings on 48GB Mac: - BatiAI IQ4 is faster than Ollama’s official 26B (58-63 vs 56 t/s) - 4x faster than 31B Dense with similar quality - Fastest cold start (1.7s) — imatrix 4-bit loads cleanest on Apple Silicon - Most system memory free (58%) — best for multitasking

Why IQ4 beats IQ3 on Apple Silicon

Counter-intuitively, IQ4 (13GB) is faster than IQ3 (12GB) on M-series chips:

4-bit alignment — CPU/GPU processes 4-bit cleanly, SIMD-friendly
3-bit packing — misaligned, complex lookup tables, SIMD inefficient
Memory read savings < dequantize overhead → IQ4 wins

Smaller file ≠ faster on Apple Silicon when it comes to 3-bit vs 4-bit.

RAM Requirements — Be Honest

Your Mac RAM	IQ3 (12GB)	IQ4 (13GB)	Q3 (13GB)	Q4 (16GB)	Q6 (21GB)
16GB	❌ swap	❌ swap	❌ swap	❌ Won’t fit	❌ Won’t fit
24GB	✅ Fast	✅ Fits	⚠️ Tight	❌ Barely	❌ No
32GB	✅ Fast	✅ Fast	✅ Fast	✅ OK	❌ No
36GB+	✅ Fast	✅ Fast	✅ Fast	✅ Fast	✅ Fits
128GB	77 t/s	85.8 t/s	70.7 t/s	74.9 t/s	74.8 t/s

16GB Mac Users

26B models don’t work on 16GB Mac. Use these instead:

ollama run batiai/gemma4-e4b    # 57.1 t/s on 16GB Mac ✅
ollama run batiai/qwen3.5-9b    # 12.5 t/s on 16GB Mac ✅

Why BatiAI?

Quantized directly from official Google weights (not third-party)
imatrix optimized (IQ3, IQ4) for best quality at each size
Third-party GGUFs (unsloth) fail on Ollama 0.20+ — ours work
Verified on Mac mini M4 (16GB) + MacBook Pro M4 Max (128GB)
Vision: mmproj available on HuggingFace (Ollama vision pending ecosystem fix)
Korean, tool calling, JSON generation all tested

Built for BatiFlow

Free, on-device AI automation for Mac. 5MB app, 100% local, unlimited.

https://flow.bati.ai

Multimodal mode (opt-in, HF + llama.cpp)

This Ollama tag is text-only — Ollama’s mmproj integration is still rough today. For image / video understanding, grab the main GGUF + the vision projector from HF and run with llama.cpp:

# Main model + vision projector
wget https://huggingface.co/batiai/Gemma-4-26B-A4B-it-GGUF/resolve/main/google-gemma-4-26B-A4B-it-IQ4_XS.gguf
wget https://huggingface.co/batiai/Gemma-4-26B-A4B-it-GGUF/resolve/main/mmproj-Q6_K.gguf

llama-server -m google-gemma-4-26B-A4B-it-IQ4_XS.gguf \
  --mmproj mmproj-Q6_K.gguf -c 32768 --port 8080

Audio is NOT supported in 26B/31B (vision only). For audio, use batiai/gemma4-e2b or batiai/gemma4-e4b.

Gemma 4 26B MoE quantized by BatiAI. 77 t/s on M4 Max. Requires 24GB+ Mac.