ollama run batiai/kimi-k2.6:q5

Details

4eb34e1ae019 · 728GB · deepseek2 · 1.03T · Q5_K_M

System prompt: You are a helpful AI assistant.

Parameters: { "num_ctx": 131072, "stop": [ "<|im_end|>", "[EOS]", "[EOT]" ] }

Readme

Kimi K2.6 — Quantized by BatiAI

Frontier 1T MoE from Moonshot AI, quantized directly from official FP8 weights.

Models

Tag   Size    Min RAM   Target Hardware
q5    728GB   768GB     2× M3 Ultra 512GB / 8× A100 80GB / H100 node — highest quality
iq4   546GB   512GB     M3 Ultra 512GB / 8× A100 80GB / H100 node — recommended
iq3   394GB   384GB     M3 Ultra 512GB / H100 node — most accessible

Quick Start

ollama run batiai/kimi-k2.6:iq4    # recommended balance
ollama run batiai/kimi-k2.6:iq3    # smaller, fits 384GB+ RAM
ollama run batiai/kimi-k2.6:q5     # highest quality, needs 768GB+ RAM
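
The default parameters listed in Details above (num_ctx 131072 plus the stop tokens) can be overridden per session. A minimal sketch using Ollama's interactive /set and /show commands (the value shown is just the shipped default):

ollama run batiai/kimi-k2.6:iq4
>>> /set parameter num_ctx 131072
>>> /show parameters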

Kimi K2.6 — Why It Matters

  • 1T parameters / 32B active — frontier-class open weight model
  • SWE-Bench Pro 58.6 — beats GPT-5.4 xhigh (57.7), Claude Opus 4.6 max (53.4), Gemini 3.1 Pro (54.2)
  • HLE 36.4% / 55.5% (w/ tools) — Humanity’s Last Exam frontier tier
  • 256K native context via YaRN scaling (see the API sketch after this list)
  • Agent swarm — 300 sub-agents, 4,000 coordinated steps
  • Modified-MIT license — commercial + redistribution allowed
  • Released 2026-04-20 by Moonshot AI
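
The full 256K window is not used by default (num_ctx ships at 131072); it has to be requested explicitly. A minimal sketch against the local Ollama HTTP API, where the prompt is a placeholder and 262144 tokens (256K) assumes you have the memory headroom for the KV cache:

curl http://localhost:11434/api/generate -d '{
  "model": "batiai/kimi-k2.6:iq4",
  "prompt": "Summarize the attached codebase.",
  "options": { "num_ctx": 262144 }
}'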

Hardware Reality — Be Honest

Your System              IQ3 (394GB)     IQ4 (546GB)   Q5 (728GB)
Mac 16GB                 ❌              ❌            ❌
Mac 128GB                ❌              ❌            ❌
Mac 256GB                ⚠️ heavy swap   ❌            ❌
Mac 384GB                ⚠️ tight        ❌            ❌
Mac M3 Ultra 512GB       ✅              ✅ tight      ❌
2× M3 Ultra (cluster)    ✅              ✅            ✅
8× A100 80GB             ✅              ✅            ✅
H100 node                ✅ fast         ✅ fast       ✅ fast

This is not a consumer Mac model. For on-device Mac use, see below.
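
To see where your machine lands in the table, compare installed memory against the Min RAM column above. A one-liner for macOS:

sysctl -n hw.memsize | awk '{ printf "%.0f GB unified memory\n", $1 / 1e9 }'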

For Smaller Macs — BatiAI Lineup

Your Mac           Recommended
16GB               batiai/gemma4-e4b:q4
24GB               batiai/gemma4-26b:iq4
48GB               batiai/qwen3.5-35b:iq4
96GB               batiai/qwen3.6-35b:iq4
128GB              batiai/minimax-m2.7:iq3
M3 Ultra 512GB+    batiai/kimi-k2.6:iq4 (this model)
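
For example, on a 128GB machine:

ollama run batiai/minimax-m2.7:iq3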

Why BatiAI?

  • Quantized directly from official Moonshot FP8 weights (not 3rd-party re-quant)
  • imatrix calibration with 200 chunks, the point beyond which more calibration data stopped improving quality (see the pipeline sketch after this list)
  • Full general.author=BatiAI / general.url=https://flow.bati.ai signature
  • Open pipeline — see docs/202604-large-moe-quantization.md
  • Handles 1T+ MoE models (most providers stop at 70B)
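
A minimal sketch of that pipeline using llama.cpp's stock tools; the file names, calibration corpus, and IQ4_XS target are illustrative assumptions, and the actual recipe is in the linked doc:

# Build an importance matrix from 200 calibration chunks, then quantize with it
llama-imatrix -m kimi-k2.6-bf16.gguf -f calibration.txt --chunks 200 -o kimi-k2.6.imatrix
llama-quantize --imatrix kimi-k2.6.imatrix kimi-k2.6-bf16.gguf kimi-k2.6-iq4.gguf IQ4_XS

# Check the provenance fields baked into the GGUF (gguf-py's dump script; invocation assumed)
python gguf_dump.py kimi-k2.6-iq4.gguf | grep -E 'general\.(author|url)'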

Built for BatiFlow (plus frontier research)

BatiFlow is our on-device Mac AI automation app (free, unlimited, local). The smaller models in our lineup (gemma4, qwen3.5-35b, qwen3.6, minimax-m2.7) serve BatiFlow users directly.

Kimi K2.6 is different: it is a frontier research / workstation model, beyond consumer hardware reach. We quantize it to show that the pipeline scales to the largest frontier models, and for researchers and teams with workstation-class GPUs.