ollama run batiai/qwen3.6-35b:iq3
ollama launch claude --model batiai/qwen3.6-35b:iq3
ollama launch openclaw --model batiai/qwen3.6-35b:iq3
ollama launch hermes --model batiai/qwen3.6-35b:iq3
ollama launch codex --model batiai/qwen3.6-35b:iq3
ollama launch opencode --model batiai/qwen3.6-35b:iq3
5 models:

| Name | Size | Context | Input |
|---|---|---|---|
| qwen3.6-35b:iq3 | 14 GB | 256K | Text |
| qwen3.6-35b:iq4 | 19 GB | 256K | Text |
| qwen3.6-35b:q3 | 14 GB | 256K | Text |
| qwen3.6-35b:q4 | 19 GB | 256K | Text |
| qwen3.6-35b:q6 | 29 GB | 256K | Text |
“Agentic Coding Power, Now Open to All.” These are imatrix-calibrated quantizations of the official Qwen 3.6 35B-A3B MoE, released 2026-04-15 by Alibaba. Text-only, built directly from Alibaba's BF16 weights.

Real on-device inference on an M4 Max in three scenarios:
1. Q&A streaming — “5 tips for writing professional emails” at ~46 t/s
2. Code + file tools — Python regex function → save to file → reveal in Finder
3. Calendar — “Show me today’s schedule” → live Mac Calendar query and event add
All 100% local through BatiFlow — one click, no code, no API keys, no subscription. Built so non-developers can use this kind of AI automation on their Mac.

| Tag | Size | Min RAM | Use Case |
|---|---|---|---|
| :iq3 / :q3 | 13 GB | 16 GB | 16 GB Mac mini / MacBook Air |
| :iq4 / :q4 | 18 GB | 24 GB | MacBook Pro / Mac Studio (recommended) |
| :q6 | 27 GB | 36 GB | MacBook Pro M4 Pro / Studio — max on-device quality |
IQ3/IQ4 are imatrix quants (importance-matrix calibrated on wikitext), which preserve more quality per bit at low bit-widths. Q6_K is a high-bit K-quant — near-BF16 quality for users with enough RAM.
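
For reference, an imatrix quant is produced with llama.cpp roughly like this (a sketch with illustrative file names, not the exact commands used for this release):

```bash
# 1. Collect an importance matrix from a calibration corpus (wikitext here):
./llama-imatrix -m qwen3.6-35b-bf16.gguf -f wikitext-2-raw/wiki.train.raw -o imatrix.dat

# 2. Quantize, letting the imatrix decide where the precision budget goes:
./llama-quantize --imatrix imatrix.dat qwen3.6-35b-bf16.gguf qwen3.6-35b-iq4_xs.gguf IQ4_XS
```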
Tool calling: all tags support tools + thinking. Pass `"think": false` in chat requests to skip the thinking phase and get fast tool-call responses, as in the sketch below.
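
A minimal request against Ollama's standard /api/chat endpoint; the get_weather tool is a hypothetical example, not something bundled with the model:

```bash
curl http://localhost:11434/api/chat -d '{
  "model": "batiai/qwen3.6-35b:iq4",
  "messages": [{"role": "user", "content": "What is the weather in Seoul right now?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get the current weather for a city",
      "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
      }
    }
  }],
  "think": false,
  "stream": false
}'
```

When the model decides to call the tool, the structured call arrives in the response's message.tool_calls field.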
ollama pull batiai/qwen3.6-35b:iq4
ollama run batiai/qwen3.6-35b:iq4
Upstream positions 3.6 as a major agentic-coding upgrade over 3.5. Key numbers from Alibaba’s official BF16 benchmarks:
| Benchmark | 3.5-35B-A3B | 3.6-35B-A3B | Δ |
|---|---|---|---|
| SWE-bench Verified | 70.0 | 73.4 | +3.4 |
| Terminal-Bench 2.0 | 40.5 | 51.5 | +11.0 |
| QwenWebBench | 978 | 1397 | +43% |

Versus a similar-sized dense model (also upstream BF16 figures):

| Benchmark | Gemma 4-31B | Qwen 3.6-35B-A3B |
|---|---|---|
| SWE-bench Verified | 52.0 | 73.4 |
| SWE-bench Multilingual | 51.7 | 67.2 |
| SWE-bench Pro | 35.7 | 49.5 |
| Terminal-Bench 2.0 | 42.9 | 51.5 |
| AIME26 | 89.2 | 92.7 |
Despite Gemma 4-31B being a similar-sized dense model, 3.6's MoE architecture (3B active params) outpaces it on agentic coding and math while using roughly 9× less compute per token (3B active vs a typical 27B dense: 27 / 3 = 9).
qwen3_coder parser — works with Tools in BatiFlow.

The benchmark figures above are upstream BF16 numbers. Quantization (IQ3/IQ4) may cost a few points on the hardest benchmarks; run with --verbose locally to see real tokens/s on your Mac.

| | Qwen3.6-35B-A3B (MoE) | Typical 27B Dense |
|---|---|---|
| Total params | 35B | 27B |
| Active params | 3B | 27B |
| Experts | 256 (8 routed + 1 shared) | — |
| Typical VRAM (IQ4) | ~23 GB | ~28 GB |

| Your Mac RAM | IQ3 (13GB) | IQ4 (18GB) |
|---|---|---|
| 16 GB | ✅ tight | ❌ |
| 24 GB | ✅ | ✅ tight |
| 32 GB | ✅ | ✅ |
| 48 GB+ | ✅ | ✅ ideal |
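
Not sure which row applies to you? Installed RAM is one sysctl away in Terminal (the awk step only converts bytes to GB):

```bash
sysctl -n hw.memsize | awk '{ printf "%.0f GB\n", $1 / 1073741824 }'
```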
MacBook Pro M4 Max (128 GB) — 100% GPU offload:
| Metric | IQ3_XXS | IQ4_XS |
|---|---|---|
| Gen speed (warm) | 45.9 t/s | 46.5 t/s |
| Prompt eval | 104.9 t/s | 105.0 t/s |
| Load time | 3.0 s | 5.3 s |
| Ollama RAM | 18 GB | 23 GB |
| Tool call JSON | ❌ fail | ✅ pass |
Mac mini M4 (16 GB) — IQ3 runs at ~2–3 t/s (swap pressure, single-turn only); IQ4 does not fit. Shrinking the context window can ease the memory pressure; see the sketch below.
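
Beyond the weights, most of the footprint is KV cache, which scales with the context window. A sketch using Ollama's interactive num_ctx parameter (8192 is an arbitrary choice; the model itself supports up to 256K):

```bash
ollama run batiai/qwen3.6-35b:iq3
>>> /set parameter num_ctx 8192
```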
Reference: 2× RTX 6000 Ada (96 GB VRAM, Linux) — not our target hardware, but useful as a ceiling:

| Metric | IQ3 | IQ4 | Q6 |
|---|---|---|---|
| Gen speed, warm (t/s) | 133.0 | 115.4 | 112.3 |
| Prompt eval (t/s) | 722 | 666 | 516 |
| VRAM (GB) | 18 | 23 | 33 |
The Mac M4 Max reaches ~35–40% of the server's warm throughput at a fraction of the power draw — generation here is memory-bandwidth bound, not compute bound.
ollama run batiai/qwen3.6-35b:iq4 --verbose "Write a haiku about Seoul in autumn."
--verbose prints total and load durations plus prompt-eval and token-generation rates after each response.
Upstream Qwen 3.6 is multimodal — it has a vision tower (~1–2 GB extra) that handles images. Multimodal GGUFs need two files (main + mmproj.gguf), and Ollama’s mmproj integration is rough today.
We deliberately ship the text tower only:
- ✅ Single file, one ollama pull works out of the box
- ✅ Smaller disk / RAM footprint
- ✅ Covers everything BatiFlow needs — chat, code, tool calls, RAG
Need images (OCR, captioning, visual reasoning)? Use upstream weights directly. Need text+image embedding for RAG? See batiai/qwen3-vl-embed-2b.
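
If you go the embedding route, text-side queries should work through Ollama's standard /api/embed endpoint (assuming the embed model is pulled and served via Ollama as usual):

```bash
curl http://localhost:11434/api/embed -d '{
  "model": "batiai/qwen3-vl-embed-2b",
  "input": "How do I add a calendar event from BatiFlow?"
}'
```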
Qwen released this publicly as 3.6, but the config still uses the architecture name Qwen3_5MoeForConditionalGeneration internally (a transitional class name carried over from the 3.5 line). llama.cpp converts it via Qwen3_5MoeTextModel — text tower only.
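
To reproduce the conversion from upstream BF16 weights yourself, the llama.cpp path looks roughly like this (the local directory name is illustrative):

```bash
# convert_hf_to_gguf.py selects the Qwen3_5MoeTextModel conversion path
# from the architecture name in the model's config:
python convert_hf_to_gguf.py ./Qwen3.6-35B-A3B --outtype bf16 --outfile qwen3.6-35b-bf16.gguf
```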
flow.bati.ai — free, on-device AI automation for Mac. 5MB app, 100% local, unlimited.