Llama 4 Scout 17B-16E-Instruct — Quantized by BatiAI

Meta’s agentic multimodal MoE in a 109B/17B-active package. imatrix-calibrated GGUF quantizations of the official meta-llama/Llama-4-Scout-17B-16E-Instruct (Llama 4 Community License), released 2025-04 by Meta. Free, unlimited, on-device AI for Mac via BatiFlow.

Multimodal-capable (vision) via separate mmproj on Hugging Face. Ollama ships text-only.

Available tags

Tag	Size	Min RAM	Use case
`:iq3`	41 GB	48 GB	Smallest footprint — M4 Max 64GB OK
`:q3`	51 GB	56 GB	K-quant alt for iq3
`:iq4`	57 GB	64 GB	Best size/quality ratio
`:q4`	65 GB	72 GB	Recommended for M4 Max 128GB users
`:q5`	76 GB	88 GB	Higher fidelity
`:q6`	88 GB	96 GB	Near-original quality

All 6 are imatrix-calibrated (wikitext-2-raw). Llama 4 Scout supports tools + extended context.

Quick Start

ollama pull batiai/llama4-scout:q4
ollama run  batiai/llama4-scout:q4

Why Llama 4 Scout?

Meta’s first Llama 4 family release. Mixture-of-Experts (16 experts × ~6B each) with single-expert routing — 17B active per token for inference efficiency.

109B total / 17B active MoE — efficient sparse computation
16 experts, top-1 routing — predictable inference cost
Multimodal native — text + vision via mmproj
Multilingual — 8 official languages + general multilingual
Native tool calling + extended context
Llama 4 Community License — commercial-friendly for orgs with < 700M MAU

RAM guide

Your Mac	`:iq3` 41G	`:q3` 51G	`:iq4` 57G	`:q4` 65G	`:q5` 76G	`:q6` 88G
16 GB	❌	❌	❌	❌	❌	❌
32 GB	❌	❌	❌	❌	❌	❌
64 GB	✅ tight	❌	❌	❌	❌	❌
96 GB	✅	✅ tight	✅ tight	❌	❌	❌
128 GB (M4 Max)	✅	✅	✅	✅	⚠ tight	❌
192 GB (M2 Ultra)	✅	✅	✅	✅	✅	✅
512 GB (M3 Ultra)	✅	✅	✅	✅	✅	✅ comfortable

Mac mini 16GB / 24GB sweet spot: not this model — use batiai/fara-7b (Microsoft Fara 7B, also multimodal) or batiai/qwen3.6-27b.

M4 Max 128GB recommendation: :q4 (65 GB) is the sweet spot — full quality with room for context.

Vision usage (multimodal)

Ollama ships text-only. For image input, use llama.cpp with the separate mmproj:

hf download batiai/Llama-4-Scout-17B-16E-Instruct-GGUF \
    --include "*Q4_K_M*" --include "mmproj-*-Q6_K.gguf" \
    --local-dir ./llama4-scout

llama-mtmd-cli \
    -m ./llama4-scout/meta-llama-Llama-4-Scout-17B-16E-Instruct-Q4_K_M.gguf \
    --mmproj ./llama4-scout/mmproj-meta-llama-Llama-4-Scout-17B-16E-Instruct-Q6_K.gguf \
    --image input.jpg \
    -p "Describe this image."

Why BatiAI?

Quantized directly from official Meta BF16 weights — no re-quantization
IQ + K-quant variants share the same wikitext-2-raw imatrix recipe as every BatiAI model
Multimodal mmproj packaged together on Hugging Face for one-stop usage
BatiAI metadata signed (general.author=BatiAI, general.url=https://flow.bati.ai)

License

Inherits Meta Llama 4 Community License. Commercial-friendly for orgs with < 700M MAU. - Llama 4 License - Acceptable Use Policy

Built for BatiFlow

flow.bati.ai — free, on-device AI automation for Mac. 5 MB app, 100 % local, unlimited.

Models

Readme