
ollama run batiai/llama4-scout:iq4

Details

2 days ago

f035ee83017a · 58GB · llama4 · 108B · IQ4_XS
You are a helpful AI assistant.
{ "num_ctx": 131072, "stop": [ "<|eot_id|>", "<|end_of_text|>" ], "t

Readme

Llama 4 Scout 17B-16E-Instruct — Quantized by BatiAI

Meta’s agentic multimodal MoE with 109B total and 17B active parameters. These are imatrix-calibrated GGUF quantizations of the official meta-llama/Llama-4-Scout-17B-16E-Instruct weights (Llama 4 Community License), released by Meta in April 2025. Free, unlimited, on-device AI for Mac via BatiFlow.

Vision input is supported through a separate mmproj file on Hugging Face. The Ollama tags are text-only.

Available tags

Tag     Size     Min RAM   Use case
:iq3    41 GB    48 GB     Smallest footprint; OK on M4 Max 64 GB
:q3     51 GB    56 GB     K-quant alternative to :iq3
:iq4    57 GB    64 GB     Best size/quality ratio
:q4     65 GB    72 GB     Recommended for M4 Max 128 GB users
:q5     76 GB    88 GB     Higher fidelity
:q6     88 GB    96 GB     Near-original quality

All six tags are imatrix-calibrated on wikitext-2-raw. Llama 4 Scout supports tool calling and extended context.
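To compare the tags on a common scale, the file sizes can be converted to an average number of bits stored per parameter. The sketch below uses the sizes from the tag table and the README's 109B total parameter count. Treating 1 GB as 10^9 bytes is an assumption, and the result is only an average, since a GGUF file also holds metadata and some tensors kept at higher precision.

```python
# Sizes in GB are copied from the tag table; 109e9 is the README's stated
# total parameter count. 1 GB = 1e9 bytes is an assumption.
TOTAL_PARAMS = 109e9
TAG_SIZE_GB = {":iq3": 41, ":q3": 51, ":iq4": 57, ":q4": 65, ":q5": 76, ":q6": 88}


def bits_per_weight(size_gb: float, params: float = TOTAL_PARAMS) -> float:
    """Average bits stored per parameter for a file of the given size."""
    return size_gb * 1e9 * 8 / params


if __name__ == "__main__":
    for tag, size in TAG_SIZE_GB.items():
        print(f"{tag:5s} {size:3d} GB  ~{bits_per_weight(size):.2f} bits/weight")
```

By this estimate :iq4 works out to a little over 4 bits per weight and :q6 to roughly 6.5.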

Quick Start

ollama pull batiai/llama4-scout:q4
ollama run  batiai/llama4-scout:q4
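Once the model is pulled, it can also be called from a script through Ollama's local REST API. The following is a minimal sketch, assuming an Ollama server is listening on the default address `http://localhost:11434`. The helper names and the `num_ctx` value of 8192 are illustrative choices, not settings from this README.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local address


def build_chat_request(prompt: str, model: str = "batiai/llama4-scout:q4",
                       num_ctx: int = 8192) -> dict:
    """Build the JSON body for a single-turn, non-streaming chat call.

    num_ctx=8192 is an illustrative default chosen to limit memory use.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        "options": {"num_ctx": num_ctx},
    }


def chat(prompt: str, **kwargs) -> str:
    """Send the request to the local server and return the reply text."""
    body = json.dumps(build_chat_request(prompt, **kwargs)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]
```

With the server running, `chat("Explain MoE routing in two sentences.")` should return the model's reply as a string. A larger `num_ctx` increases memory use on top of the weights.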

Why Llama 4 Scout?

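Part of Meta’s first Llama 4 release. It is a Mixture-of-Experts model (16 experts of roughly 6B parameters each) with single-expert routing, so about 17B parameters are active per token. That 17B figure covers the one selected expert plus the weights every token passes through, such as attention and embeddings, which is why it is larger than a single expert.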

  • 109B total / 17B active MoE — efficient sparse computation
  • 16 experts, top-1 routing — predictable inference cost
  • Multimodal native — text + vision via mmproj
  • Multilingual: 12 officially supported languages, plus general multilingual ability
  • Native tool calling + extended context
  • Llama 4 Community License — commercial-friendly for orgs with < 700M MAU
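The "109B total / 17B active" and "top-1 routing" points can be illustrated with a toy router. This is a sketch for intuition only, not Meta's implementation: the sigmoid gate and the function names are assumptions, and a real router scores every token in every MoE layer with learned weights.

```python
import math

TOTAL_PARAMS_B = 109  # from the README
ACTIVE_PARAMS_B = 17  # from the README
NUM_EXPERTS = 16      # from the README


def active_fraction() -> float:
    """Share of the model's parameters used for any single token."""
    return ACTIVE_PARAMS_B / TOTAL_PARAMS_B


def route_top1(router_logits: list[float]) -> tuple[int, float]:
    """Pick the single highest-scoring expert for one token.

    Returns (expert index, gate weight). The sigmoid gate is an
    illustrative assumption.
    """
    if len(router_logits) != NUM_EXPERTS:
        raise ValueError(f"expected {NUM_EXPERTS} router logits")
    expert = max(range(NUM_EXPERTS), key=lambda i: router_logits[i])
    gate = 1.0 / (1.0 + math.exp(-router_logits[expert]))
    return expert, gate
```

Because exactly one routed expert runs per token regardless of the input, per-token compute stays constant, and only about 16% of the weights take part in each forward pass.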

RAM guide

Your Mac :iq3 41G :q3 51G :iq4 57G :q4 65G :q5 76G :q6 88G
16 GB
32 GB
64 GB ✅ tight
96 GB ✅ tight ✅ tight
128 GB (M4 Max) ⚠ tight
192 GB (M2 Ultra)
512 GB (M3 Ultra) ✅ comfortable

Mac mini 16GB / 24GB sweet spot: not this model — use batiai/fara-7b (Microsoft Fara 7B, also multimodal) or batiai/qwen3.6-27b.

M4 Max 128GB recommendation: :q4 (65 GB) is the sweet spot, leaving roughly 60 GB of unified memory for context and the rest of the system.
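The RAM guidance can be turned into a small selection helper. This sketch picks the largest tag whose Min RAM figure from the "Available tags" table fits the machine. The `headroom_gb` parameter is a hypothetical knob for reserving extra memory for long contexts or other apps, not something defined by this README.

```python
from typing import Optional

# (tag, file size in GB, minimum RAM in GB), copied from the "Available tags" table.
TAGS = [
    (":iq3", 41, 48),
    (":q3", 51, 56),
    (":iq4", 57, 64),
    (":q4", 65, 72),
    (":q5", 76, 88),
    (":q6", 88, 96),
]


def pick_tag(ram_gb: int, headroom_gb: int = 0) -> Optional[str]:
    """Return the largest tag that fits, or None if no tag fits.

    headroom_gb (hypothetical) reserves memory beyond the table's Min RAM.
    """
    best = None
    for tag, _size_gb, min_ram_gb in TAGS:  # TAGS is ordered smallest to largest
        if min_ram_gb + headroom_gb <= ram_gb:
            best = tag
    return best


if __name__ == "__main__":
    for ram in (32, 64, 96, 128):
        print(ram, "GB ->", pick_tag(ram))
```

With no headroom, a 128 GB machine resolves to :q6. Reserving 48 GB, as in `pick_tag(128, headroom_gb=48)`, lands on :q4, which matches the recommendation above.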

Vision usage (multimodal)

Ollama ships text-only. For image input, use llama.cpp with the separate mmproj:

hf download batiai/Llama-4-Scout-17B-16E-Instruct-GGUF \
    --include "*Q4_K_M*" --include "mmproj-*-Q6_K.gguf" \
    --local-dir ./llama4-scout

llama-mtmd-cli \
    -m ./llama4-scout/meta-llama-Llama-4-Scout-17B-16E-Instruct-Q4_K_M.gguf \
    --mmproj ./llama4-scout/mmproj-meta-llama-Llama-4-Scout-17B-16E-Instruct-Q6_K.gguf \
    --image input.jpg \
    -p "Describe this image."
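To describe several images in a row, the same command can be assembled in a script. The sketch below only builds the argument list with the flags shown above. It assumes the two GGUF files were downloaded to `./llama4-scout` under the file names used in this README, and the helper names are illustrative.

```python
import shlex
from pathlib import Path

# Paths assume the download command above was run unchanged.
MODEL_DIR = Path("./llama4-scout")
MODEL = MODEL_DIR / "meta-llama-Llama-4-Scout-17B-16E-Instruct-Q4_K_M.gguf"
MMPROJ = MODEL_DIR / "mmproj-meta-llama-Llama-4-Scout-17B-16E-Instruct-Q6_K.gguf"


def mtmd_command(image: str, prompt: str = "Describe this image.") -> list[str]:
    """Build the llama-mtmd-cli argument list for one image."""
    return [
        "llama-mtmd-cli",
        "-m", str(MODEL),
        "--mmproj", str(MMPROJ),
        "--image", image,
        "-p", prompt,
    ]


def as_shell(cmd: list[str]) -> str:
    """Render an argument list as a copy-pasteable shell command."""
    return shlex.join(cmd)
```

Each list can be passed to `subprocess.run(cmd, check=True)`. Building the list rather than a single string avoids quoting problems when image paths or prompts contain spaces.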

Why BatiAI?

  • Quantized directly from official Meta BF16 weights — no re-quantization
  • IQ + K-quant variants share the same wikitext-2-raw imatrix recipe as every BatiAI model
  • Multimodal mmproj packaged together on Hugging Face for one-stop usage
  • GGUF metadata tagged with BatiAI provenance fields (general.author=BatiAI, general.url=https://flow.bati.ai)
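The `general.author` and `general.url` fields can be checked by reading the GGUF header directly. The following is a minimal sketch of a header reader using only the standard library. It assumes a little-endian GGUF file of version 2 or later (64-bit counts and string lengths), and the function names are illustrative.

```python
import io
import struct

GGUF_MAGIC = b"GGUF"
# Fixed-size GGUF value types: type id -> struct format (little-endian).
_SCALARS = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
            6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
_STRING, _ARRAY = 8, 9


def _read(fmt, f):
    """Read and unpack one fixed-size value."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise EOFError("truncated GGUF header")
    return struct.unpack(fmt, data)[0]


def _read_string(f):
    """GGUF strings are a uint64 length followed by UTF-8 bytes."""
    return f.read(_read("<Q", f)).decode("utf-8")


def _read_value(f, vtype):
    if vtype in _SCALARS:
        return _read(_SCALARS[vtype], f)
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:  # element type, element count, then the elements
        item_type = _read("<I", f)
        count = _read("<Q", f)
        return [_read_value(f, item_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_gguf_metadata(f, wanted=("general.author", "general.url")):
    """Return the requested metadata keys from an open binary GGUF stream."""
    if f.read(4) != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    _version = _read("<I", f)
    _tensor_count = _read("<Q", f)
    kv_count = _read("<Q", f)
    found = {}
    for _ in range(kv_count):
        key = _read_string(f)
        value = _read_value(f, _read("<I", f))
        if key in wanted:
            found[key] = value
    return found
```

For a downloaded file, `read_gguf_metadata(open(path, "rb"))` returns a dict of whichever requested keys are present, where `path` is wherever the GGUF was saved. Every key-value pair is parsed in order, so files with large tokenizer arrays take a moment to scan.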

License

Inherits the Meta Llama 4 Community License. Commercial-friendly for orgs with < 700M MAU.

  • Llama 4 License
  • Acceptable Use Policy

Built for BatiFlow

flow.bati.ai: free, on-device AI automation for Mac. 5 MB app, 100% local, unlimited.