```shell
ollama pull batiai/qwen3-embedding:4b-q6
```
Text embedding models for semantic search and RAG. Three size tiers, one namespace. Quantized directly from Alibaba's BF16 safetensors (not re-quantized from another GGUF), signed with `general.author: BatiAI` metadata. Part of BatiAI's Mac-first RAG stack.
| Tag | Quant | Size | Target Mac |
|---|---|---|---|
| 0.6b | Q8_0 (default) | 610 MB | every Mac (8 GB+) |
| 0.6b-q6 | Q6_K | 472 MB | smaller footprint |
| 4b | Q8_0 (default) | 4.28 GB | 16 GB+ Mac (sweet spot) |
| 4b-q6 | Q6_K | 3.31 GB | tighter disk |
| 8b | Q8_0 (default) | 8.05 GB | 24 GB+ Mac (top quality) |
| 8b-q6 | Q6_K | 6.21 GB | 24 GB Mac friendly |
```shell
# Default tier — lightweight, runs on every Mac
ollama pull batiai/qwen3-embedding:0.6b

# Balanced mid tier — 16 GB+ Macs
ollama pull batiai/qwen3-embedding:4b

# Top tier — MTEB #1 open embedder at release
ollama pull batiai/qwen3-embedding:8b
```
```shell
curl http://localhost:11434/api/embeddings -d '{
  "model": "batiai/qwen3-embedding:4b",
  "prompt": "semantic search query"
}'
```
Qwen3-Embedding recommended prompt format — best retrieval quality:

```python
# Query side (use instruction prefix)
prompt = (
    "Instruct: Given a document query, retrieve the most relevant chunk.\n"
    "Query: " + user_input
)

# Document side (raw text, no prefix)
prompt = document_chunk
```
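Putting the two sides together, here is a minimal stdlib-only sketch of the asymmetric query/document flow against a local Ollama server. The helper names (`query_prompt`, `embed`, `cosine`) are illustrative, not part of any API; only the `/api/embeddings` route and payload shape shown above are assumed.

```python
import json
import math
import urllib.request

# Instruction prefix from the card's recommended format (queries only).
INSTRUCT = "Instruct: Given a document query, retrieve the most relevant chunk.\n"


def query_prompt(user_input: str) -> str:
    """Queries get the instruction prefix; documents are embedded raw."""
    return INSTRUCT + "Query: " + user_input


def embed(prompt: str, model: str = "batiai/qwen3-embedding:4b",
          host: str = "http://localhost:11434") -> list[float]:
    """Call Ollama's /api/embeddings endpoint (requires a running server)."""
    req = urllib.request.Request(
        host + "/api/embeddings",
        data=json.dumps({"model": model, "prompt": prompt}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embedding"]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


# Usage (with the server running):
#   q = embed(query_prompt("how do I rotate my API key?"))
#   d = embed("To rotate an API key, open Settings > Security ...")  # raw doc
#   score = cosine(q, d)
```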
Matryoshka — models output up to 2560 dimensions. Truncate at read time for faster search (default recommendation: 1024 dims). No re-embed needed.
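One way to apply that read-time truncation (function name illustrative). Re-normalizing after the cut matters: cosine scoring assumes unit-length vectors, and dropping tail dimensions shrinks the norm.

```python
import math


def truncate_matryoshka(vec: list[float], dims: int = 1024) -> list[float]:
    """Keep the first `dims` components and re-normalize to unit length.

    Matryoshka-trained embeddings front-load information, so the leading
    dimensions remain a usable embedding on their own.
    """
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]
```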
Four-stage verification harness, EN + KO balanced, 120 items total.
| Model | Same-lang directional | Cross-lingual separation Δ | Real-doc top-1 (EN / KO) | Q8↔Q6 cosine similarity (avg) |
|---|---|---|---|---|
| 0.6B (Q6) | 30⁄30 (100 %) | 0.521 | 95 % / 100 % | 0.9967 |
| 4B (Q6) | 30⁄30 (100 %) | 0.540 | 95 % / 100 % | 0.9984 |
| 8B (Q6) | 30⁄30 (100 %) | 0.569 | 100 % / 100 % | 0.9988 |
Improvement is cleanly monotonic with size. 8B achieves perfect top-1 retrieval on the real-document testset in both English and Korean, so the MTEB #1 ranking carries over to realistic business-doc workloads.
Full testset + harness: scripts/bench-embedding-quality.sh in the publisher repo.
`general.author: BatiAI` metadata for provenance.

```
user query
  ↓ batiai/qwen3-embedding:{0.6b|4b|8b}   ← text embedder
1024-dim vector
  ↓ vector DB (sqlite-vec / LanceDB)
top-K candidates
  ↓ Qwen3-Reranker (via llama-server)
top-3
  ↓ batiai/qwen3.6-35b:iq4   ← chat LLM
answer
```
All components in the batiai/ Ollama namespace (or HF org). Direct-from-source, consistent provenance.
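For prototyping without a vector DB, the retrieval stage of the pipeline can be stubbed with brute-force cosine top-K. A minimal sketch, assuming the embeddings are already unit-normalized (so a dot product equals cosine similarity); the helper name is illustrative:

```python
import heapq


def top_k(query_vec: list[float],
          doc_vecs: list[list[float]],
          k: int = 5) -> list[tuple[float, int]]:
    """Brute-force stand-in for the sqlite-vec / LanceDB stage.

    Scores every document against the query and returns the k best as
    (score, doc_index) pairs, highest score first. Fine for small corpora;
    swap in a real vector index as the corpus grows.
    """
    def dot(a: list[float], b: list[float]) -> float:
        return sum(x * y for x, y in zip(a, b))

    scored = ((dot(query_vec, v), i) for i, v in enumerate(doc_vecs))
    return heapq.nlargest(k, scored)
```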
For image/multimodal embedding, use the VL variant: batiai/qwen3-vl-embed-2b / :8b. The text-only embedders here are lighter and more accurate for plain-text workloads.
Full BatiAI RAG Stack collection →
Apache 2.0 — commercial use permitted. BatiAI’s quantization pipeline is MIT.