ollama run batiai/qwen3-vl-embed-8b:q6
GGUF quantization of Qwen/Qwen3-VL-Embedding-8B — the quality tier of Qwen3’s vision-language embedding family.
Part of BatiAI’s on-device RAG stack for BatiFlow.
| Tag | Quant | Size | Recommended For |
|---|---|---|---|
| `:q6` | Q6_K | ~5.8 GB | balanced |
| `:q8` | Q8_0 | ~7.5 GB | near-lossless, best retrieval quality |
| Use case | Pick |
|---|---|
| Workstation / desktop Mac, retrieval quality matters | 8B — richer embeddings, sharper semantic separation |
| Laptop / Mac mini / latency matters | 2B — 4× smaller, sufficient for most tasks |
ollama pull batiai/qwen3-vl-embed-8b:q8
curl http://localhost:11434/api/embeddings -d '{
  "model": "batiai/qwen3-vl-embed-8b:q8",
  "prompt": "What is the capital of France?"
}'
Returns a 3584-dim float vector (wider than 2B’s 2048-dim). Store in sqlite-vss / LanceDB / pgvector / any vector DB.
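A minimal retrieval sketch using only the standard library. The `embed` and `cosine` helper names are illustrative, not part of any BatiAI API; the endpoint and request shape match the curl example above.

```python
import json
import math
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/embeddings"  # default Ollama endpoint

def embed(text: str, model: str = "batiai/qwen3-vl-embed-8b:q8") -> list[float]:
    """Fetch an embedding vector from a locally running Ollama server."""
    payload = json.dumps({"model": model, "prompt": text}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: embeddings are compared by angle, not magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Usage (requires a running Ollama server with the model pulled):
# q = embed("What is the capital of France?")
# d = embed("Paris is the capital and largest city of France.")
# print(cosine(q, d))
```

For anything beyond a few hundred documents, push the vectors into one of the stores above instead of brute-forcing cosine in Python.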
Image input requires llama.cpp's mtmd multimodal build, which standard Ollama does not include. See the upstream Qwen3-VL-Embedding-8B docs.
VL (Vision-Language) means text and image inputs share the same embedding space: a photo of the Eiffel Tower and the text "Eiffel Tower" embed to nearby vectors, so a single index can serve cross-modal search.
Our sibling reranker card measured a Q6_K ↔ Q8_0 Pearson correlation of r = 0.9986 on 40 hard-negative triples, putting quantization drift below the noise floor. We expect this embedding model to behave similarly; MTEB/BEIR numbers are to be added.
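To reproduce this kind of drift check yourself, embed the same prompts with `:q6` and `:q8` and correlate the vectors. The sketch below uses synthetic data (a reference vector plus small noise) purely to show the computation; the r = 0.9986 figure came from real model outputs, not this simulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated "reference" embedding and a copy with small quantization-like drift.
# In practice: ref = embedding from :q8, quant = embedding from :q6, same prompt.
ref = rng.standard_normal(3584)                 # 8B model emits 3584-dim vectors
quant = ref + rng.standard_normal(3584) * 0.05  # small perturbation

r = np.corrcoef(ref, quant)[0, 1]  # Pearson correlation between the two vectors
print(f"r = {r:.4f}")
```

Average r over a set of diverse prompts rather than trusting a single pair.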
# Embedding (this model)
ollama pull batiai/qwen3-vl-embed-8b:q8
# or the smaller 2B
ollama pull batiai/qwen3-vl-embed-2b:q8
# Chat LLM
ollama pull batiai/qwen3.6-35b:iq4
# Reranker (HF only — Ollama doesn't support /rerank yet)
# https://huggingface.co/batiai/Qwen3-Reranker-8B-GGUF
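Once the models above are pulled, the retrieval half of a RAG loop is just nearest-neighbor search over stored embeddings. A minimal sketch (the `top_k` helper and toy vectors are illustrative, not part of the BatiAI stack):

```python
import numpy as np

def top_k(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 3):
    """Return indices and cosine scores of the k documents closest to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q                     # cosine similarity per document
    idx = np.argsort(-scores)[:k]      # best-scoring documents first
    return idx, scores[idx]

# Toy 2-dim "embeddings"; real vectors are 3584-dim (8B) or 2048-dim (2B).
docs = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
idx, scores = top_k(np.array([1.0, 0.0]), docs, k=2)
```

The retrieved passages then go into the chat model's prompt; for larger corpora, replace the matrix product with a query against sqlite-vss / LanceDB / pgvector.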
flow.bati.ai — macOS-native AI automation app. 5 MB. 100% local. Uses BatiAI's quantized models for on-device RAG — semantic search over notes, photos, and more.