49 Downloads Updated 1 week ago
ollama run batiai/qwen3-vl-embed-2b:q6
GGUF quantization of Qwen/Qwen3-VL-Embedding-2B — the most-downloaded vision-language embedding model of 2026 (1.64 M downloads on HF).
Part of BatiAI’s on-device RAG stack for BatiFlow.
| Tag | Quant | Size | Recommended For |
|---|---|---|---|
:q6 |
Q6_K | ~1.5 GB | balanced (recommended default) |
:q8 |
Q8_0 | ~1.8 GB | near-lossless, best for retrieval quality |
ollama pull batiai/qwen3-vl-embed-2b:q8
curl http://localhost:11434/api/embeddings -d '{
"model": "batiai/qwen3-vl-embed-2b:q8",
"prompt": "What is the capital of France?"
}'
Returns a 2048-dim vector you can store in sqlite-vss / LanceDB / pgvector / any vector DB.
Image support requires llama.cpp’s mtmd multimodal build (not standard Ollama). See upstream Qwen3-VL-Embedding docs.
Embedding turns text (or images) into dense vectors. Same vector space = semantic search.
Use cases: - Semantic note search — “last quarter’s meeting notes about deadlines” finds relevant notes without exact keyword match - Tool/command autocomplete — match natural language to API functions via embedding similarity - Cross-modal RAG — search photos by text, search notes by screenshot - Deduplication — find near-duplicate content
# Embedding (this model)
ollama pull batiai/qwen3-vl-embed-2b:q8
# Chat LLM
ollama pull batiai/qwen3.6-35b:iq4
# Reranker (HF only — Ollama doesn't support reranker endpoint yet)
# See: https://huggingface.co/batiai/Qwen3-Reranker-0.6B-GGUF
flow.bati.ai — macOS-native AI automation app. 5 MB. 100 % local. Uses BatiAI’s quantized models for on-device RAG — semantic search over notes, photos, and more.