49 1 week ago

ollama run batiai/qwen3-vl-embed-2b

Models

View all →

Readme

Qwen3-VL-Embedding-2B — Quantized by BatiAI

GGUF quantization of Qwen/Qwen3-VL-Embedding-2B — the most-downloaded vision-language embedding model of 2026 (1.64 M downloads on HF).

Part of BatiAI’s on-device RAG stack for BatiFlow.

Models

Tag Quant Size Recommended For
:q6 Q6_K ~1.5 GB balanced (recommended default)
:q8 Q8_0 ~1.8 GB near-lossless, best for retrieval quality

Quick Start

Text embedding

ollama pull batiai/qwen3-vl-embed-2b:q8

curl http://localhost:11434/api/embeddings -d '{
  "model": "batiai/qwen3-vl-embed-2b:q8",
  "prompt": "What is the capital of France?"
}'

Returns a 2048-dim vector you can store in sqlite-vss / LanceDB / pgvector / any vector DB.

Image embedding

Image support requires llama.cpp’s mtmd multimodal build (not standard Ollama). See upstream Qwen3-VL-Embedding docs.

What is this for?

Embedding turns text (or images) into dense vectors. Same vector space = semantic search.

Use cases: - Semantic note search — “last quarter’s meeting notes about deadlines” finds relevant notes without exact keyword match - Tool/command autocomplete — match natural language to API functions via embedding similarity - Cross-modal RAG — search photos by text, search notes by screenshot - Deduplication — find near-duplicate content

Why Qwen3-VL-Embedding?

  • SOTA on MTEB — top multilingual embedding model
  • Multilingual — en / ko / ja / zh (great for Korean semantic search)
  • Multimodal — text and image in the same embedding space
  • 2048-dim vectors — balance between expressiveness and storage

Why BatiAI?

  • Quantized directly from Alibaba’s BF16 safetensors (not re-quantized)
  • Part of a full on-device RAG stack — chat LLM + reranker + embedding
  • BatiAI-signed metadata
  • Apache 2.0

BatiAI’s RAG Stack on Ollama

# Embedding (this model)
ollama pull batiai/qwen3-vl-embed-2b:q8

# Chat LLM
ollama pull batiai/qwen3.6-35b:iq4

# Reranker (HF only — Ollama doesn't support reranker endpoint yet)
# See: https://huggingface.co/batiai/Qwen3-Reranker-0.6B-GGUF

About BatiFlow

flow.bati.ai — macOS-native AI automation app. 5 MB. 100 % local. Uses BatiAI’s quantized models for on-device RAG — semantic search over notes, photos, and more.