ollama run batiai/qwen3-vl-embed-8b

Details

eb3c6bb330d5 · 6.2 GB · qwen3vl · 7.57B parameters · Q6_K
Parameters: { "num_ctx": 32768 }
Template: {{ .Prompt }}
Qwen3-VL-Embedding-8B — Quantized by BatiAI

GGUF quantization of Qwen/Qwen3-VL-Embedding-8B — the quality tier of Qwen3’s vision-language embedding family.

Part of BatiAI’s on-device RAG stack for BatiFlow.

Models

Tag   Quant   Size      Recommended For
:q6   Q6_K    ~5.8 GB   balanced
:q8   Q8_0    ~7.5 GB   near-lossless, best retrieval quality

When to pick 8B vs 2B?

Use case                                               Pick
Workstation / desktop Mac, retrieval quality matters   8B — richer embeddings, sharper semantic separation
Laptop / Mac mini, latency matters                     2B — 4× smaller, sufficient for most tasks

Quick Start

ollama pull batiai/qwen3-vl-embed-8b:q8

curl http://localhost:11434/api/embeddings -d '{
  "model": "batiai/qwen3-vl-embed-8b:q8",
  "prompt": "What is the capital of France?"
}'

Returns a 3584-dim float vector (wider than 2B’s 2048-dim). Store in sqlite-vss / LanceDB / pgvector / any vector DB.
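The call above can be sketched in Python using only the standard library (a sketch, not an official client; it assumes a local Ollama server on the default port, so the network call is left commented out):

```python
import json
import math
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/embeddings"
MODEL = "batiai/qwen3-vl-embed-8b:q8"

def embed(prompt: str) -> list[float]:
    """POST a prompt to the local Ollama embeddings endpoint."""
    body = json.dumps({"model": MODEL, "prompt": prompt}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Example (requires a running Ollama server with the model pulled):
# q = embed("What is the capital of France?")
# d = embed("Paris is the capital and largest city of France.")
# print(len(q), cosine(q, d))
```

The same vectors can then be inserted into whichever vector store you use; only the similarity metric (cosine, above) needs to match at query time.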

Image embedding

Requires llama.cpp’s mtmd multimodal build — not standard Ollama. See upstream Qwen3-VL-Embedding-8B docs.

Why the VL Embedding?

VL (Vision-Language) means text + image share the same embedding space:

  • Search photos by text (“beach sunset”)
  • Search text by image (drop a screenshot)
  • Cross-modal RAG over PDFs + screenshots + notes
  • Deduplication / semantic clustering
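Because text and images land in one space, every use case above reduces to nearest-neighbor search over stored vectors. A minimal sketch of that ranking step (synthetic 2-dim vectors for illustration; in practice each vector would come from the embeddings endpoint):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, index, k=3):
    """index: list of (item_id, vector) pairs.
    Returns the k closest items as (item_id, score), best first."""
    scored = [(cosine(query_vec, vec), item_id) for item_id, vec in index]
    scored.sort(reverse=True)
    return [(item_id, score) for score, item_id in scored[:k]]
```

Since text and image vectors share the space, the same `top_k` serves a text query against photo embeddings and an image query against note embeddings alike.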

Why BatiAI?

  • Quantized directly from Alibaba’s BF16 safetensors (not re-quantized)
  • BatiAI-signed metadata (general.author: BatiAI)
  • Part of the complete BatiAI RAG stack — chat LLM + reranker + embedding

Quality note

Our sibling reranker card measured a Q6_K ↔ Q8_0 Pearson correlation of r = 0.9986 on 40 hard-negative triples — quantization drift is below the noise floor. We expect this embedding model to behave similarly. MTEB/BEIR numbers are to be added.
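For reference, a correlation like that is computed from two lists of per-triple scores, one list per quant (a sketch of the standard Pearson formula; `pearson` is an illustrative helper, not part of any released tooling):

```python
import math

def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# q6_scores = [...]  # scores from the Q6_K model, one per triple
# q8_scores = [...]  # scores from the Q8_0 model, same triples
# print(pearson(q6_scores, q8_scores))
```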

BatiAI’s RAG Stack on Ollama

# Embedding (this model)
ollama pull batiai/qwen3-vl-embed-8b:q8

# or the smaller 2B
ollama pull batiai/qwen3-vl-embed-2b:q8

# Chat LLM
ollama pull batiai/qwen3.6-35b:iq4

# Reranker (HF only — Ollama doesn't support /rerank yet)
# https://huggingface.co/batiai/Qwen3-Reranker-8B-GGUF

About BatiFlow

flow.bati.ai — a macOS-native AI automation app. 5 MB, 100% local. Uses BatiAI’s quantized models for on-device RAG — semantic search over notes, photos, and more.