batiai/qwen3-vl-embed-2b:q6

49 Downloads Updated 1 week ago

ollama run batiai/qwen3-vl-embed-2b:q6

# Embedding models return vectors rather than chat replies, so call the embed endpoint
curl http://localhost:11434/api/embed \
  -d '{
    "model": "batiai/qwen3-vl-embed-2b:q6",
    "input": "Hello!"
  }'

from ollama import embed

# Request an embedding instead of a chat completion
response = embed(
    model='batiai/qwen3-vl-embed-2b:q6',
    input='Hello!',
)
# One vector per input; print its dimensionality
print(len(response.embeddings[0]))

import ollama from 'ollama'

// Request an embedding instead of a chat completion
const response = await ollama.embed({
  model: 'batiai/qwen3-vl-embed-2b:q6',
  input: 'Hello!',
})
// One vector per input; print its dimensionality
console.log(response.embeddings[0].length)

Details

0f4fcd1b54a3 · 1.4GB · Updated 1 week ago

model: arch qwen3vl · parameters 1.72B · quantization Q6_K · 1.4GB

params: { "num_ctx": 32768 } · 18B

Readme

# Qwen3-VL-Embedding-2B — Quantized by BatiAI

GGUF quantization of **Qwen/Qwen3-VL-Embedding-2B** — the most-downloaded vision-language embedding model of 2026 (1.64 M downloads on HF).

Part of BatiAI's on-device RAG stack for [BatiFlow](https://flow.bati.ai).

## Models

| Tag | Quant | Size | Recommended For |
|-----|-------|------|-----------------|
| `:q6` | Q6_K | ~1.5 GB | balanced (recommended default) |
| **`:q8`** | **Q8_0** | **~1.8 GB** | **near-lossless, best for retrieval quality** |

## Quick Start

### Text embedding

```bash
ollama pull batiai/qwen3-vl-embed-2b:q8

curl http://localhost:11434/api/embeddings -d '{
  "model": "batiai/qwen3-vl-embed-2b:q8",
  "prompt": "What is the capital of France?"
}'
```

Returns a **2048-dim vector** you can store in sqlite-vss / LanceDB / pgvector / any vector DB.
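For callers who prefer Python over curl, the same request can be sketched with the standard library alone. This is a minimal sketch, not an official client: the helper names (`build_request`, `embed`, `to_blob`, `from_blob`) are hypothetical, the reply is assumed to be JSON with an `embedding` list, and the little-endian float32 blob is just one common layout for stores that keep vectors as raw bytes.

```python
import json
import struct
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/embeddings"  # endpoint from the curl example
MODEL = "batiai/qwen3-vl-embed-2b:q8"


def build_request(text: str) -> urllib.request.Request:
    """Build the same POST request as the curl example."""
    body = json.dumps({"model": MODEL, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text: str) -> list:
    """Send the request; assumes the reply is JSON with an "embedding" list."""
    with urllib.request.urlopen(build_request(text)) as resp:
        return json.load(resp)["embedding"]


def to_blob(vector) -> bytes:
    """Pack a vector as little-endian float32 (4 bytes per dimension)."""
    return struct.pack(f"<{len(vector)}f", *vector)


def from_blob(blob: bytes) -> list:
    """Inverse of to_blob: unpack little-endian float32 values."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))
```

With a local Ollama server running, `to_blob(embed("What is the capital of France?"))` would give a byte string ready for a BLOB column; whether your vector DB wants raw bytes or a native vector type depends on the store.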

### Image embedding

Image support requires llama.cpp's `mtmd` multimodal build (not standard Ollama). See upstream [Qwen3-VL-Embedding docs](https://huggingface.co/Qwen/Qwen3-VL-Embedding-2B).

## What is this for?

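**Embedding** turns text (or images) into dense vectors. Items with similar meaning land close together in the same vector space, so a nearest-neighbor lookup over the vectors works as semantic search.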

Use cases:
- **Semantic note search** — "last quarter's meeting notes about deadlines" finds relevant notes without exact keyword match
- **Tool/command autocomplete** — match natural language to API functions via embedding similarity
- **Cross-modal RAG** — search photos by text, search notes by screenshot
- **Deduplication** — find near-duplicate content
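
The search and deduplication use cases above both reduce to comparing vectors. The sketch below assumes cosine similarity as the metric and uses toy 3-dim vectors in place of real 2048-dim embeddings; the function names and the 0.95 threshold are illustrative choices, not part of this model or of Ollama.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, corpus, k=3):
    """Semantic search: return the k (id, score) pairs most similar to the query."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def near_duplicates(corpus, threshold=0.95):
    """Deduplication: return id pairs whose similarity is at or above the threshold."""
    pairs = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(corpus.items(), 2):
        if cosine_similarity(vec_a, vec_b) >= threshold:
            pairs.append((id_a, id_b))
    return pairs


# Toy vectors standing in for stored embeddings
corpus = {
    "note-a": [1.0, 0.0, 0.0],
    "note-b": [0.0, 1.0, 0.0],
    "note-c": [0.99, 0.05, 0.0],
}
print(top_k([1.0, 0.0, 0.0], corpus, k=2))
print(near_duplicates(corpus))
```

A brute-force scan like this is fine for a few thousand notes; larger collections usually hand the nearest-neighbor step to the vector DB's own index.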

## Why Qwen3-VL-Embedding?

- **SOTA on MTEB** — top multilingual embedding model
- **Multilingual** — en / ko / ja / zh (great for Korean semantic search)
- **Multimodal** — text and image in the same embedding space
- **2048-dim vectors** — balance between expressiveness and storage
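
To put the storage side of that trade-off in numbers, here is a back-of-the-envelope calculation. It assumes vectors are stored uncompressed as float32 (4 bytes per dimension) and ignores index overhead; the helper name is hypothetical.

```python
DIMS = 2048  # vector size stated in this README


def index_size_bytes(num_vectors, dims=DIMS, bytes_per_value=4):
    """Raw size of num_vectors uncompressed vectors (float32 by default)."""
    return num_vectors * dims * bytes_per_value


# One float32 vector: 2048 * 4 = 8192 bytes
print(index_size_bytes(1))

for count in (10_000, 1_000_000):
    megabytes = index_size_bytes(count) / 1e6
    print(f"{count} vectors: {megabytes:.2f} MB")
```

Passing `bytes_per_value=2` models float16 storage, which halves the footprint.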

## Why BatiAI?

- Quantized **directly from Alibaba's BF16 safetensors** (not re-quantized)
- Part of a **full on-device RAG stack** — chat LLM + reranker + embedding
- BatiAI-signed metadata
- Apache 2.0

## BatiAI's RAG Stack on Ollama

```bash
# Embedding (this model)
ollama pull batiai/qwen3-vl-embed-2b:q8

# Chat LLM
ollama pull batiai/qwen3.6-35b:iq4

# Reranker (HF only; Ollama has no reranker endpoint yet)
# See: https://huggingface.co/batiai/Qwen3-Reranker-0.6B-GGUF
```
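
Once the embedding model has retrieved relevant passages, they still have to reach the chat LLM. One way to wire that up is sketched below: it builds a request body in the shape Ollama's `/api/chat` endpoint accepts (`model`, `messages`, `stream`). The prompt wording, the passage numbering, and the `build_rag_payload` helper are assumptions for illustration, not how BatiFlow does it.

```python
import json

CHAT_MODEL = "batiai/qwen3.6-35b:iq4"  # chat LLM tag from the list above


def build_rag_payload(question, passages):
    """Assemble a chat payload that grounds the answer in retrieved passages."""
    # Number the passages so the model can refer to them
    context = "\n\n".join(f"[{i}] {text}" for i, text in enumerate(passages, start=1))
    system = (
        "Answer using only the numbered context passages. "
        "Say so if the context does not contain the answer."
    )
    user = f"Context:\n{context}\n\nQuestion: {question}"
    return {
        "model": CHAT_MODEL,
        "stream": False,  # ask for a single JSON reply instead of a stream
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    }


payload = build_rag_payload(
    "When is the report due?",
    ["The Q3 report is due on Friday.", "Standup moved to 10am."],
)
print(json.dumps(payload, indent=2))
```

The resulting dictionary can be POSTed as JSON to `http://localhost:11434/api/chat` on a machine where the chat model has been pulled.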

## About BatiFlow

[flow.bati.ai](https://flow.bati.ai) — macOS-native AI automation app. 5 MB. 100 % local.
Uses BatiAI's quantized models for on-device RAG — semantic search over notes, photos, and more.
