ollama run batiai/qwen3-vl-embed-8b

Details

eb3c6bb330d5 · 6.2 GB · qwen3vl · 7.57B parameters · Q6_K
Parameters: { "num_ctx": 32768 }
Template: {{ .Prompt }}
Qwen3-VL-Embedding-8B — Quantized by BatiAI

GGUF quantization of Qwen/Qwen3-VL-Embedding-8B — the quality tier of Qwen3’s vision-language embedding family.

Part of BatiAI’s on-device RAG stack for BatiFlow.

Models

Tag   Quant   Size      Recommended For
:q6   Q6_K    ~5.8 GB   balanced
:q8   Q8_0    ~7.5 GB   near-lossless, best retrieval quality

When to pick 8B vs 2B?

Use case                                               Pick
Workstation / desktop Mac, retrieval quality matters   8B — richer embeddings, sharper semantic separation
Laptop / Mac mini, latency matters                     2B — 4× smaller, sufficient for most tasks

Quick Start

ollama pull batiai/qwen3-vl-embed-8b:q8

curl http://localhost:11434/api/embeddings -d '{
  "model": "batiai/qwen3-vl-embed-8b:q8",
  "prompt": "What is the capital of France?"
}'

Returns a 3584-dim float vector (wider than 2B’s 2048-dim). Store in sqlite-vss / LanceDB / pgvector / any vector DB.
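The call above can be sketched in Python using only the standard library (a sketch, not an official client; it assumes a local Ollama server on the default port, so the network call is left commented out):

```python
import json
import math
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/embeddings"
MODEL = "batiai/qwen3-vl-embed-8b:q8"

def embed(prompt: str) -> list[float]:
    """POST a prompt to the local Ollama embeddings endpoint."""
    body = json.dumps({"model": MODEL, "prompt": prompt}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Example (requires a running Ollama server with the model pulled):
# q = embed("What is the capital of France?")
# d = embed("Paris is the capital and largest city of France.")
# print(len(q), cosine(q, d))
```

The same vectors can then be inserted into whichever vector store you use; only the similarity metric (cosine, above) needs to match at query time.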

Image embedding

Requires llama.cpp’s mtmd multimodal build — not standard Ollama. See upstream Qwen3-VL-Embedding-8B docs.

Why the VL Embedding?

VL (Vision-Language) means text + image share the same embedding space:

  • Search photos by text (“beach sunset”)
  • Search text by image (drop a screenshot)
  • Cross-modal RAG over PDFs + screenshots + notes
  • Deduplication / semantic clustering
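Because text and images land in one space, every use case above reduces to nearest-neighbor search over stored vectors. A minimal sketch of that ranking step (synthetic 2-dim vectors for illustration; in practice each vector would come from the embeddings endpoint):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, index, k=3):
    """index: list of (item_id, vector) pairs.
    Returns the k closest items as (item_id, score), best first."""
    scored = [(cosine(query_vec, vec), item_id) for item_id, vec in index]
    scored.sort(reverse=True)
    return [(item_id, score) for score, item_id in scored[:k]]
```

Since text and image vectors share the space, the same `top_k` serves a text query against photo embeddings and an image query against note embeddings alike.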

Why BatiAI?

  • Quantized directly from Alibaba’s BF16 safetensors (not re-quantized)
  • BatiAI-signed metadata (general.author: BatiAI)
  • Part of the complete BatiAI RAG stack — chat LLM + reranker + embedding

Quality note

Our sibling reranker card measured a Q6_K ↔ Q8_0 Pearson correlation of r = 0.9986 on 40 hard-negative triples — quantization drift is below the noise floor. We expect this embedding model to behave similarly. MTEB/BEIR numbers are to be added.
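For reference, a correlation like that is computed from two lists of per-triple scores, one list per quant (a sketch of the standard Pearson formula; `pearson` is an illustrative helper, not part of any released tooling):

```python
import math

def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# q6_scores = [...]  # scores from the Q6_K model, one per triple
# q8_scores = [...]  # scores from the Q8_0 model, same triples
# print(pearson(q6_scores, q8_scores))
```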

BatiAI’s RAG Stack on Ollama

# Embedding (this model)
ollama pull batiai/qwen3-vl-embed-8b:q8

# or the smaller 2B
ollama pull batiai/qwen3-vl-embed-2b:q8

# Chat LLM
ollama pull batiai/qwen3.6-35b:iq4

# Reranker (HF only — Ollama doesn't support /rerank yet)
# https://huggingface.co/batiai/Qwen3-Reranker-8B-GGUF

About BatiFlow

flow.bati.ai — a macOS-native AI automation app. 5 MB, 100% local. Uses BatiAI’s quantized models for on-device RAG — semantic search over notes, photos, and more.