Qwen3-Embedding-8B-f16
F16 precision, 8B parameters, 32K context window; configurable 32–4096-dimensional embeddings for multilingual retrieval and code search.
Overview
Qwen3-Embedding-8B-f16 is an F16, 8B-parameter text embedding model sourced from the Qwen/Qwen3-Embedding-8B-GGUF repository. It supports a 32K context window and configurable 32–4096-dimensional embeddings, and performs well on multilingual retrieval, classification, clustering, cross-lingual matching, and code search.
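Key Features
- Multilingual: covers 100+ languages; scored 70.58 on the multilingual MTEB leaderboard as of 2025-06-05, ranking near the top (the official model card is authoritative).
- Configurable dimensions: 4096-d by default; can be reduced to as few as 32-d to trade quality against cost.
- High fidelity: built from the official GGUF F16 export, which makes it straightforward to pair with rerankers and vector databases.
- License: Apache-2.0; retain the license notice when deploying or redistributing.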
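The quality/cost trade-off behind configurable dimensions can be illustrated with client-side truncation. The sketch below assumes the leading components of the vector stay meaningful when the tail is dropped (check the official model card before relying on this); `truncate_embedding` is a hypothetical helper, and the toy vector stands in for real model output.

```python
import math

def truncate_embedding(vec, dims):
    """Keep the first `dims` components and rescale to unit L2 norm."""
    if not 32 <= dims <= len(vec):
        raise ValueError("dims must be between 32 and the full vector length")
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # all-zero prefix: nothing to rescale
    return [x / norm for x in head]

# Toy 4096-d vector standing in for a real embedding (assumption, not model output).
full = [math.sin(i) for i in range(4096)]
small = truncate_embedding(full, 256)
print(len(small))
```

Renormalizing after truncation keeps cosine and dot-product scores comparable across vectors of the same reduced size; vectors of different sizes should not be mixed in one index.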
Usage
- Install Ollama and start the daemon
- Pull the model
ollama pull l284190056/Qwen3-Embedding-8B-f16
- Generate embeddings
curl http://localhost:11434/api/embed -d '{"model": "l284190056/Qwen3-Embedding-8B-f16", "input": "Try a quick test"}'
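The embedding step can also be driven from a script through Ollama's HTTP API. This is a minimal sketch that assumes a daemon listening on the default local address `http://localhost:11434` and the `/api/embed` endpoint; `build_embed_request` and `embed` are hypothetical helper names, and only the request construction runs here so the example works without a server.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/embed"  # assumed default local endpoint
MODEL = "l284190056/Qwen3-Embedding-8B-f16"

def build_embed_request(texts, model=MODEL, url=OLLAMA_URL):
    """Build the POST request for a batch of input strings."""
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def embed(texts, model=MODEL, url=OLLAMA_URL):
    """Send the request and return one vector per input string."""
    with urllib.request.urlopen(build_embed_request(texts, model, url)) as resp:
        return json.load(resp)["embeddings"]

# Inspect the request without contacting the server.
req = build_embed_request(["Try a quick test"])
print(req.full_url, req.data.decode("utf-8"))
```

With the daemon running and the model pulled, `embed(["Try a quick test"])` should return a list containing a single vector.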
Integration hints
- Combine with a vector database such as Milvus, Qdrant, or Weaviate for semantic search.
- Pair with a reranker model to improve ranking quality.
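To make the retrieval step concrete, here is a small in-memory sketch of what a vector database does at query time: score stored vectors against the query by cosine similarity and return the best matches. It is a stand-in for illustration only, not the API of Milvus, Qdrant, or Weaviate; the 3-d vectors and document ids are made up.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, index, k=2):
    """Return the k (doc_id, score) pairs with the highest cosine similarity."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Toy 3-d vectors stand in for real embeddings (assumption).
index = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.7, 0.7, 0.0],
    "doc-c": [0.0, 0.0, 1.0],
}
hits = top_k([0.9, 0.1, 0.0], index, k=2)
print(hits)
```

In a two-stage setup, the candidates returned by `top_k` would then be passed to a reranker, which rescores each query/document pair and produces the final order.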
Version Notes
- Built from the official Hugging Face GGUF F16 file.
- The Modelfile includes the model description and the Apache-2.0 license text.