111 8 months ago

from https://huggingface.co/Qwen/Qwen3-Embedding-8B-GGUF

embedding
ollama pull l284190056/Qwen3-Embedding-8B-f16

Details

9d6f5dc4851c · 15GB · qwen3 · 7.57B · F16
{{ .Input }}
Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR US

Readme

Qwen3-Embedding-8B-f16

F16 precision, 8B parameters, 32K context; configurable 32–4096-dimensional embeddings for multilingual retrieval and code search.

Overview

Qwen3-Embedding-8B-f16 is an F16, 8B-parameter text embedding model built from the Qwen/Qwen3-Embedding-8B-GGUF repository. It supports a 32K context window and configurable 32–4096-dimensional embeddings, and performs well on multilingual retrieval, classification, clustering, cross-lingual matching, and code search.

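Key Features

  • Multilingual: covers 100+ languages; scored 70.58 on the multilingual MTEB leaderboard as of 2025-06-05, ranking near the top (the official model card is authoritative).
  • Configurable dimensions: 4096 by default; can be reduced to as few as 32 to trade quality against cost.
  • High fidelity: built from the official GGUF F16 export, so it pairs readily with reranker models and vector databases.
  • License: Apache-2.0; retain the license notice when deploying or redistributing.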
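
The quality/cost trade-off behind configurable dimensions can be illustrated with a minimal Python sketch. It assumes the embeddings follow the Matryoshka-style layout described on the upstream model card, so that keeping the leading components and re-normalizing gives a usable lower-dimensional vector; that assumption should be checked against the card before relying on it. The helper names truncate_embedding and bytes_per_vector are hypothetical, and the 4-byte (float32) storage size is an assumption about the vector store.

```python
import math
from typing import List


def truncate_embedding(vec: List[float], dims: int) -> List[float]:
    """Keep the first `dims` components and rescale to unit L2 norm.

    Assumption: the embeddings are Matryoshka-style, so the leading
    components carry most of the signal. Verify against the model card.
    """
    if not 32 <= dims <= len(vec):
        raise ValueError("dims must be between 32 and len(vec)")
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # all-zero input: nothing to normalize
    return [x / norm for x in head]


def bytes_per_vector(dims: int, bytes_per_value: int = 4) -> int:
    """Raw storage for one vector, assuming float32 values by default."""
    return dims * bytes_per_value


if __name__ == "__main__":
    # Stand-in for a real 4096-d embedding returned by the model.
    full = [math.sin(i) for i in range(4096)]
    small = truncate_embedding(full, 256)
    print(len(small), bytes_per_vector(4096), bytes_per_vector(256))
```

Under these assumptions a full 4096-dimensional vector takes 16 times the raw storage of a 256-dimensional one, which is the cost side of the trade-off; the quality side has to be measured on your own retrieval data.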

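Usage

  1. Install and start the Ollama daemon.

  2. Pull the model:

   ollama pull l284190056/Qwen3-Embedding-8B-f16

  3. Generate embeddings through the daemon's REST API (http://localhost:11434 is the default address of a local daemon):

   curl http://localhost:11434/api/embed -d '{"model": "l284190056/Qwen3-Embedding-8B-f16", "input": "测试一下 / Try a quick test"}'

  4. Integration hints

    • Combine with a vector database such as Milvus, Qdrant, or Weaviate for semantic search.
    • Pair with a reranker model to improve ranking quality.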
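
As a rough illustration of the usage steps above, the following Python sketch requests embeddings from a local Ollama daemon and ranks documents by cosine similarity. It assumes the daemon listens on http://localhost:11434 and that its /api/embed endpoint returns an "embeddings" list with one vector per input. The helper names build_embed_request, embed, cosine_similarity, rank, and demo are hypothetical, and the sample texts are placeholders.

```python
import json
import math
import urllib.request
from typing import List, Sequence, Tuple

OLLAMA_URL = "http://localhost:11434/api/embed"  # assumed default local address
MODEL = "l284190056/Qwen3-Embedding-8B-f16"


def build_embed_request(texts: Sequence[str], model: str = MODEL,
                        url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build the POST request for the embed endpoint without sending it."""
    payload = json.dumps({"model": model, "input": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(texts: Sequence[str]) -> List[List[float]]:
    """Send the request; requires a running daemon with the model pulled."""
    with urllib.request.urlopen(build_embed_request(texts)) as resp:
        return json.load(resp)["embeddings"]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query_vec: Sequence[float],
         doc_vecs: Sequence[Sequence[float]]) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs, best match first."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def demo() -> None:
    """Embed one query and two placeholder documents, then print the ranking."""
    docs = ["Ollama serves models over a local HTTP API.",
            "Vector databases store embeddings for similarity search."]
    vectors = embed(["How do I store embeddings?"] + docs)
    for idx, score in rank(vectors[0], vectors[1:]):
        print(f"{score:.3f}  {docs[idx]}")
```

Calling demo() once the daemon is running prints the documents in ranked order. In a real deployment the in-memory rank step would typically be replaced by a query against the vector database, optionally followed by a reranker.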

Version Notes

  • Built from the official GGUF F16 file on Hugging Face.
  • The Modelfile includes the model description and the Apache-2.0 license text.


