# Build Ollama For Reranker


This commit is for testing only and its functionality is not yet complete. It is provided so developers can help finish the PR. The original author no longer updates it and bugs remain, so treat it as a reference only.

It is not yet officially supported: this project is self-compiled from a PR submitted by an enthusiastic community developer. Review the source carefully and compile it yourself before use.

Build guide: https://github.com/ollama/ollama/pull/11389#issuecomment-3089786702

Qwen's official model notes: https://qwenlm.github.io/zh/blog/qwen3-embedding/

First things first, how to build (running directly on the CPU):

```
git clone https://github.com/sinjab/ollama.git
cd ollama
git checkout reranking-implementation
go build .
OLLAMA_NEW_ENGINE=1 ./ollama serve
```
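
As a quick sanity check (not part of the original steps), you can query the server's version endpoint from a second terminal; this assumes the default listen address 127.0.0.1:11434, since no `OLLAMA_HOST` is set above:

```
# In another terminal: confirm the server is up (default port 11434)
curl http://localhost:11434/api/version
```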

Alternatively, build with GPU acceleration (mine is CUDA 12.8):

```
git clone https://github.com/sinjab/ollama.git
cd ollama
git checkout reranking-implementation
cmake -B build
cmake --build build
go build .
export OLLAMA_HOST=0.0.0.0:11436
export OLLAMA_MODELS=/usr/share/ollama/.ollama/models/
OLLAMA_NEW_ENGINE=1 ./ollama serve
```

(If you need to migrate to another intranet machine: on Linux, the lib files live in ../lib/ollama. Recreate that structure in a folder containing ./ollama and ./lib/ollama/*.so. The required .so files are the ones in the build output directory ../lib/ollama (Linux), plus the libraries linked by `ldd ./build/lib/ollama/libggml-cuda.so`; a sketch of this assembly follows.) [screenshots: the final directory layout required for CUDA to load correctly]
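
A minimal sketch of assembling such a portable folder, assuming the build output paths above; the destination name `ollama-portable` is just an example, and you may want to copy only the CUDA/ggml libraries rather than everything `ldd` reports:

```
#!/bin/sh
# Build a self-contained folder: ./ollama next to ./lib/ollama/*.so
DEST=ollama-portable                # example name, change as needed
mkdir -p "$DEST/lib/ollama"
cp ./ollama "$DEST/"
cp ./build/lib/ollama/*.so "$DEST/lib/ollama/"
# Also copy the shared libraries that libggml-cuda.so links against
ldd ./build/lib/ollama/libggml-cuda.so \
  | awk '/=> \//{print $3}' \
  | xargs -r -I{} cp -n {} "$DEST/lib/ollama/"
```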

# Ollama Rerank Model Installation Guide
### 1. Download and extract the binary

My system is WSL2 amd64 Ubuntu 24.04. For CUDA support, follow the official ollama build guide, find the compiled lib files, and place them at the expected relative path. My successfully compiled archive is here (if you compile yourself, you must add the four CUDA dependency files): https://github.com/AuditAIH/ollama-rerank/releases/download/0.1/app_with_lib_WSL2_ubuntu2404_amd64_cuda12.8.tar.gz

```
# Download the binary
wget https://github.com/AuditAIH/ollama-rerank/releases/download/0.1/ollala_app_rerank.tar.gz

# Extract the archive
tar -xzvf ollala_app_rerank.tar.gz

# Enter the extracted directory
cd ollama-rerank
```
### 2. Download the model file
You can obtain the model from Hugging Face or the Ollama registry:

#### Method 1: Download from Hugging Face

```
wget https://huggingface.co/mradermacher/Qwen3-Reranker-0.6B-GGUF/resolve/main/Qwen3-Reranker-0.6B.f16.gguf

# Pull other models:
ollama pull hf.co/mradermacher/Qwen3-Reranker-8B-GGUF:F16
```

To import, copy my model template (see https://ollama.com/AuditAid/Reranker_v2 for bge-reranker-v2-m3-GGUF support). Then edit the template file, using my 4B template as a reference:
```
<|im_start|>system
Judge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be "yes" or "no".<|im_end|>
<|im_start|>user
<Instruct>: Please judge relevance.
<Query>: {{ .Query }}
<Document>: {{ .Document }}<|im_end|>
<|im_start|>assistant
<think>

</think>
```
Reference: https://huggingface.co/mradermacher/Qwen3-Reranker-8B-GGUF/tree/main?local-app=ollama
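
The import step referenced above is not spelled out; here is a minimal sketch using a standard Ollama Modelfile, assuming the GGUF file name from Method 1 (`qwen3-reranker` is just an example model name):

```
# Write a Modelfile pointing at the GGUF and embedding the reranker template
cat > Modelfile <<'EOF'
FROM ./Qwen3-Reranker-0.6B.f16.gguf
TEMPLATE """<|im_start|>system
Judge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be "yes" or "no".<|im_end|>
<|im_start|>user
<Instruct>: Please judge relevance.
<Query>: {{ .Query }}
<Document>: {{ .Document }}<|im_end|>
<|im_start|>assistant
<think>

</think>
"""
EOF
# Register the model with this build of ollama
./ollama create qwen3-reranker -f Modelfile
```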
#### Method 2: Pull from the Ollama registry

```
./ollama pull AuditAid/Qwen3_Reranker:0.6B_Q2_K
```
### 3. Start the service
It is recommended to run the Ollama service on its own port:

```
# Set the service address and port
export OLLAMA_HOST=0.0.0.0:11436

# Enable the new engine and run the service in the background
OLLAMA_NEW_ENGINE=1 ./ollama serve &

# Quick model pull
./ollama pull AuditAid/Qwen3_Reranker:0.6B_Q2_K
```

## Test the service
Use the following command to check that the service is working properly:

```
curl -X POST http://localhost:11436/api/rerank \
  -H "Content-Type: application/json" \
  -d '{
  "model": "AuditAid/Qwen3-Reranker-0.6B.Q2_K.gguf",
  "query": "What is machine learning?",
  "documents": [
    "Machine learning is a subset of artificial intelligence",
    "The weather today is sunny and warm"
  ]
}'
```
## FAQ
1. If you encounter permission issues, run `chmod +x ollama` to grant execute permission
2. If the port is already in use, change the port number in `OLLAMA_HOST`
3. If the model downloads slowly, try a proxy or choose a more suitable quantization


If you're lucky, it prints ranked results like the following (this sample run used five documents):

```
{
  "model": "qwen_reranker",
  "results": [
    {
      "index": 1,
      "document": "Machine learning is a subset of artificial intelligence",
      "relevance_score": 0.9784514
    },
    {
      "index": 3,
      "document": "Deep learning uses neural networks for pattern recognition",
      "relevance_score": 0.6934207
    },
    {
      "index": 2,
      "document": "Pizza is made with tomatoes and cheese",
      "relevance_score": 0.46293145
    },
    {
      "index": 4,
      "document": "The weather today is sunny and warm",
      "relevance_score": 0.32852712
    },
    {
      "index": 0,
      "document": "Angela Merkel was the Chancellor of Germany",
      "relevance_score": 0.30637652
    }
  ]
}
```
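
To consume the response in a script, here is a small sketch using `jq` (assumed installed) to print only the best-scoring document:

```
# Send a rerank request and print the top-ranked document
curl -s -X POST http://localhost:11436/api/rerank \
  -H "Content-Type: application/json" \
  -d '{
    "model": "AuditAid/Qwen3-Reranker-0.6B.Q2_K.gguf",
    "query": "What is machine learning?",
    "documents": [
      "Machine learning is a subset of artificial intelligence",
      "The weather today is sunny and warm"
    ]
  }' | jq -r '.results | max_by(.relevance_score).document'
```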