MiniCPM-V surpasses proprietary models such as GPT-4V, Gemini Pro, Qwen-VL and Claude 3 in overall performance, and supports multimodal conversation in over 30 languages.


Note: You need a rebuilt ./ollama binary to run this model; there are three ways to get one.

1. Download the binary file

Go to the release page and download the file.

🔥 In particular, the ./ollama-linux-arm64 file was built on Debian; it can run in the Termux app on Android phones.
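
If the downloaded file is not executable on your system, you may need to mark it as such first (a minimal sketch, using the x86_64 binary name from the release as an example):

chmod +x ./ollama-linux-x86_64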

Start the server:

./ollama-linux-x86_64 serve

Running this model:

ollama run hhao/openbmb-minicpm-llama3-v-2_5
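
Since this is a vision model, you can include a local image path in the prompt and the Ollama CLI will attach the image. A minimal sketch; the file ./test.jpg and the question are placeholders:

ollama run hhao/openbmb-minicpm-llama3-v-2_5 "What is in this image? ./test.jpg"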

2. Running in Docker (CPU or GPU)

  • 🆕 Supports x86_64 and arm64 architectures.
  • Supports CUDA (NVIDIA) and ROCm (AMD); a GPU example follows the commands below. More details >>
# x86_64 arch
docker pull hihao/ollama-amd64

# arm64 arch
# docker pull hihao/ollama-arm64

docker run -d -v ./models:/root/.ollama -p 11434:11434 --name ollama hihao/ollama-amd64

docker exec -it ollama bash

ollama run hhao/openbmb-minicpm-llama3-v-2_5
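
For NVIDIA GPUs, the usual Docker approach is to pass the GPUs into the container. A minimal sketch, assuming the NVIDIA Container Toolkit is installed on the host and the image's CUDA support covers your card:

# NVIDIA GPU (requires the NVIDIA Container Toolkit on the host)
docker run -d --gpus=all -v ./models:/root/.ollama -p 11434:11434 --name ollama hihao/ollama-amd64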

3. Rebuild the ./ollama binary from source

Install Requirements

  • cmake version 3.24 or higher
  • go version 1.22 or higher
  • gcc version 11.4.0 or higher
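
Before building, you can quickly confirm that the installed versions meet these requirements:

cmake --version
go version
gcc --version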

Setup the Code

Prepare both our llama.cpp fork and this Ollama fork.

git clone -b minicpm-v2.5 https://github.com/OpenBMB/ollama.git
cd ollama/llm
git clone -b minicpm-v2.5 https://github.com/OpenBMB/llama.cpp.git
cd ../

macOS Build

Here we give a macOS example. See the developer guide for more platforms.

brew install go cmake gcc

Optionally enable debugging and more verbose logging:

## At build time
export CGO_CFLAGS="-g"

## At runtime
export OLLAMA_DEBUG=1

Get the required libraries and build the native LLM code:

go generate ./...

Build ollama:

go build .

Start the server:

./ollama serve

Running this model:

ollama run hhao/openbmb-minicpm-llama3-v-2_5
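
Once the server is up, the model can also be called over the Ollama HTTP API. A minimal sketch, assuming the default port 11434 and a local image named test.jpg; the API expects images as base64 strings, and on macOS base64 -i encodes the file:

# Send a prompt plus a base64-encoded image to the generate endpoint
curl http://localhost:11434/api/generate -d @- <<EOF
{
  "model": "hhao/openbmb-minicpm-llama3-v-2_5",
  "prompt": "Describe this image.",
  "images": ["$(base64 -i test.jpg)"],
  "stream": false
}
EOF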

Windows Build

Note: The Windows build for Ollama is still under development.

Install required tools:

  • MSVC toolchain - C/C++ and cmake as minimal requirements
  • Go version 1.22 or higher
  • MinGW (pick one variant) with GCC.

Then build with CGO enabled:
$env:CGO_ENABLED="1"
go generate ./...
go build .

Start the server:

./ollama serve

Running this model:

ollama run hhao/openbmb-minicpm-llama3-v-2_5

Windows CUDA (NVIDIA) Build

In addition to the common Windows development tools described above, install CUDA after installing MSVC.

Windows ROCm (AMD Radeon) Build

In addition to the common Windows development tools described above, install AMD's HIP package after installing MSVC.

Lastly, add the ninja.exe included with MSVC to the system path (e.g. C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\Common7\IDE\CommonExtensions\Microsoft\CMake\Ninja).
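
For example, in PowerShell you can add it for the current session (adjust the path to match your Visual Studio installation):

# Add the MSVC-bundled Ninja directory to PATH for this PowerShell session
$env:PATH += ";C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\Common7\IDE\CommonExtensions\Microsoft\CMake\Ninja"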

Linux Build

See the developer guide for Linux.


MiniCPM-V: A GPT-4V Level Multimodal LLM on Your Phone

  • MiniCPM-Llama3-V 2.5: 🔥🔥🔥 The latest and most capable model in the MiniCPM-V series. With a total of 8B parameters, the model surpasses proprietary models such as GPT-4V-1106, Gemini Pro, Qwen-VL-Max and Claude 3 in overall performance. With enhanced OCR and instruction-following capabilities, the model also supports multimodal conversation in over 30 languages, including English, Chinese, French, Spanish, and German. With the help of quantization, compilation optimizations, and several efficient inference techniques on CPUs and NPUs, MiniCPM-Llama3-V 2.5 can be efficiently deployed on end-side devices.

News

📌 Pinned

  • [2024.05.28] 🚀🚀🚀 MiniCPM-Llama3-V 2.5 is now fully supported in llama.cpp and ollama! Please pull the latest code of llama.cpp & ollama. We also release GGUF files in various sizes here. An FAQ list for ollama usage is coming within a day. Please stay tuned!
  • [2024.05.28] 💫 We now support LoRA fine-tuning for MiniCPM-Llama3-V 2.5, using only 2 V100 GPUs! See more statistics here.
  • [2024.05.23] 🔍 We’ve released a comprehensive comparison between Phi-3-vision-128k-instruct and MiniCPM-Llama3-V 2.5, including benchmark evaluations, multilingual capabilities, and inference efficiency 🌟📊🌍🚀. Click here to view more details.
  • [2024.05.23] 🔥🔥🔥 MiniCPM-V tops GitHub Trending and Hugging Face Trending! Our demo, recommended by Hugging Face Gradio’s official account, is available here. Come and try it out!


  • [2024.05.25] MiniCPM-Llama3-V 2.5 now supports streaming outputs and customized system prompts. Try it here!
  • [2024.05.24] We release MiniCPM-Llama3-V 2.5 in GGUF format, which supports llama.cpp inference and provides smooth decoding at 6~8 tokens/s on mobile phones. Try it now!
  • [2024.05.20] We open-source MiniCPM-Llama3-V 2.5. It has improved OCR capability and supports 30+ languages, representing the first end-side MLLM to achieve GPT-4V-level performance! We provide efficient inference and simple fine-tuning. Try it now!
  • [2024.04.23] MiniCPM-V-2.0 supports vLLM now! Click here to view more details.
  • [2024.04.18] We created a Hugging Face Space to host the demo of MiniCPM-V 2.0 here!
  • [2024.04.17] MiniCPM-V-2.0 now supports deploying a WebUI demo!
  • [2024.04.15] MiniCPM-V-2.0 now also supports fine-tuning with the SWIFT framework!
  • [2024.04.12] We open-source MiniCPM-V 2.0, which achieves performance comparable to Gemini Pro in understanding scene text and outperforms the strong Qwen-VL-Chat 9.6B and Yi-VL 34B on OpenCompass, a comprehensive evaluation of 11 popular benchmarks. Click here to view the MiniCPM-V 2.0 technical blog.
  • [2024.03.14] MiniCPM-V now supports fine-tuning with the SWIFT framework. Thanks to Jintao for the contribution!
  • [2024.03.01] MiniCPM-V can now be deployed on Mac!