
A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone

vision 4b · 1 month ago · f46e33779a62 · 3.1GB

llama · 3.61B · Q4_K_S
clip · 466M · F16
Template: {{- if .Messages }}{{- range $i, $_ := .Messages }}{{- $last := eq (len (slice $.Messages $i)) 1 -}} …
System: You are a helpful assistant.
Parameters: { "num_ctx": 4096, "stop": ["<|im_start|>", "<|im_end|>"], … }

Readme

MiniCPM-V.png


MiniCPM-V 4.0 🤗 🤖 | MiniCPM-o 2.6 🤗 🤖 | MiniCPM-V 2.6 🤗 🤖 | 🍳 Cookbook | 📄 Technical Blog [English/Chinese]

MiniCPM-V 4.0 is the latest efficient model in the MiniCPM-V series. It is built on SigLIP2-400M and MiniCPM4-3B, with 4.1B parameters in total. It inherits the strong single-image, multi-image, and video understanding performance of MiniCPM-V 2.6 with greatly improved efficiency. Notable features of MiniCPM-V 4.0 include:

  • 🔥 Leading Visual Capability. With only 4.1B parameters, MiniCPM-V 4.0 achieves an average score of 69.0 on OpenCompass, a comprehensive evaluation covering 8 popular benchmarks, outperforming GPT-4.1-mini-20250414, MiniCPM-V 2.6 (8.1B params, OpenCompass 65.2), and Qwen2.5-VL-3B-Instruct (3.8B params, OpenCompass 64.5). It also performs well in multi-image and video understanding.

  • 🚀 Superior Efficiency. Designed for on-device deployment, MiniCPM-V 4.0 runs smoothly on end devices. For example, it delivers a first-token latency under 2 s and a decoding speed above 17 tokens/s on iPhone 16 Pro Max, without overheating. It also sustains superior throughput under concurrent requests.

  • 💫 Easy Usage. MiniCPM-V 4.0 can be used in many ways, including llama.cpp, Ollama, vLLM, SGLang, LLaMA-Factory, and a local web demo. We also open-source an iOS app that runs on iPhone and iPad. Get started with our well-structured Cookbook, featuring detailed instructions and practical examples.
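As a minimal sketch of the Ollama route mentioned above, the snippet below builds a request body for Ollama's `/api/generate` endpoint, which accepts images as base64 strings in an `images` list. The model tag `minicpm-v` is an assumption here; check `ollama list` for the tag you actually pulled.

```python
import base64
import json


def build_ollama_request(prompt: str, image_bytes: bytes,
                         model: str = "minicpm-v") -> str:
    """Build the JSON body for a non-streaming Ollama /api/generate call.

    Ollama expects each image as a base64-encoded string in `images`.
    The default model tag is an assumption, not taken from this page.
    """
    payload = {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }
    return json.dumps(payload)


# Example with dummy image bytes; POST this body to
# http://localhost:11434/api/generate on a running Ollama server.
body = build_ollama_request("What is in this picture?", b"\x89PNG...")
print(json.loads(body)["model"])  # -> minicpm-v
```

The same payload shape works for any multimodal model served by Ollama; only the `model` tag changes.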