
A Pocket-Sized MLLM for Ultra-Efficient Image and Video Understanding on Your Phone

ollama run openbmb/minicpm-v4.6:q5_K_S
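Beyond the one-line CLI above, a minimal sketch with the official Ollama Python client (`pip install ollama`) could look like this; the image path photo.jpg and the question are placeholders for illustration, not values from this page:

```python
# Hedged sketch: single-image chat via the Ollama Python client.
# "photo.jpg" is a placeholder; any local image path or raw bytes works.
import ollama

response = ollama.chat(
    model="openbmb/minicpm-v4.6:q5_K_S",
    messages=[{
        "role": "user",
        "content": "Describe what is happening in this image.",
        "images": ["photo.jpg"],
    }],
)
print(response["message"]["content"])
```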

Details


e08cca967d6f · 1.7GB
qwen35 · 752M · Q5_K_S
clip · 548M · F16
System prompt: You are a helpful assistant.
Parameters: num_ctx 4096 · stop "<|im_start|>", "<|im_end|>"

Readme


MiniCPM-V 4.6

MiniCPM-V 4.6 is our most edge-deployment-friendly model to date. Built on SigLIP2-400M and the Qwen3.5-0.8B LLM, it inherits the strong single-image, multi-image, and video understanding capabilities of the MiniCPM-V family while significantly improving computational efficiency, and it introduces mixed 4x/16x visual token compression. Notable features of MiniCPM-V 4.6 include:

  • 🔥 Leading Foundation Capability. MiniCPM-V 4.6 scores 13 on the Artificial Analysis Intelligence Index benchmark, outperforming Qwen3.5-0.8B (score of 10) at 19x lower token cost and Qwen3.5-0.8B-Thinking (score of 11) at 43x lower token cost. It also surpasses the larger Ministral 3 3B (score of 11).

  • 💪 Strong Multimodal Capability. MiniCPM-V 4.6 outperforms Qwen3.5-0.8B on most vision-language understanding tasks, and reaches Qwen3.5 2B-level capability on many benchmarks including OpenCompass, RefCOCO, HallusionBench, MUIRBench, and OCRBench.

  • 🚀 Ultra-Efficient Architecture. Built on the latest technique from LLaVA-UHD v4, MiniCPM-V 4.6 cuts visual encoding FLOPs by more than 50%, making it more efficient than even smaller models and delivering 2.4x the token throughput of Qwen3.5-0.8B. It also supports a mixed 4x/16x visual token compression rate, allowing a flexible trade-off between accuracy and speed.

  • 📱 Broad Mobile Platform Coverage. MiniCPM-V 4.6 can be deployed across all three mainstream mobile platforms: iOS, Android, and HarmonyOS. With all edge adaptation code open-sourced, developers can reproduce the on-device experience in just a few steps.

  • 🛠️ Developer Friendly. MiniCPM-V 4.6 is adapted to inference frameworks such as vLLM, SGLang, llama.cpp, and Ollama, and is supported by fine-tuning ecosystems such as SWIFT and LLaMA-Factory, so developers can quickly customize models for new domains and tasks on consumer-grade GPUs. We provide multiple quantized variants across GGUF, BNB, AWQ, and GPTQ formats; see the request sketch after this list.
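As a concrete illustration of that framework support, here is a minimal request sketch against Ollama's OpenAI-compatible endpoint (http://localhost:11434/v1); the file name receipt.png, the prompt, and the dummy api_key are illustrative assumptions, not values from this page. vLLM and SGLang expose the same API shape, so the request body carries over.

```python
# Hedged sketch: query the model through Ollama's OpenAI-compatible API.
# Requires `pip install openai`; "receipt.png" is a placeholder image.
import base64

from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

with open("receipt.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

completion = client.chat.completions.create(
    model="openbmb/minicpm-v4.6:q5_K_S",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Extract the total amount on this receipt."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(completion.choices[0].message.content)
```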

Note: For local deployment, refer to this document.
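For video understanding, one common pattern (an assumption here, not a method prescribed by this page) is to sample a few frames uniformly and pass them as multiple images. Below is a minimal sketch assuming OpenCV (`pip install opencv-python`) and the Ollama Python client; the frame count and the clip.mp4 path are chosen purely for illustration.

```python
# Hedged sketch: video QA by uniform frame sampling.
# "clip.mp4" and num_frames=8 are illustrative choices.
import cv2
import ollama

def sample_frames(path: str, num_frames: int = 8) -> list[bytes]:
    """Uniformly sample frames from a video and return them as JPEG bytes."""
    cap = cv2.VideoCapture(path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    frames = []
    for i in range(num_frames):
        cap.set(cv2.CAP_PROP_POS_FRAMES, i * total // num_frames)
        ok, frame = cap.read()
        if not ok:
            break
        ok, buf = cv2.imencode(".jpg", frame)
        if ok:
            frames.append(buf.tobytes())
    cap.release()
    return frames

response = ollama.chat(
    model="openbmb/minicpm-v4.6:q5_K_S",
    messages=[{
        "role": "user",
        "content": "Summarize what happens in this video.",
        "images": sample_frames("clip.mp4"),
    }],
)
print(response["message"]["content"])
```

With the default num_ctx of 4096 shown in the parameters above, keep the sampled frame count small so the compressed visual tokens fit within the context window.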