GitHub Repo | Technical Report | Join Us
👋 Contact us on Discord and WeChat
[2025.09.05] The MiniCPM4.1 series is released! This series features hybrid reasoning models that can run in both deep-reasoning mode and non-reasoning mode. 🔥🔥🔥
[2025.06.06] The MiniCPM4 series is released! These models achieve major efficiency gains while maintaining optimal performance at the same scale, reaching over 5x generation speedup on typical end-side chips. You can find the technical report here. 🔥🔥🔥
MiniCPM4 and MiniCPM4.1 are highly efficient large language models (LLMs) designed explicitly for end-side devices. They achieve this efficiency through systematic innovation in four key dimensions: model architecture, training data, training algorithms, and inference systems.
MiniCPM4.1-8B: The latest version of MiniCPM4, with 8B parameters, supporting fusion thinking (hybrid reasoning).
MiniCPM4.1-8B-GPTQ: MiniCPM4.1-8B in GPTQ format.
MiniCPM4.1-8B-AutoAWQ: MiniCPM4.1-8B in AutoAWQ format.
MiniCPM-4.1-8B-Marlin: MiniCPM4.1-8B in Marlin format.
MiniCPM4.1-8B-GGUF: MiniCPM4.1-8B in GGUF format. (← you are here)
MiniCPM4.1-8B-MLX: MiniCPM4.1-8B in MLX format.
MiniCPM4.1-8B-Eagle3: Eagle3 model for MiniCPM4.1-8B.
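Since MiniCPM4.1 supports both deep-reasoning and non-reasoning modes, here is a minimal sketch of how a mode toggle could be wired into prompt construction. The `/think` and `/no_think` markers and the role tags below are illustrative assumptions, not the official MiniCPM4.1 chat template; consult the repo's tokenizer config for the real one.

```python
# Illustrative only: the mode markers and role tags are assumptions,
# not the official MiniCPM4.1 chat template.
def build_prompt(user_msg: str, deep_reasoning: bool) -> str:
    # In deep-reasoning mode the model may emit an intermediate
    # thinking block before its answer; in non-reasoning mode it
    # answers directly.
    mode_tag = "/think" if deep_reasoning else "/no_think"
    return f"<|user|> {user_msg} {mode_tag} <|assistant|>"

print(build_prompt("Why is the sky blue?", deep_reasoning=True))
```

In practice this switch is usually exposed as a chat-template flag rather than hand-built strings, so prefer the template shipped with the model files.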
MiniCPM4 Series
Click to expand all MiniCPM4 series models
MiniCPM4-8B: The flagship model with 8B parameters, trained on 8T tokens
MiniCPM4-0.5B: Lightweight version with 0.5B parameters, trained on 1T tokens
MiniCPM4-8B-Eagle-FRSpec: Eagle head for FRSpec, accelerating speculative inference
MiniCPM4-8B-Eagle-FRSpec-QAT-cpmcu: Eagle head with QAT for FRSpec, integrating speculation and quantization for ultra acceleration
MiniCPM4-8B-Eagle-vLLM: Eagle head in vLLM format for speculative inference
MiniCPM4-8B-marlin-Eagle-vLLM: Quantized Eagle head for vLLM format
BitCPM4-0.5B: Extreme ternary quantization of MiniCPM4-0.5B, achieving 90% bit width reduction
BitCPM4-1B: Extreme ternary quantization of MiniCPM3-1B, achieving 90% bit width reduction
MiniCPM4-Survey: Generates trustworthy, long-form survey papers from user queries
MiniCPM4-MCP: Integrates MCP tools to autonomously satisfy user requirements
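The Eagle/FRSpec entries above accelerate generation through speculative decoding. A toy draft-then-verify sketch of that idea follows; the `draft_next`/`target_next` callables are stand-ins for illustration, not MiniCPM4 APIs.

```python
# Toy draft-then-verify speculative decoding, the idea behind the
# Eagle/FRSpec heads above. draft_next/target_next are stand-in
# callables (context -> next token), not MiniCPM4 APIs.
def speculative_step(prefix, draft_next, target_next, k=4):
    # 1) The cheap draft model proposes k tokens autoregressively.
    ctx = list(prefix)
    proposed = []
    for _ in range(k):
        tok = draft_next(ctx)
        proposed.append(tok)
        ctx.append(tok)
    # 2) The target model verifies: keep draft tokens while they match
    #    its own greedy choice, and substitute its correction at the
    #    first mismatch. Every accepted token skips a full target step.
    ctx = list(prefix)
    accepted = []
    for tok in proposed:
        expected = target_next(ctx)
        if tok != expected:
            accepted.append(expected)  # target's correction ends the step
            break
        accepted.append(tok)
        ctx.append(tok)
    return accepted

# Stand-in models: the target spells "abcde...", the draft agrees for
# the first two tokens and then guesses wrong.
target = lambda ctx: chr(ord("a") + len(ctx))
draft = lambda ctx: target(ctx) if len(ctx) < 2 else "z"
print(speculative_step([], draft, target))  # ['a', 'b', 'c']
```

The real systems verify all draft tokens in a single batched forward pass and sample rather than compare greedily, but the accept-or-correct loop is the core mechanism.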
MiniCPM4 and MiniCPM4.1 are extremely efficient edge-side large models that have undergone systematic optimization across four dimensions: model architecture, learning algorithms, training data, and inference systems.
🏗️ Efficient Model Architecture:
🧠 Efficient Learning Algorithms:
📚 High-Quality Training Data:
⚡ Efficient Inference System: