8d359f1d802a · 14GB · deepseek2 · 16B · Q6_K

Parameters: { "num_ctx": 131072, "stop": [ "<|im_end|>", "<|endoftext|>" ] }


Kimi-VL-A3B-Thinking: Advanced Vision-Language Model with Extended Reasoning

πŸš€ Overview

Kimi-VL-A3B-Thinking is a powerful vision-language model from Moonshot AI featuring extended thinking capabilities for complex visual reasoning. Built on the DeepSeek2 architecture with Mixture of Experts (MoE), it excels at solving math problems from images, analyzing documents and charts, and performing step-by-step visual reasoning with chain-of-thought explanations.
🎯 Key Features

  • Extended Thinking - Chain-of-thought reasoning for complex visual problems
  • MoE Architecture - 64 experts + 2 shared experts for efficient inference
  • 128K Context - Massive 131,072 token context window
  • MLA Attention - Multi-head Latent Attention for improved performance
  • MIT License - Fully open source
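Ollama does not necessarily load a model at its full context length by default; over the HTTP API the window can be requested per call through `options.num_ctx`. A minimal sketch, assuming a local server on the default port (the helper names are ours; the stop tokens come from this model's template shown above):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(prompt: str, num_ctx: int = 131072) -> dict:
    """Build an /api/generate payload that asks for the full 128K context."""
    return {
        "model": "richardyoung/kimi-vl-a3b-thinking",
        "prompt": prompt,
        "stream": False,
        "options": {
            "num_ctx": num_ctx,                       # context window in tokens
            "stop": ["<|im_end|>", "<|endoftext|>"],  # stop tokens from the model template
        },
    }

def generate(prompt: str) -> str:
    """POST the payload to a local Ollama server (requires `ollama serve`)."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Note that `generate(...)` performs a real network call, so it only works with `ollama serve` running; a 128K window also raises RAM/VRAM use well above the figures listed below.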

πŸ“Š Capabilities

  • Visual Math: Solve mathematical problems from handwritten or printed equations
  • Document Analysis: Extract and reason about document content
  • Chart Understanding: Interpret graphs, charts, and data visualizations
  • Scene Reasoning: Complex multi-step reasoning about image content
  • OCR + Reasoning: Read text and apply logical reasoning

🏷️ Available Versions

Tag      Size     RAM Required   Description
q4_k_m   9.8 GB   ~16GB          Recommended - best quality/size ratio
f16      30 GB    ~40GB          Full precision, maximum quality

πŸ’» Quick Start

# Recommended version (Q4_K_M)
ollama run richardyoung/kimi-vl-a3b-thinking "Solve this math problem step by step"

# Full precision version
ollama run richardyoung/kimi-vl-a3b-thinking:f16 "Analyze this diagram in detail"
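The one-liners above do not attach an image; for vision input the usual route is the Ollama HTTP API, which accepts base64-encoded images in an `images` array. A sketch of building and sending such a request (the function names and image path are our own placeholders):

```python
import base64
import json
import urllib.request

def build_vision_request(prompt: str, image_path: str) -> dict:
    """Encode an image file and build an Ollama /api/generate payload for it."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "model": "richardyoung/kimi-vl-a3b-thinking",
        "prompt": prompt,
        "images": [image_b64],  # raw base64 strings, no "data:" URI prefix
        "stream": False,
    }

def ask_about_image(prompt: str, image_path: str) -> str:
    """Send the request to a local Ollama server (needs `ollama serve` running)."""
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(build_vision_request(prompt, image_path)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

For example, `ask_about_image("Solve this math problem step by step", "equation.png")` with any local image file.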

πŸ› οΈ Example Use Cases

Math Problem Solving

ollama run richardyoung/kimi-vl-a3b-thinking "Solve this equation and show your work"

Document Analysis

ollama run richardyoung/kimi-vl-a3b-thinking "Extract key information from this document"

Visual Reasoning

ollama run richardyoung/kimi-vl-a3b-thinking "What can you infer about this scene?"

Chart Interpretation

ollama run richardyoung/kimi-vl-a3b-thinking "Analyze the trends shown in this chart"
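These prompts can also be issued over the API with streaming enabled (Ollama's default), in which case the server returns one JSON object per line, each carrying a `response` fragment and a final `done: true` marker. A small helper to reassemble the full answer (the sample chunks below are illustrative, not real model output):

```python
import json

def collect_stream(lines):
    """Reassemble a streamed Ollama response from newline-delimited JSON chunks."""
    text = []
    for line in lines:
        chunk = json.loads(line)
        text.append(chunk.get("response", ""))
        if chunk.get("done"):  # final chunk; later lines (if any) are ignored
            break
    return "".join(text)

# Illustrative chunks in the shape Ollama's streaming API emits:
sample = [
    '{"response": "The slope ", "done": false}',
    '{"response": "is 2.", "done": true}',
]
print(collect_stream(sample))  # -> The slope is 2.
```

In a real client the lines would come from iterating over the HTTP response body rather than a list.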

πŸ“‹ System Requirements

Minimum Requirements

  • RAM: 16GB
  • GPU: 8GB+ VRAM recommended
  • Storage: 12GB free space

Recommended Setup

  • RAM: 32GB+ or Apple Silicon with 24GB+ unified memory
  • GPU: 16GB+ VRAM for best performance
  • Storage: 35GB free space (for all versions)

🌟 What Makes This Model Special

  1. Thinking Mode: Extended reasoning chains for complex problems
  2. MoE Efficiency: 64 experts activated selectively for better performance
  3. Huge Context: 128K tokens handles large documents and conversations
  4. Math Excellence: Superior performance on visual math benchmarks
  5. Production Quality: Extensively tested by Moonshot AI team

πŸ”— Links

🀝 Credits

  • Original Model: Moonshot AI
  • GGUF Conversion: Richard Young (deepneuro.ai)
  • Quantization: llama.cpp (PR #15458 branch for Kimi-VL support)

πŸ“ License

MIT License - Free for commercial and personal use.


Note: For vision tasks, use with an Ollama client that supports image input (e.g., Open WebUI, or the Ollama API with base64-encoded images). The model performs best when asked to "think step by step".
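For chat-style clients, the same base64 image can be attached to a user turn via the `/api/chat` endpoint's `images` field. A sketch of the payload (the helper name is ours; the appended "Think step by step." nudge follows the note above):

```python
import base64

def build_chat_request(question: str, image_path: str) -> dict:
    """Build an Ollama /api/chat payload with a base64 image on the user turn."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "model": "richardyoung/kimi-vl-a3b-thinking",
        "messages": [
            {
                "role": "user",
                # Asking explicitly for reasoning plays to the thinking variant's strengths.
                "content": question + " Think step by step.",
                "images": [image_b64],
            }
        ],
        "stream": False,
    }
```

POSTing this to `http://localhost:11434/api/chat` returns the assistant's reply under `message.content`.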