8d359f1d802a · 14GB · deepseek2 · 16B · Q6_K

Parameters: { "num_ctx": 131072, "stop": [ "<|im_end|>", "<|endoftext|>" ] }


Kimi-VL-A3B-Thinking: Advanced Vision-Language Model with Extended Reasoning

πŸš€ Overview

Kimi-VL-A3B-Thinking is a powerful vision-language model from Moonshot AI featuring extended thinking capabilities for complex visual reasoning. Built on the DeepSeek2 architecture with Mixture of Experts (MoE), it excels at solving math problems from images, analyzing documents and charts, and performing step-by-step visual reasoning with chain-of-thought explanations.
🎯 Key Features

  • Extended Thinking - Chain-of-thought reasoning for complex visual problems
  • MoE Architecture - 64 experts + 2 shared experts for efficient inference
  • 128K Context - Massive 131,072 token context window
  • MLA Attention - Multi-head Latent Attention for improved performance
  • MIT License - Fully open source
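Ollama does not necessarily load a model at its full context length by default; over the HTTP API the window can be requested per call through `options.num_ctx`. A minimal sketch, assuming a local server on the default port (the helper names are ours; the stop tokens come from this model's template shown above):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(prompt: str, num_ctx: int = 131072) -> dict:
    """Build an /api/generate payload that asks for the full 128K context."""
    return {
        "model": "richardyoung/kimi-vl-a3b-thinking",
        "prompt": prompt,
        "stream": False,
        "options": {
            "num_ctx": num_ctx,                       # context window in tokens
            "stop": ["<|im_end|>", "<|endoftext|>"],  # stop tokens from the model template
        },
    }

def generate(prompt: str) -> str:
    """POST the payload to a local Ollama server (requires `ollama serve`)."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Note that `generate(...)` performs a real network call, so it only works with `ollama serve` running; a 128K window also raises RAM/VRAM use well above the figures listed below.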

πŸ“Š Capabilities

  • Visual Math: Solve mathematical problems from handwritten or printed equations
  • Document Analysis: Extract and reason about document content
  • Chart Understanding: Interpret graphs, charts, and data visualizations
  • Scene Reasoning: Complex multi-step reasoning about image content
  • OCR + Reasoning: Read text and apply logical reasoning

🏷️ Available Versions

Tag      Size     RAM Required   Description
q4_k_m   9.8 GB   ~16GB          Recommended - best quality/size ratio
f16      30 GB    ~40GB          Full precision, maximum quality

πŸ’» Quick Start

# Recommended version (Q4_K_M)
ollama run richardyoung/kimi-vl-a3b-thinking "Solve this math problem step by step"

# Full precision version
ollama run richardyoung/kimi-vl-a3b-thinking:f16 "Analyze this diagram in detail"
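The one-liners above do not attach an image; for vision input the usual route is the Ollama HTTP API, which accepts base64-encoded images in an `images` array. A sketch of building and sending such a request (the function names and image path are our own placeholders):

```python
import base64
import json
import urllib.request

def build_vision_request(prompt: str, image_path: str) -> dict:
    """Encode an image file and build an Ollama /api/generate payload for it."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "model": "richardyoung/kimi-vl-a3b-thinking",
        "prompt": prompt,
        "images": [image_b64],  # raw base64 strings, no "data:" URI prefix
        "stream": False,
    }

def ask_about_image(prompt: str, image_path: str) -> str:
    """Send the request to a local Ollama server (needs `ollama serve` running)."""
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(build_vision_request(prompt, image_path)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

For example, `ask_about_image("Solve this math problem step by step", "equation.png")` with any local image file.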

πŸ› οΈ Example Use Cases

Math Problem Solving

ollama run richardyoung/kimi-vl-a3b-thinking "Solve this equation and show your work"

Document Analysis

ollama run richardyoung/kimi-vl-a3b-thinking "Extract key information from this document"

Visual Reasoning

ollama run richardyoung/kimi-vl-a3b-thinking "What can you infer about this scene?"

Chart Interpretation

ollama run richardyoung/kimi-vl-a3b-thinking "Analyze the trends shown in this chart"
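These prompts can also be issued over the API with streaming enabled (Ollama's default), in which case the server returns one JSON object per line, each carrying a `response` fragment and a final `done: true` marker. A small helper to reassemble the full answer (the sample chunks below are illustrative, not real model output):

```python
import json

def collect_stream(lines):
    """Reassemble a streamed Ollama response from newline-delimited JSON chunks."""
    text = []
    for line in lines:
        chunk = json.loads(line)
        text.append(chunk.get("response", ""))
        if chunk.get("done"):  # final chunk; later lines (if any) are ignored
            break
    return "".join(text)

# Illustrative chunks in the shape Ollama's streaming API emits:
sample = [
    '{"response": "The slope ", "done": false}',
    '{"response": "is 2.", "done": true}',
]
print(collect_stream(sample))  # -> The slope is 2.
```

In a real client the lines would come from iterating over the HTTP response body rather than a list.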

πŸ“‹ System Requirements

Minimum Requirements

  • RAM: 16GB
  • GPU: 8GB+ VRAM recommended
  • Storage: 12GB free space

Recommended Setup

  • RAM: 32GB+ or Apple Silicon with 24GB+ unified memory
  • GPU: 16GB+ VRAM for best performance
  • Storage: 35GB free space (for all versions)

🌟 What Makes This Model Special

  1. Thinking Mode: Extended reasoning chains for complex problems
  2. MoE Efficiency: 64 experts activated selectively for better performance
  3. Huge Context: 128K tokens handles large documents and conversations
  4. Math Excellence: Superior performance on visual math benchmarks
  5. Production Quality: Extensively tested by Moonshot AI team

πŸ”— Links

🀝 Credits

  • Original Model: Moonshot AI
  • GGUF Conversion: Richard Young (deepneuro.ai)
  • Quantization: llama.cpp (PR #15458 branch for Kimi-VL support)

πŸ“ License

MIT License - Free for commercial and personal use.


Note: For vision tasks, use with an Ollama client that supports image input (e.g., Open WebUI, or the Ollama API with base64-encoded images). The model performs best when asked to "think step by step".
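For chat-style clients, the same base64 image can be attached to a user turn via the `/api/chat` endpoint's `images` field. A sketch of the payload (the helper name is ours; the appended "Think step by step." nudge follows the note above):

```python
import base64

def build_chat_request(question: str, image_path: str) -> dict:
    """Build an Ollama /api/chat payload with a base64 image on the user turn."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "model": "richardyoung/kimi-vl-a3b-thinking",
        "messages": [
            {
                "role": "user",
                # Asking explicitly for reasoning plays to the thinking variant's strengths.
                "content": question + " Think step by step.",
                "images": [image_b64],
            }
        ],
        "stream": False,
    }
```

POSTing this to `http://localhost:11434/api/chat` returns the assistant's reply under `message.content`.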