73 Downloads Updated 1 week ago
18a7243e13d4 · 1.0GB
SmolVLM2-2.2B-Instruct is a lightweight yet capable vision-language model from HuggingFace, designed for image understanding, document reading, video-frame analysis, and multimodal reasoning. At just 2.2 billion parameters, it runs efficiently on consumer hardware, including laptops and smartphones, while still delivering strong performance on vision tasks, making advanced vision AI accessible to everyone.
| Tag | Size | RAM Required | Description |
|---|---|---|---|
| q4_k_m | 1.0 GB | ~4GB | Recommended - best quality/size ratio |
| q8_0 | 1.8 GB | ~6GB | Higher quality, minimal loss |
| f16 | 3.4 GB | ~8GB | Full precision, maximum quality |
```shell
# Recommended version (Q4_K_M, the default tag)
ollama run richardyoung/smolvlm2-2.2b-instruct "Describe this image"

# Higher quality version
ollama run richardyoung/smolvlm2-2.2b-instruct:q8_0 "What text is in this document?"

# Full precision
ollama run richardyoung/smolvlm2-2.2b-instruct:f16 "Analyze this chart"

# Example prompts
ollama run richardyoung/smolvlm2-2.2b-instruct "Describe what you see in detail"
ollama run richardyoung/smolvlm2-2.2b-instruct "Extract all text from this document"
ollama run richardyoung/smolvlm2-2.2b-instruct "How many people are in this photo?"
ollama run richardyoung/smolvlm2-2.2b-instruct "What is happening in these video frames?"
```
Apache 2.0 - Free for commercial and personal use.
Note: For vision tasks, use with an Ollama client that supports image input (e.g., Open WebUI, Ollama API with base64 images).
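As a minimal sketch of the API route, the snippet below builds a JSON request body for Ollama's `/api/generate` endpoint, which accepts an `images` list of base64-encoded strings alongside the text `prompt`. The helper function name and the placeholder image bytes are illustrative, not part of this model's distribution:

```python
import base64
import json

def build_generate_request(image_bytes: bytes, prompt: str,
                           model: str = "richardyoung/smolvlm2-2.2b-instruct") -> str:
    """Build a JSON payload for Ollama's /api/generate endpoint.

    Images are passed as base64-encoded strings in the `images` list,
    next to the plain-text `prompt`.
    """
    payload = {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }
    return json.dumps(payload)

# Placeholder bytes stand in for a real image file read with open(path, "rb").
request_body = build_generate_request(b"<raw image bytes>", "Describe this image")
```

POST the resulting body to `http://localhost:11434/api/generate` (the default Ollama address) with any HTTP client.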