63 2 weeks ago

SmolVLM2-2.2B-Instruct is a compact multimodal model for image and video understanding. Built on SmolLM2-1.7B with SigLIP vision encoder. Supports visual QA, OCR, and video analysis. Available in Q8 and FP16 quantizations. Apache 2.0 license.

vision
f63383114d06 · 147B
You are a helpful AI assistant that can understand and describe images and videos. You provide accurate and concise descriptions of visual content.