ahmadwaqar/smolvlm2-256m-video
115 Downloads • Updated 1 week ago
Ultra-compact 256M vision-language model for video/image understanding. Supports visual QA, captioning, OCR, video analysis. Only 1.38GB VRAM. Built on SigLIP + SmolLM2. Available in Q8 and FP16. Apache 2.0 license.
vision
2 models

Name • Size • Context • Input
smolvlm2-256m-video:q8_0 • 279MB • 8K • Text, Image • 5629de7afceb • 1 week ago
smolvlm2-256m-video:fp16 • 518MB • 8K • Text, Image • ae2d9ecd464d • 1 week ago
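
The listing above says both tags accept text and image input. As a rough sketch of how an image question might be sent to one of them, the snippet below posts to Ollama's `/api/generate` endpoint using only the Python standard library. It assumes a local Ollama server on its default port and that the model has been pulled under the namespaced tag `ahmadwaqar/smolvlm2-256m-video:q8_0`; neither detail is stated on this page, and the helper names `build_payload` and `ask` are illustrative only.

```python
import base64
import json
import urllib.request

# Assumptions (not stated on the page): a local Ollama server on its default
# port, and the model pulled under the namespaced tag below.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "ahmadwaqar/smolvlm2-256m-video:q8_0"


def build_payload(prompt: str, image_bytes: bytes, model: str = MODEL) -> dict:
    """Build a non-streaming /api/generate request body carrying one image."""
    return {
        "model": model,
        "prompt": prompt,
        # Images are passed as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def ask(prompt: str, image_path: str) -> str:
    """Send the prompt and image to the server and return the generated text."""
    with open(image_path, "rb") as f:
        payload = build_payload(prompt, f.read())
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # Supplying `data` makes this a POST request.
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

With a server running, a call such as `ask("Describe this image.", "photo.jpg")` (where `photo.jpg` is a hypothetical local file) should return the model's answer as a string. Swapping the tag for `:fp16` would select the larger variant listed above.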