ahmadwaqar/smolvlm2-500m-video
218 Downloads • Updated 1 month ago
Compact 500M vision-language model for video/image understanding. Supports visual QA, captioning, OCR, video analysis. Only 1.8GB VRAM. Built on SigLIP + SmolLM2. Available in Q8 and FP16. Apache 2.0 license.
vision
3 models

Name                             Digest        Size   Context  Input        Updated
smolvlm2-500m-video:latest       918994c25a40  546MB  8K       Text, Image  1 month ago
smolvlm2-500m-video:q8 (latest)  918994c25a40  546MB  8K       Text, Image  1 month ago
smolvlm2-500m-video:fp16         756bce9b8009  1.0GB  8K       Text, Image  1 month ago
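
Any of the tags listed above can be selected by name when querying a locally running Ollama server. The following is a minimal sketch rather than an official usage recipe: it assumes the server listens on the default address `http://localhost:11434`, that the model has already been pulled, and that a single image is sent (the listing shows Text, Image input). The helper names `build_payload` and `describe_image` are hypothetical and introduced only for illustration.

```python
import base64
import json
import urllib.request

# Assumption: a local Ollama server at its default address.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Tag taken from the listing; swap in ":fp16" or ":latest" as needed.
MODEL = "ahmadwaqar/smolvlm2-500m-video:q8"


def build_payload(prompt, image_bytes, model=MODEL):
    """Build the JSON body for one non-streaming generate request."""
    return {
        "model": model,
        "prompt": prompt,
        # Images are passed as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def describe_image(path, prompt="Describe this image.", url=OLLAMA_URL):
    """Send one image plus a prompt to the server and return the text reply."""
    with open(path, "rb") as f:
        payload = build_payload(prompt, f.read())
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Calling `describe_image("photo.jpg")` (a placeholder path) would return the model's description as a string, provided the server is running. The payload builder is kept separate from the network call so the request body can be inspected without contacting a server.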