ahmadwaqar/smolvlm2-500m-video:latest
55 Downloads · Updated 2 weeks ago
Compact 500M-parameter vision-language model for video and image understanding. Supports visual question answering, captioning, OCR, and video analysis. Needs only 1.8GB of VRAM. Built on SigLIP and SmolLM2. Available in Q8 and FP16. Apache 2.0 license.
vision
smolvlm2-500m-video:latest ... / params
f20822ffb07e · 79B
{
  "num_ctx": 4096,
  "stop": [
    "<|im_end|>"
  ],
  "temperature": 0.7,
  "top_p": 0.9
}
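
The parameters above are the model's defaults. A minimal sketch of how they could be mirrored or overridden per request through Ollama's `/api/generate` endpoint follows. It assumes a local Ollama server on the default port 11434 with the model already pulled. The helper names `build_payload` and `generate` are hypothetical and are not part of this model page.

```python
import base64
import json
import urllib.request

# Defaults copied from the params file shown above.
DEFAULT_OPTIONS = {
    "num_ctx": 4096,
    "stop": ["<|im_end|>"],
    "temperature": 0.7,
    "top_p": 0.9,
}

MODEL = "ahmadwaqar/smolvlm2-500m-video:latest"
# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt, image_bytes=None, **overrides):
    """Build the JSON body for /api/generate (hypothetical helper).

    Keyword overrides replace individual defaults, e.g. temperature=0.2.
    """
    options = {**DEFAULT_OPTIONS, **overrides}
    payload = {
        "model": MODEL,
        "prompt": prompt,
        "stream": False,  # ask for a single JSON response instead of a stream
        "options": options,
    }
    if image_bytes is not None:
        # The API expects images as base64-encoded strings.
        payload["images"] = [base64.b64encode(image_bytes).decode("ascii")]
    return payload


def generate(prompt, image_bytes=None, **overrides):
    """Send the request and return the generated text (needs a running server)."""
    body = json.dumps(build_payload(prompt, image_bytes, **overrides)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

For example, `build_payload("Describe this image.", temperature=0.2)` keeps `num_ctx`, `stop`, and `top_p` at the values above and lowers only the temperature, which may suit OCR-style tasks where less sampling variety is wanted.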