ahmadwaqar/smolvlm2-500m-video:fp16/params

ahmadwaqar/smolvlm2-500m-video:fp16

606 Downloads Updated 4 months ago

Compact 500M-parameter vision-language model for video and image understanding. Supports visual QA, captioning, OCR, and video analysis. Needs only 1.8GB of VRAM. Built on SigLIP + SmolLM2. Available in Q8 and FP16. Apache 2.0 license.

vision

params

53ed932be8fa · 57B

{
  "num_ctx": 8192,
  "stop": [
    "<end_of_utterance>"
  ]
}
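
The params above set a context window of 8192 tokens and a single stop sequence. As a minimal sketch of how these values might be passed explicitly in a request, the Python below parses the same JSON and places it in the "options" field of a generate-style request body. The helper name build_generate_payload and the example prompt are hypothetical, and the assumption is that the request goes to an Ollama-compatible /api/generate endpoint; the code only builds and prints the payload and does not contact a server.

```python
import json

# Parameter blob copied from the params layer shown above.
PARAMS_JSON = '{"num_ctx": 8192, "stop": ["<end_of_utterance>"]}'

# Model tag as it appears on this page.
MODEL_TAG = "ahmadwaqar/smolvlm2-500m-video:fp16"


def build_generate_payload(prompt, params_json=PARAMS_JSON, model=MODEL_TAG):
    """Hypothetical helper: build a request body that carries the params as options."""
    options = json.loads(params_json)  # {"num_ctx": 8192, "stop": [...]}
    return {
        "model": model,
        "prompt": prompt,
        "options": options,  # assumed to mirror the model's stored defaults
        "stream": False,     # ask for a single JSON response instead of a stream
    }


payload = build_generate_payload("Describe this image.")
print(json.dumps(payload, indent=2))
```

Since these values are already stored with the model, sending them explicitly should only matter when overriding them, for example lowering num_ctx to reduce memory use.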