
ahmadwaqar/smolvlm2-256m-video

1,162 downloads · Updated 6 months ago

Ultra-compact 256M-parameter vision-language model for video and image understanding. Supports visual question answering, captioning, OCR, and video analysis, and needs only 1.38 GB of VRAM. Built on SigLIP and SmolLM2. Available in Q8_0 and FP16 precisions. Licensed under Apache 2.0.
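As a rough illustration of visual question answering with this model, the sketch below sends one image and a prompt to a local Ollama server through its `/api/generate` endpoint, passing the image as a base64 string in the `images` list. It assumes the server is running at the default address `http://localhost:11434` and that the `q8_0` tag has already been pulled. The helper names `build_payload` and `ask_about_image` are hypothetical, and this is a minimal sketch rather than an official client.

```python
import base64
import json
import urllib.request

# Assumption: a local Ollama server listening on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL_TAG = "ahmadwaqar/smolvlm2-256m-video:q8_0"


def build_payload(prompt: str, image_bytes: bytes, model: str = MODEL_TAG) -> dict:
    """Build a non-streaming /api/generate request carrying one base64-encoded image."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # return a single JSON object instead of a stream of chunks
    }


def ask_about_image(prompt: str, image_path: str, url: str = OLLAMA_URL) -> str:
    """Read an image from disk, send it with the prompt, and return the model's answer."""
    with open(image_path, "rb") as f:
        payload = build_payload(prompt, f.read())
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),  # supplying data makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # Non-streaming responses carry the generated text in the "response" field.
        return json.loads(resp.read())["response"]
```

For example, `ask_about_image("What text appears in this image?", "receipt.jpg")` would ask an OCR-style question about a hypothetical local file `receipt.jpg`. Because the tags list only text and image inputs, video would presumably have to be supplied as individual frames; that workflow is not shown here.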

vision

2 models

| Name | Size | Input | Digest | Updated |
|---|---|---|---|---|
| smolvlm2-256m-video:q8_0 | 279MB | Text, Image | 5629de7afceb | 6 months ago |
| smolvlm2-256m-video:fp16 | 518MB | Text, Image | ae2d9ecd464d | 6 months ago |
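When deciding which of the two tags to pull, one simple approach is to take the largest variant that fits a download budget. The hypothetical helper `pick_tag` below hard-codes the sizes listed above (279 MB for `q8_0`, 518 MB for `fp16`, so FP16 is roughly 1.86 times larger). It assumes that the larger, higher-precision file is preferable whenever it fits. These are download sizes, not the runtime VRAM figure quoted in the description.

```python
from typing import Optional

# Download sizes in MB, copied from the tag list for this model.
TAG_SIZES_MB = {
    "q8_0": 279,
    "fp16": 518,
}
REPO = "ahmadwaqar/smolvlm2-256m-video"


def pick_tag(budget_mb: float) -> Optional[str]:
    """Return the fully qualified name of the largest tag within budget_mb, or None."""
    fitting = [(size, tag) for tag, size in TAG_SIZES_MB.items() if size <= budget_mb]
    if not fitting:
        return None  # neither variant fits the budget
    _, tag = max(fitting)  # largest size that still fits
    return f"{REPO}:{tag}"
```

The returned name can be passed straight to `ollama pull`. For example, a 300 MB budget selects `ahmadwaqar/smolvlm2-256m-video:q8_0`.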