ahmadwaqar/smolvlm2-256m-video:fp16
111 Downloads · Updated 1 week ago
Ultra-compact 256M vision-language model for video/image understanding. Supports visual QA, captioning, OCR, video analysis. Only 1.38GB VRAM. Built on SigLIP + SmolLM2. Available in Q8 and FP16. Apache 2.0 license.
vision
template
836e095f63ff · 160B
<|im_start|>{{ if .System }}System: {{ .System }}<end_of_utterance>
{{ end }}User: {{ .Prompt }}<end_of_utterance>
Assistant: {{ .Response }}<end_of_utterance>
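
The template above uses Go text/template syntax with three fields: `.System`, `.Prompt`, and `.Response`. As a rough illustration of how it expands, the sketch below renders it with Go's standard `text/template` package. The `turn` struct and `render` helper are hypothetical names introduced only for this example, and the sample system and user strings are made up. The sketch renders the whole string, including the trailing assistant segment; how Ollama itself treats that segment during generation is not shown here.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// promptTemplate is copied verbatim from the template layer shown above.
const promptTemplate = `<|im_start|>{{ if .System }}System: {{ .System }}<end_of_utterance>
{{ end }}User: {{ .Prompt }}<end_of_utterance>
Assistant: {{ .Response }}<end_of_utterance>`

// turn is a hypothetical struct holding the three fields the template references.
type turn struct {
	System   string
	Prompt   string
	Response string
}

// render is a hypothetical helper that expands the template for one turn.
func render(t turn) (string, error) {
	tmpl, err := template.New("prompt").Parse(promptTemplate)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, t); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	// Sample values are illustrative only.
	out, err := render(turn{
		System: "Be brief.",
		Prompt: "Describe the image.",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

When `.System` is empty, the `{{ if .System }}` block is skipped, so the rendered prompt starts directly with `<|im_start|>User: `.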