ahmadwaqar/smolvlm2-2.2b-instruct:fp16

195 Downloads Updated 2 months ago

SmolVLM2-2.2B-Instruct is a compact multimodal model for image and video understanding. It is built on SmolLM2-1.7B with a SigLIP vision encoder and supports visual QA, OCR, and video analysis. Available as Q8-quantized and FP16 weights. Apache 2.0 license.
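
As a rough illustration of visual QA with this model, the sketch below builds a request for a locally running Ollama server. The endpoint `http://localhost:11434/api/chat` is assumed to be the default local address, and the helper names `build_chat_payload` and `ask` are hypothetical; only the model tag comes from this page.

```python
import base64
import json
import urllib.request

MODEL = "ahmadwaqar/smolvlm2-2.2b-instruct:fp16"
# Assumption: Ollama is running locally on its default port.
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"


def build_chat_payload(question, image_bytes, system=None):
    """Build a non-streaming chat request with one base64-encoded image."""
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({
        "role": "user",
        "content": question,
        # Images are sent as base64 strings attached to the user message.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
    })
    return {"model": MODEL, "messages": messages, "stream": False}


def ask(question, image_path):
    """Send the question and image to the server and return the reply text."""
    with open(image_path, "rb") as f:
        payload = build_chat_payload(question, f.read())
    request = urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["message"]["content"]
```

Calling `ask("What text appears in this image?", "receipt.png")` would need a running server with the model pulled; `receipt.png` is a placeholder path.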

vision

template

443acda817d5 · 344B

{{- if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{- range .Messages }}{{- if eq .Role "user" }}<|im_start|>user
{{ range .Images }}<image>{{ end }}
{{ .Content }}<|im_end|>
{{ else if eq .Role "assistant" }}<|im_start|>assistant
{{ .Content }}<|im_end|>
{{ end }}{{- end }}<|im_start|>assistant
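
To make the template's output concrete, here is a minimal Python sketch that emulates it by hand. It is not Ollama's renderer. It assumes each template line ends in a single newline (treating the blank lines on this page as display spacing), and the function name `render_prompt` and the dict-based message shape are hypothetical.

```python
def render_prompt(messages, system=None):
    """Emulate the ChatML-style template above for a list of message dicts."""
    parts = []
    if system:
        # {{- if .System }} ... {{ end }}
        parts.append(f"<|im_start|>system\n{system}<|im_end|>\n")
    for msg in messages:
        if msg["role"] == "user":
            # One <image> placeholder per attached image, all on one line.
            image_tags = "<image>" * len(msg.get("images", []))
            parts.append(
                f"<|im_start|>user\n{image_tags}\n{msg['content']}<|im_end|>\n"
            )
        elif msg["role"] == "assistant":
            parts.append(f"<|im_start|>assistant\n{msg['content']}<|im_end|>\n")
        # Any other role matches neither branch, so the template emits nothing.
    # The template always ends by opening an assistant turn for generation.
    parts.append("<|im_start|>assistant")
    return "".join(parts)


example = render_prompt(
    [{"role": "user", "content": "What is in this picture?", "images": ["..."]}],
    system="You are a helpful assistant.",
)
print(example)
```

Under these assumptions, a user message without images still gets an empty line before its content, because the newline after the image loop is emitted unconditionally.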