Compact 500M parameter vision-language model optimized for video and image understanding. Requires only 1.8GB VRAM for inference.
Tags:
latest / q8 (default): Q8_0 quantization, ~546MB
fp16: F16 full precision, ~1GB

# Default (Q8)
ollama run ahmadwaqar/smolvlm2-500m-video "Describe this image" ./photo.jpg
# FP16 (higher quality)
ollama run ahmadwaqar/smolvlm2-500m-video:fp16 "Describe this image" ./photo.jpg
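To fetch a specific tag up front instead of on first use, the same ollama Python package used below exposes a pull call. A minimal sketch, assuming the package is installed (pip install ollama) and the Ollama server is running:

import ollama

# Pull the FP16 tag explicitly; the default Q8_0 tag is downloaded automatically on first run.
ollama.pull('ahmadwaqar/smolvlm2-500m-video:fp16')

Single-image chat with the Python client: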
import ollama
response = ollama.chat(
    model='ahmadwaqar/smolvlm2-500m-video',  # or :fp16
    messages=[{
        'role': 'user',
        'content': 'Describe this image',
        'images': ['./image.jpg']
    }]
)
print(response['message']['content'])
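The Ollama API takes still images rather than video files, so one way to use the model's video understanding is to sample a few frames and pass them as multiple images in a single message. A rough sketch, assuming opencv-python is installed; the sample_frames helper, the frame count, and clip.mp4 are illustrative and not part of this page:

import cv2
import ollama

def sample_frames(path, num_frames=8):
    # Grab num_frames evenly spaced frames and return them as JPEG bytes.
    cap = cv2.VideoCapture(path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    frames = []
    for i in range(num_frames):
        cap.set(cv2.CAP_PROP_POS_FRAMES, i * total // num_frames)
        ok, frame = cap.read()
        if not ok:
            break
        ok, buf = cv2.imencode('.jpg', frame)
        if ok:
            frames.append(buf.tobytes())  # raw JPEG bytes; recent ollama clients base64-encode these
    cap.release()
    return frames

response = ollama.chat(
    model='ahmadwaqar/smolvlm2-500m-video',
    messages=[{
        'role': 'user',
        'content': 'Describe what happens in this video.',
        'images': sample_frames('./clip.mp4')
    }]
)
print(response['message']['content'])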
IMG=$(base64 < image.jpg | tr -d '\n')
curl http://localhost:11434/api/chat -d '{
"model": "ahmadwaqar/smolvlm2-500m-video",
"messages": [{
"role": "user",
"content": "What is in this image?",
"images": ["'"$IMG"'"]
}]
}'
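The chat endpoint streams newline-delimited JSON chunks by default, which is what the curl call above prints. A minimal sketch for consuming that stream from Python with the requests library, assuming the server is on the default port:

import base64
import json
import requests

with open('image.jpg', 'rb') as f:
    img = base64.b64encode(f.read()).decode()

resp = requests.post(
    'http://localhost:11434/api/chat',
    json={
        'model': 'ahmadwaqar/smolvlm2-500m-video',
        'messages': [{
            'role': 'user',
            'content': 'What is in this image?',
            'images': [img]
        }]
        # add 'stream': False here to get a single JSON object instead of a stream
    },
    stream=True
)
for line in resp.iter_lines():
    if not line:
        continue
    chunk = json.loads(line)
    print(chunk.get('message', {}).get('content', ''), end='', flush=True)
print()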
License: Apache 2.0