```bash
ollama run ahmadwaqar/smolvlm2-256m-video:q8_0
```
Ultra-compact 256M-parameter vision-language model optimized for video and image understanding. Requires only 1.38GB of VRAM for inference. The smallest video language model ever released.
Available tags:

- `latest` / `q8` — Q8_0 quantization, ~175MB (default)
- `fp16` — F16 full precision, ~328MB

```bash
# Default (Q8)
ollama run ahmadwaqar/smolvlm2-256m-video "Describe this image: ./photo.jpg"

# FP16 (higher quality)
ollama run ahmadwaqar/smolvlm2-256m-video:fp16 "Describe this image: ./photo.jpg"
```
```python
import ollama

response = ollama.chat(
    model='ahmadwaqar/smolvlm2-256m-video',  # or ':fp16'
    messages=[{
        'role': 'user',
        'content': 'Describe this image',
        'images': ['./image.jpg'],
    }]
)
print(response['message']['content'])
```
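SmolVLM2 was trained on video, but Ollama's chat API takes still images, so one workaround is to sample frames and send them as multiple images in a single message. A minimal sketch, assuming OpenCV is available and a hypothetical local `clip.mp4`; the frame count of 8 is an arbitrary choice, not a value from this model card:

```python
import cv2      # assumption: OpenCV installed via `pip install opencv-python`
import ollama

def sample_frames(path, n=8):
    """Return n evenly spaced frames from a video as JPEG-encoded bytes."""
    cap = cv2.VideoCapture(path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    frames = []
    for i in range(n):
        cap.set(cv2.CAP_PROP_POS_FRAMES, i * total // n)
        ok, frame = cap.read()
        if not ok:
            break
        ok, jpg = cv2.imencode('.jpg', frame)
        if ok:
            frames.append(jpg.tobytes())
    cap.release()
    return frames

response = ollama.chat(
    model='ahmadwaqar/smolvlm2-256m-video',
    messages=[{
        'role': 'user',
        'content': 'Describe what happens in this video',
        'images': sample_frames('./clip.mp4'),  # the Python client also accepts raw bytes
    }]
)
print(response['message']['content'])
```

Keep the frame count small: every frame consumes image tokens out of the 256M model's 2048-token context (see the table below).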
```bash
# Base64-encode the image; the API expects the payload on a single line
IMG=$(base64 < image.jpg | tr -d '\n')

# Note: /api/chat streams NDJSON by default; see the Python variant below
# for a single-response call
curl http://localhost:11434/api/chat -d '{
  "model": "ahmadwaqar/smolvlm2-256m-video",
  "messages": [{
    "role": "user",
    "content": "What is in this image?",
    "images": ["'"$IMG"'"]
  }]
}'
```
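The same request from Python avoids the shell quoting around the base64 payload; a sketch assuming the `requests` package (not part of this model card). Setting `"stream": false` returns one JSON object instead of the default line-delimited stream:

```python
import base64
import requests  # assumption: installed via `pip install requests`

with open('image.jpg', 'rb') as f:
    img_b64 = base64.b64encode(f.read()).decode('ascii')

resp = requests.post('http://localhost:11434/api/chat', json={
    'model': 'ahmadwaqar/smolvlm2-256m-video',
    'messages': [{
        'role': 'user',
        'content': 'What is in this image?',
        'images': [img_b64],
    }],
    'stream': False,  # single JSON response instead of an NDJSON stream
})
resp.raise_for_status()
print(resp.json()['message']['content'])
```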
License: Apache 2.0
| Spec | 256M Model | 500M Model |
|---|---|---|
| Parameters | 256M | 500M |
| VRAM | ~1.38GB | ~1.8GB |
| Q8 Size | ~175MB | ~546MB |
| FP16 Size | ~328MB | ~1GB |
| Context | 2048 tokens | 4096 tokens |
| Video-MME | 33.7 | 42.2 |
| MLVU | 40.6 | 47.3 |
| MVBench | 32.7 | 39.73 |
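The context sizes above must also hold the tokens the image itself consumes, so long prompts can overflow the smaller model. Ollama's `num_ctx` option sets the context window explicitly; a minimal sketch with the Python client, where 2048 simply mirrors the 256M entry in the table:

```python
import ollama

response = ollama.chat(
    model='ahmadwaqar/smolvlm2-256m-video',
    messages=[{
        'role': 'user',
        'content': 'Describe this image',
        'images': ['./image.jpg'],
    }],
    options={'num_ctx': 2048},  # context window; 2048 matches the 256M spec above
)
print(response['message']['content'])
```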