SmolVLM2-2.2B-Instruct: a compact vision-language model from Hugging Face, packaged for Ollama in Q8_0 and FP16 variants.
| Tag | Quantization | Size | Notes |
|---|---|---|---|
| latest | Q8_0 | ~2.4GB | Default |
| q8 | Q8_0 | ~2.4GB | Same as latest |
| fp16 | F16 | ~4.4GB | Full precision |
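A specific tag can also be pulled ahead of time from Python. A minimal sketch, assuming the `ollama` package is installed and a local Ollama server is running on the default port:

```python
import ollama

# Pull the FP16 variant explicitly; the bare model name resolves to :latest (Q8_0)
ollama.pull('ahmadwaqar/smolvlm2-2.2b-instruct:fp16')

# List what is available locally to confirm the download
print(ollama.list())
```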
The Ollama CLI has no separate flag for attachments; an image file path included in the prompt is detected and sent to the model automatically.

```bash
# Default (Q8_0)
ollama run ahmadwaqar/smolvlm2-2.2b-instruct "Describe this image: ./photo.jpg"

# Explicit Q8_0 tag (same weights as latest)
ollama run ahmadwaqar/smolvlm2-2.2b-instruct:q8 "Describe this image: ./photo.jpg"

# FP16 (higher quality, larger download)
ollama run ahmadwaqar/smolvlm2-2.2b-instruct:fp16 "Describe this image: ./photo.jpg"
```
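The same calls are available from the Ollama Python library (`pip install ollama`); the chat API accepts local image paths directly: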
```python
from ollama import Client

client = Client(host='http://localhost:11434')

response = client.chat(
    model='ahmadwaqar/smolvlm2-2.2b-instruct',  # uses the Q8_0 default tag
    messages=[{
        'role': 'user',
        'content': 'What do you see?',
        'images': ['image.png']
    }]
)

print(response['message']['content'])
```
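For longer descriptions, the same chat call can stream tokens as they arrive instead of returning one block; a sketch reusing the `client` defined above:

```python
# stream=True yields response chunks as they are generated
stream = client.chat(
    model='ahmadwaqar/smolvlm2-2.2b-instruct',
    messages=[{
        'role': 'user',
        'content': 'Describe this image in detail.',
        'images': ['image.png']
    }],
    stream=True,
)

for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)
```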
| Property | Value |
|---|---|
| Parameters | 2.2B |
| Architecture | SmolLM2-1.7B + SigLIP |
| Context | 8K tokens |
| Variants | Q8_0 (default), FP16 |
| License | Apache 2.0 |
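To verify which variant is installed locally, Ollama reports quantization level, parameter size, and model family for any pulled tag; a minimal self-contained sketch:

```python
import ollama

# 'details' includes quantization_level, parameter_size, and model family
info = ollama.show('ahmadwaqar/smolvlm2-2.2b-instruct')
print(info['details'])
```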