171 Downloads Updated 1 month ago
ollama run ahmadwaqar/smolvlm2-agentic-gui
A lightweight vision-language model fine-tuned for GUI automation and agentic tasks. This model can understand screenshots, locate UI elements, and execute multi-step interactions on desktop and mobile interfaces.
| Tag | Quantization | Size | Notes |
|---|---|---|---|
latest |
Q4_K_M | ~1.1GB | Default, best speed/quality tradeoff |
q8_0 |
Q8_0 | ~1.9GB | Higher precision |
fp16 |
F16 | ~3.6GB | Full precision |
| Property | Value |
|---|---|
| Base Model | smolagents/SmolVLM2-2.2B-Instruct-Agentic-GUI |
| Parameters | 2.2B |
| Variants | Q4_K_M (default), Q8_0, FP16 |
| Context Length | 4096 |
| Vision Support | Yes (mmproj-f16 projector) |
| License | Apache 2.0 |
# Default (Q4_K_M)
ollama run ahmadwaqar/smolvlm2-agentic-gui "Click on the search button" --images ./screenshot.png
# Q8_0 (higher precision)
ollama run ahmadwaqar/smolvlm2-agentic-gui:q8_0 "Click on the search button" --images ./screenshot.png
# FP16 (full precision)
ollama run ahmadwaqar/smolvlm2-agentic-gui:fp16 "Click on the search button" --images ./screenshot.png
import ollama
response = ollama.chat(
model='ahmadwaqar/smolvlm2-agentic-gui', # or :q8_0 or :fp16
messages=[{
'role': 'user',
'content': 'Click on the search button',
'images': ['./screenshot.png']
}]
)
print(response['message']['content'])
This model was trained using a two-phase approach from Smol2Operator:
| Benchmark | Score |
|---|---|
| ScreenSpot-v2 | 61.71% |
The model outputs actions in normalized [0,1] coordinates:
click(x, y) - Click at normalized coordinatesdouble_click(x, y) - Double-click at normalized coordinateslong_press(x, y) - Long press at normalized coordinatestype(text) - Type text inputpress(keys) - Press keyboard key(s) (e.g. “enter”, [“ctrl”, “c”])scroll(direction, amount) - Scroll up or downdrag(from_coord, to_coord) - Drag from [x1, y1] to [x2, y2]navigate_back() - Go back to previous pagewait(seconds) - Wait for specified duration"Click on the 'Submit' button"
"Type 'hello world' in the search field"
"Scroll down to see more content"
"Navigate to the settings menu"
@misc{smol2operator2025,
title={Smol2Operator: Post-Training GUI Agents for Computer Use},
author={Hugging Face Team},
year={2025},
url={https://huggingface.co/blog/smol2operator}
}
Apache 2.0