187 Downloads Updated 1 month ago
ollama run ahmadwaqar/mai-ui
Foundation GUI agent by Alibaba Tongyi Lab built on Qwen3-VL for UI element detection and bounding box coordinate extraction.
Output format: `[x1, y1, x2, y2]` bounding boxes.

| Tag | Parameters | Size | RAM Required |
|---|---|---|---|
| latest | 2B | ~4GB | ~6GB |
| 2b | 2B | ~4GB | ~6GB |
| 8b | 8B | ~16GB | ~18GB |
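
As a rough illustration of reading the tag table, the sketch below picks the largest tag whose listed RAM requirement fits a given memory budget. `pick_tag` and `TAG_RAM_GB` are hypothetical names introduced here, and the numbers are the approximate figures from the table, not measured values.

```python
from typing import Optional

# Approximate RAM requirements in GB, copied from the tag table (assumed, not measured).
TAG_RAM_GB = {"2b": 6, "8b": 18}


def pick_tag(available_ram_gb: float) -> Optional[str]:
    """Return the largest listed tag that fits the RAM budget, or None."""
    fitting = [(ram, tag) for tag, ram in TAG_RAM_GB.items() if ram <= available_ram_gb]
    if not fitting:
        return None
    # max() compares on the RAM figure first, so the largest fitting model wins.
    return max(fitting)[1]


tag = pick_tag(16)
print(f"ollama run ahmadwaqar/mai-ui:{tag}" if tag else "no listed tag fits")
```

Treat the result as a starting point only; actual memory use depends on context length and image size.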
```shell
# Default (2B)
ollama run ahmadwaqar/mai-ui

# Explicit tags
ollama run ahmadwaqar/mai-ui:2b
ollama run ahmadwaqar/mai-ui:8b
```
```shell
curl http://localhost:11434/api/chat -d '{
  "model": "ahmadwaqar/mai-ui",
  "stream": false,
  "format": "json",
  "messages": [{
    "role": "user",
    "content": "Identify all clickable UI elements with bounding boxes",
    "images": ["<BASE64_SCREENSHOT>"]
  }]
}'
```
```json
{
  "bbox_2d": [789, 402, 869, 437],
  "label": "forward chevron UI button"
}
```
Center point to click: ((789+869)/2, (402+437)/2) = (829, 419.5) ≈ (829, 420)
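
The center calculation above can be sketched as a small helper. `bbox_center` is a hypothetical name used for illustration; it assumes the box is given as `[x1, y1, x2, y2]`, as in the sample output, and works in whatever coordinate space the model reports.

```python
from typing import Sequence, Tuple


def bbox_center(bbox: Sequence[float]) -> Tuple[float, float]:
    """Return the (x, y) midpoint of an [x1, y1, x2, y2] box."""
    x1, y1, x2, y2 = bbox
    return ((x1 + x2) / 2, (y1 + y2) / 2)


cx, cy = bbox_center([789, 402, 869, 437])
print(cx, cy)  # 829.0 419.5
```

Round the result to integers before passing it to a click API that expects whole pixels.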
```python
import base64

import ollama

# Read the screenshot and base64-encode it for the request
with open("screenshot.png", "rb") as f:
    img = base64.b64encode(f.read()).decode()

response = ollama.chat(
    model='ahmadwaqar/mai-ui',
    format='json',  # request JSON output
    messages=[{
        'role': 'user',
        'content': 'Identify clickable elements with bounding boxes',
        'images': [img],
    }],
)

# The message content is a JSON string
print(response['message']['content'])
```
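
Because the reply content arrives as a JSON string, it usually needs to be parsed before use. The sketch below is one way to do that; `parse_elements` is a hypothetical helper, and it assumes the model returns either a single object or a list of objects with the `bbox_2d` and `label` keys shown in the sample output. Entries without a well-formed box are skipped.

```python
import json
from typing import Any, Dict, List


def parse_elements(content: str) -> List[Dict[str, Any]]:
    """Parse model output into a list of {'bbox_2d': [...], 'label': ...} dicts."""
    data = json.loads(content)
    # Assumption: the reply is one object or a list of objects.
    items = data if isinstance(data, list) else [data]
    elements = []
    for item in items:
        if not isinstance(item, dict):
            continue
        bbox = item.get("bbox_2d")
        # Keep only entries whose box is a list of four numbers.
        if (
            isinstance(bbox, list)
            and len(bbox) == 4
            and all(isinstance(v, (int, float)) for v in bbox)
        ):
            elements.append({"bbox_2d": bbox, "label": item.get("label", "")})
    return elements


sample = '{"bbox_2d": [789, 402, 869, 437], "label": "forward chevron UI button"}'
print(parse_elements(sample))
```

With the Python client, the string to pass in would be `response['message']['content']`.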
| Property | Value |
|---|---|
| Architecture | Qwen3-VL |
| Family | MAI-UI (2B / 8B / 32B / 235B-A22B) |
| Min Image Size | 32x32 px |
| Output | JSON with bbox_2d coordinates |
| License | Apache 2.0 |
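
Given the 32x32 px minimum listed above, it may be worth checking a screenshot's dimensions before sending it. The sketch below reads the width and height from a PNG header using only the standard library; `png_size` and `meets_min_size` are hypothetical helpers, and the check covers PNG files only.

```python
import struct

MIN_SIDE = 32  # minimum width and height in pixels, from the properties table
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data: bytes) -> tuple:
    """Return (width, height) read from a PNG's IHDR chunk."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # Width and height are big-endian 32-bit integers at bytes 16..23.
    return struct.unpack(">II", data[16:24])


def meets_min_size(data: bytes) -> bool:
    """True if both sides of the PNG are at least MIN_SIDE pixels."""
    width, height = png_size(data)
    return width >= MIN_SIDE and height >= MIN_SIDE
```

In practice, the bytes read from `screenshot.png` can be passed to `meets_min_size` before base64-encoding them.
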
| Benchmark | Score |
|---|---|
| ScreenSpot-Pro | 73.5% |
| MMBench GUI L2 | 91.3% |
| OSWorld-G | 70.9% |
| UI-Vision | 49.2% |
| AndroidWorld | 76.7% |
| MobileWorld | 41.7% |