GUI-Owl is a multimodal vision-language model developed by mPLUG (Alibaba) as part of the Mobile-Agent-v3 project. Its authors report state-of-the-art results on GUI automation benchmarks including ScreenSpot-V2, ScreenSpot-Pro, OSWorld-G, MMBench-GUI, Android Control, Android World, and OSWorld.
ollama run ahmadwaqar/guiowl:7b-q8
ollama run ahmadwaqar/guiowl:32b-q8
| Tag | Parameters | Quantization | Size |
|---|---|---|---|
| 7b-q8 | 7B | Q8_0 | ~8 GB |
| 32b-q8 | 32B | Q8_0 | ~34 GB |
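The sizes in the table can be sanity-checked with a rough calculation. The sketch below assumes the standard GGML Q8_0 layout (blocks of 32 int8 weights plus one 16-bit scale, so 34 bytes per 32 weights, about 8.5 bits per weight) and uses the nominal parameter counts from the table. The function name `estimate_q8_0_size_gb` is illustrative, not part of any tool.

```python
# Rough size estimate for a Q8_0-quantized model.
# Assumption: every weight is stored in Q8_0 blocks of 32 int8 values
# plus one 2-byte (fp16) scale. Real GGUF files also hold metadata and
# may keep some tensors at higher precision, so actual sizes differ.
Q8_0_BLOCK_WEIGHTS = 32
Q8_0_BLOCK_BYTES = 32 + 2  # 32 int8 weights + one fp16 scale


def estimate_q8_0_size_gb(n_params: float) -> float:
    """Return the approximate file size in decimal gigabytes."""
    total_bytes = n_params / Q8_0_BLOCK_WEIGHTS * Q8_0_BLOCK_BYTES
    return total_bytes / 1e9


# Nominal parameter counts taken from the tag table above.
for tag, n_params in [("7b-q8", 7e9), ("32b-q8", 32e9)]:
    print(f"{tag}: ~{estimate_q8_0_size_gb(n_params):.1f} GB")
```

With nominal counts this gives roughly 7.4 GB and 34 GB. The listed ~8 GB for the 7B tag is somewhat larger, which is plausible because "7B" is a rounded label and the file includes more than the language-model weights alone.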
ollama run ahmadwaqar/guiowl:7b-q8
>>> [attach screenshot] What button should I click to submit this form?
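The same question can also be sent programmatically. The following is a minimal sketch, assuming a local Ollama server on its default port (11434) and using the `/api/generate` endpoint, which accepts base64-encoded images in an `images` field. The helper names `build_request` and `ask`, and the file name `screenshot.png`, are hypothetical.

```python
import base64
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str, image_bytes: bytes) -> dict:
    """Build a non-streaming generate request with one attached image."""
    return {
        "model": model,
        "prompt": prompt,
        # The API expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def ask(model: str, prompt: str, image_path: str, url: str = OLLAMA_URL) -> str:
    """Send a screenshot and a question to the model; return its text reply."""
    with open(image_path, "rb") as f:
        payload = build_request(model, prompt, f.read())
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


# Example call (requires a running server and a pulled model):
# print(ask("ahmadwaqar/guiowl:7b-q8",
#           "What button should I click to submit this form?",
#           "screenshot.png"))
```

Setting `stream` to `False` returns a single JSON object instead of a stream of chunks, which keeps the sketch short. The format of the model's answer (for example, click coordinates versus a description) is not specified here and should be checked against the model's output.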
| Attribute | Value |
|---|---|
| Developer | mPLUG / Alibaba |
| Base Model | Qwen2.5-VL-7B-Instruct / Qwen2.5-VL-32B |
| GGUF Quant By | mradermacher |
| License | Apache 2.0 (7B) / Qwen License (32B) |
| Paper | arXiv:2508.15144 |
| GitHub | X-PLUG/MobileAgent |
@misc{ye2025mobileagentv3,
title={Mobile-Agent-v3: Foundamental Agents for GUI Automation},
author={Jiabo Ye and Xi Zhang and Haiyang Xu and others},
year={2025},
eprint={2508.15144},
archivePrefix={arXiv},
primaryClass={cs.AI}
}