GUI-Owl

GUI-Owl is a multimodal vision-language model developed by mPLUG (Alibaba) as part of the Mobile-Agent-V3 project. It achieves state-of-the-art performance on GUI automation benchmarks including ScreenSpot-V2, ScreenSpot-Pro, OSWorld-G, MMBench-GUI, Android Control, Android World, and OSWorld.

Usage

ollama run ahmadwaqar/guiowl:7b-q8

ollama run ahmadwaqar/guiowl:32b-q8

Available Tags

Tag      Parameters   Quantization   Size
7b-q8    7B           Q8_0           ~8 GB
32b-q8   32B          Q8_0           ~34 GB
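The sizes above follow roughly from the Q8_0 layout used by these GGUF files: each block of 32 weights stores 32 int8 values plus one 16-bit scale, i.e. 34 bytes per 32 weights (8.5 bits per weight). A back-of-the-envelope sketch (ignoring metadata and any tensors kept at higher precision, so real file sizes differ slightly):

```python
# Rough Q8_0 size estimate: 34 bytes per block of 32 weights (8.5 bits/weight).
# Ignores GGUF metadata and non-quantized tensors, so actual files are a bit larger.
def q8_0_size_gb(n_params: float) -> float:
    return n_params * 34 / 32 / 1e9

print(round(q8_0_size_gb(7e9), 1))   # ≈ 7.4 GB for the 7B model
print(round(q8_0_size_gb(32e9)))     # ≈ 34 GB for the 32B model
```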

Capabilities

  • GUI element detection and grounding
  • Screen navigation and task automation
  • Desktop and mobile UI understanding
  • Visual question answering for UI components
  • End-to-end decision making for GUI tasks
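Beyond the interactive CLI, screenshots can be sent programmatically through Ollama's local REST API, which accepts base64-encoded images in the `images` field of a generate request. A minimal sketch, assuming an Ollama server on the default port (11434) and an illustrative local screenshot path:

```python
import base64
import json
import urllib.request

def build_request(model: str, prompt: str, image_path: str) -> dict:
    """Build the JSON payload for Ollama's POST /api/generate endpoint."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "model": model,
        "prompt": prompt,
        "images": [image_b64],  # list of base64-encoded images
        "stream": False,        # return one complete JSON response
    }

def ask(payload: dict) -> str:
    """Send the payload to a locally running Ollama server."""
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Usage (requires `ollama serve` running and a real screenshot file):
# payload = build_request("ahmadwaqar/guiowl:7b-q8",
#                         "What button should I click to submit this form?",
#                         "screenshot.png")
# print(ask(payload))
```

The file name and prompt here are placeholders; any screenshot and GUI question can be substituted.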

Example

ollama run ahmadwaqar/guiowl:7b-q8
>>> ./screenshot.png What button should I click to submit this form?

Images are attached by including a local file path in the prompt.

Model Details

Attribute       Value
Developer       mPLUG / Alibaba
Base Model      Qwen2.5-VL-7B-Instruct / Qwen2.5-VL-32B
GGUF Quant By   mradermacher
License         Apache 2.0 (7B) / Qwen License (32B)
Paper           arXiv:2508.15144
GitHub          X-PLUG/MobileAgent

Citation

@misc{ye2025mobileagentv3,
  title={Mobile-Agent-v3: Foundamental Agents for GUI Automation},
  author={Jiabo Ye and Xi Zhang and Haiyang Xu and others},
  year={2025},
  eprint={2508.15144},
  archivePrefix={arXiv},
  primaryClass={cs.AI}
}

Credits