UI-TARS · Ollama

ahmadwaqar/gui-owl

GUI-Owl is a multimodal vision-language model from mPLUG/Alibaba for GUI understanding and automation. It reports state-of-the-art results on the ScreenSpot, OSWorld, and AndroidWorld benchmarks, detects UI elements, and automates tasks on desktop and mobile devices.

vision

462 Pulls 2 Tags Updated 6 months ago

ahmadwaqar/mai-ui

Alibaba Tongyi GUI agent built on Qwen3-VL. Reported SOTA: 73.5% on ScreenSpot-Pro, 76.7% on AndroidWorld. Returns a bounding box as [x1,y1,x2,y2] for UI automation. Supports MCP tools and device-cloud collaboration. Apache 2.0 license. Tags: 2b (default), 8b.

vision 2b 8b

423 Pulls 3 Tags Updated 5 months ago
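
The mai-ui entry above says the model returns a bounding box as [x1,y1,x2,y2]. As a rough illustration of how such output might be consumed, here is a minimal sketch that pulls the first box out of a reply string and computes its center as a click target. It assumes the box appears as four bracketed, comma-separated numbers; the listing does not say whether coordinates are pixels or normalized, nor what the rest of the reply looks like. The names `parse_bbox` and `bbox_center` and the sample reply are hypothetical, not part of the model or of Ollama.

```python
import re
from typing import Optional, Tuple

BBox = Tuple[float, float, float, float]

# Assumption: the box shows up in the reply as four comma-separated
# numbers inside square brackets, e.g. "[100, 200, 300, 260]".
_NUM = r"(-?\d+(?:\.\d+)?)"
_BBOX_RE = re.compile(
    r"\[\s*" + _NUM + r"\s*,\s*" + _NUM + r"\s*,\s*" + _NUM + r"\s*,\s*" + _NUM + r"\s*\]"
)


def parse_bbox(reply: str) -> Optional[BBox]:
    """Return the first [x1,y1,x2,y2] box found in reply, or None if absent."""
    match = _BBOX_RE.search(reply)
    if match is None:
        return None
    x1, y1, x2, y2 = (float(g) for g in match.groups())
    # Normalize so (x1, y1) is the top-left corner even if corners arrive swapped.
    return (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))


def bbox_center(box: BBox) -> Tuple[float, float]:
    """Return the midpoint of the box, a common choice for a click target."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)


if __name__ == "__main__":
    sample_reply = "The Submit button is at [100, 200, 300, 260]."  # made-up reply
    box = parse_bbox(sample_reply)
    if box is not None:
        print(bbox_center(box))
```

Clicking the center is only one reasonable choice; if the coordinates turn out to be normalized, they would need scaling by the screenshot size first.
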

studiobrn/uncensoredmodAI

Fully uncensored local AI for coding, automation, vision tasks, and direct final answers, built to reduce unnecessary thinking output and deliver complete responses.

vision tools thinking

1,025 Pulls 1 Tag Updated 1 month ago

asaad/epyac.1

Open-source AI to control your system and help developers use the terminal better.

vision tools

116 Pulls 1 Tag Updated 1 year ago