171 1 month ago

Lightweight 2.2B-parameter vision model for GUI automation: it clicks, types, and scrolls based on screenshots. Fine-tuned for agentic reasoning, with output coordinates normalized to [0,1]. Available in Q4_K_M and Q8_0 quantizations and in unquantized FP16. Apache 2.0 license.
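Because the coordinates are normalized to [0,1], a caller has to scale them to the screenshot's pixel size before dispatching a click. Below is a minimal sketch of that conversion. The helper name `to_pixels`, the top-left origin, and the clamping of out-of-range values are assumptions made for illustration; the listing does not specify the model's exact output format.

```python
def to_pixels(x_norm: float, y_norm: float, width: int, height: int) -> tuple[int, int]:
    """Map normalized [0,1] coordinates to integer pixel coordinates.

    Hypothetical helper: assumes (0, 0) is the top-left corner of the
    screenshot and (1, 1) is the bottom-right corner.
    """
    # Clamp so a slightly out-of-range prediction still lands on screen.
    x = min(max(x_norm, 0.0), 1.0)
    y = min(max(y_norm, 0.0), 1.0)
    # Scale by (size - 1) so that 1.0 maps to the last valid pixel index.
    return round(x * (width - 1)), round(y * (height - 1))


# Example: a predicted click at the bottom-right corner of a 1920x1080 screenshot.
print(to_pixels(1.0, 1.0, 1920, 1080))
```

Scaling by `size - 1` keeps the result inside the image bounds; if the target automation API expects coordinates in the range 0 to `width` instead, scale by `width` and `height` directly.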

vision