
Lightweight 2.2B-parameter vision model for GUI automation: given a screenshot, it outputs click, type, and scroll actions. Fine-tuned for agentic reasoning, with coordinates normalized to [0, 1]. Available as Q4_K_M and Q8_0 quantizations and in FP16. Apache 2.0 license.
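Because coordinates are normalized to [0, 1], a caller has to scale them by the screenshot size before dispatching a click. A minimal sketch of that conversion follows. The helper name `to_pixels` is hypothetical, and the sketch assumes x runs left to right and y top to bottom from the top-left corner, which the description above does not state.

```python
def to_pixels(x_norm: float, y_norm: float, width: int, height: int) -> tuple[int, int]:
    """Map normalized [0, 1] coordinates to integer pixel coordinates.

    Assumption: origin at the top-left corner, x horizontal, y vertical.
    """
    if not (0.0 <= x_norm <= 1.0 and 0.0 <= y_norm <= 1.0):
        raise ValueError("normalized coordinates must lie in [0, 1]")
    # Clamp so that 1.0 maps to the last valid pixel index, not one past it.
    px = min(int(x_norm * width), width - 1)
    py = min(int(y_norm * height), height - 1)
    return px, py


if __name__ == "__main__":
    # Hypothetical model output for a 1920x1080 screenshot.
    print(to_pixels(0.5, 0.25, 1920, 1080))
```

Rejecting out-of-range values instead of silently clamping them makes malformed model output visible early; a production agent might prefer to clamp and log.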

vision
9d619fa6e1b3 · 217B
<|im_start|>{{ if .System }}System: {{ .System }}<end_of_utterance>
{{ end }}<|im_start|>User: {{ if .Images }}<image> {{ end }}{{ .Prompt }}<end_of_utterance>
<|im_start|>Assistant: {{ .Response }}<end_of_utterance>
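
To see what string the template above produces, the sketch below mirrors its literal text in plain Python. The function `render_prompt` and its parameters are hypothetical stand-ins for the template fields `.System`, `.Images`, `.Prompt`, and `.Response`. It is a direct expansion only and does not model how a serving runtime may treat the text after `.Response` during generation.

```python
def render_prompt(prompt: str, system: str = "", has_images: bool = False,
                  response: str = "") -> str:
    """Expand the prompt template literally (hypothetical helper)."""
    parts = ["<|im_start|>"]
    if system:
        # The system turn and its trailing newline appear only when set.
        parts.append(f"System: {system}<end_of_utterance>\n")
    parts.append("<|im_start|>User: ")
    if has_images:
        # Image placeholder precedes the user text, followed by a space.
        parts.append("<image> ")
    parts.append(f"{prompt}<end_of_utterance>\n")
    parts.append(f"<|im_start|>Assistant: {response}<end_of_utterance>")
    return "".join(parts)


if __name__ == "__main__":
    print(render_prompt("Click the OK button.", has_images=True))
```

As written, the template emits `<|im_start|>` twice in a row when no system prompt is set, and the sketch reproduces that behavior rather than correcting it.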