
Lightweight 2.2B-parameter vision model for GUI automation: given a screenshot, it outputs click, type, and scroll actions. Fine-tuned on the aguvis datasets for agentic reasoning. Available as Q8-quantized and FP16 weights. Apache 2.0 license.

vision
1392a35f8364 · 342B
<|im_start|>{{- if .System }}System: {{ .System }}<end_of_utterance>
{{ end }}{{- range .Messages }}{{- if eq .Role "user" }}User:{{- if .Images }}{{ range .Images }}<image>{{ end }}{{ else }} {{ end }}{{ .Content }}<end_of_utterance>
{{ else if eq .Role "assistant" }}Assistant: {{ .Content }}<end_of_utterance>
{{ end }}{{- end }}Assistant:
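
To make the template's whitespace handling easier to follow, here is a minimal Python sketch that mimics what it appears to render. It is not the runtime's own code: `Message` and `render_prompt` are hypothetical names introduced for illustration, and the sketch assumes the template ends without a trailing newline after the final `Assistant:`.

```python
from dataclasses import dataclass, field

# Every turn in the template is closed by this marker plus a newline.
END = "<end_of_utterance>\n"


@dataclass
class Message:
    """Hypothetical stand-in for one entry of the template's .Messages."""
    role: str
    content: str
    images: list = field(default_factory=list)


def render_prompt(messages, system=""):
    """Approximate the prompt string the template above would produce."""
    parts = ["<|im_start|>"]
    if system:
        parts.append(f"System: {system}{END}")
    for m in messages:
        if m.role == "user":
            # One <image> placeholder per attached image and no space after it;
            # a single space follows "User:" only when there are no images.
            prefix = "<image>" * len(m.images) if m.images else " "
            parts.append(f"User:{prefix}{m.content}{END}")
        elif m.role == "assistant":
            parts.append(f"Assistant: {m.content}{END}")
        # Any other role is skipped, as the template has no branch for it.
    # The prompt ends with an open assistant turn for the model to complete.
    parts.append("Assistant:")
    return "".join(parts)


if __name__ == "__main__":
    demo = [Message("user", "Click the Save button.", images=[b"<png bytes>"])]
    print(render_prompt(demo, system="You are a GUI agent."))
```

Note the asymmetry in the user turn: with a screenshot attached the text follows `<image>` directly, while a text-only turn gets a space after `User:`.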