
Lightweight 2.2B-parameter vision model for GUI automation: given a screenshot, it produces click, type, and scroll actions. Fine-tuned for agentic reasoning, with coordinates output in normalized [0,1] form. Available as Q4_K_M and Q8_0 quantizations and as unquantized FP16. Apache 2.0 license.
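Because the coordinates are normalized to [0,1], a caller has to scale them by the screenshot size before driving a real pointer. A minimal sketch of that conversion follows; the function name `to_pixels` and the clamping behavior are illustrative assumptions, not part of the model or its tooling.

```python
def to_pixels(x_norm: float, y_norm: float, width: int, height: int) -> tuple[int, int]:
    """Map normalized [0,1] coordinates to integer pixel coordinates.

    Hypothetical helper: the model card only states that output
    coordinates are normalized to [0,1]; the rounding and clamping
    policy here is an assumption.
    """
    if not (0.0 <= x_norm <= 1.0 and 0.0 <= y_norm <= 1.0):
        raise ValueError("normalized coordinates must lie in [0, 1]")
    if width <= 0 or height <= 0:
        raise ValueError("screenshot dimensions must be positive")
    # Scale, then clamp so that 1.0 maps to the last valid pixel index.
    x = min(int(x_norm * width), width - 1)
    y = min(int(y_norm * height), height - 1)
    return x, y
```

For a 1920x1080 screenshot, `to_pixels(0.5, 0.25, 1920, 1080)` gives `(960, 270)`.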

vision
ce88c8976f3a · 114B
{
  "num_ctx": 4096,
  "num_predict": 512,
  "stop": [
    "<end_of_utterance>",
    "<|im_end|>"
  ],
  "temperature": 0
}
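These defaults can also be set explicitly per request. The sketch below assumes the model is served through Ollama's REST API, whose `/api/generate` endpoint accepts an `options` object and base64-encoded `images`. The model tag `your-model-tag` and the helper name `build_generate_request` are placeholders, since the listing does not give the model's name.

```python
import base64

# Placeholder: the listing does not state the model's tag.
MODEL_NAME = "your-model-tag"

# Same values as the parameter file above.
OPTIONS = {
    "num_ctx": 4096,       # context window in tokens
    "num_predict": 512,    # cap on generated tokens
    "stop": ["<end_of_utterance>", "<|im_end|>"],
    "temperature": 0,      # greedy decoding for repeatable actions
}


def build_generate_request(prompt: str, screenshot: bytes, model: str = MODEL_NAME) -> dict:
    """Build a JSON-serializable body for POST /api/generate (assumed Ollama server)."""
    return {
        "model": model,
        "prompt": prompt,
        # Images are sent as base64-encoded strings.
        "images": [base64.b64encode(screenshot).decode("ascii")],
        "options": dict(OPTIONS),
        "stream": False,
    }
```

The returned dictionary can be serialized with `json.dumps` and posted to a local server (by default `http://localhost:11434/api/generate`). A temperature of 0 suits GUI automation because the same screenshot and instruction should give the same action.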