Holo-3.1 vision-language computer-use agents by H Company. Locate UI elements and drive web, desktop & mobile automation from a screenshot — returns clicks in normalized [0,1000] coords. 0.8B & 4B, instruct & thinking variants, Q4_K_M/Q8_0. Apache 2.0.

ollama run ahmadwaqar/holo-3.1

Applications

  • Claude Code: ollama launch claude --model ahmadwaqar/holo-3.1
  • Codex App: ollama launch codex-app --model ahmadwaqar/holo-3.1
  • OpenClaw: ollama launch openclaw --model ahmadwaqar/holo-3.1
  • Hermes Agent: ollama launch hermes --model ahmadwaqar/holo-3.1
  • Codex: ollama launch codex --model ahmadwaqar/holo-3.1
  • OpenCode: ollama launch opencode --model ahmadwaqar/holo-3.1

Holo-3.1 — Fast & Local Computer-Use Agents

Vision-Language Models (VLMs) for computer-use agents: UI grounding, web/desktop automation, mobile automation, and business workflows. Built by H Company on the Qwen 3.5 family, packaged here as GGUF for Ollama with the CLIP vision projector bundled in.

Given a screenshot + an instruction, Holo locates the correct UI element and returns an action (e.g. a click at normalized [0, 1000] coordinates) or a textual answer.

ollama run ahmadwaqar/holo-3.1

Available tags

All variants live under one repo and are selected by tag.

| Tag            | Size | Variant  | Quant  | Notes                                                     |
|----------------|------|----------|--------|-----------------------------------------------------------|
| latest, 4b     | 4B   | instruct | Q8_0   | Default. Best general accuracy / quality.                 |
| 4b-q4          | 4B   | instruct | Q4_K_M | Smaller, faster, slightly lower quality.                  |
| 4b-thinking    | 4B   | thinking | Q8_0   | Emits a <think> reasoning plan before acting.             |
| 4b-thinking-q4 | 4B   | thinking | Q4_K_M | Thinking, smaller footprint.                              |
| 0.8b           | 0.8B | instruct | Q8_0   | Ultra-light; fast one-shot grounding on modest hardware.  |
| 0.8b-q4        | 0.8B | instruct | Q4_K_M | Smallest build.                                           |

ollama pull ahmadwaqar/holo-3.1:4b
ollama pull ahmadwaqar/holo-3.1:0.8b
ollama pull ahmadwaqar/holo-3.1:4b-thinking

Which one should I use?

  • Single-shot UI grounding / element localization → 4b (or 0.8b on lighter machines). These tags write the answer directly to the message content; temperature defaults to 0.0 for deterministic coordinates.
  • Multi-step agent loops / planning before acting → 4b-thinking. It produces a <think> plan first; defaults to temperature 0.6 / top_p 0.95 / top_k 20.
  • Tight memory / CPU → the -q4 tags (Q4_K_M).
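
When using a thinking tag, the plan usually needs to be separated from the final answer before the answer is parsed. The sketch below assumes the plan arrives inside the message content and is closed by a matching </think> tag, which the source does not state explicitly; some Ollama versions may return reasoning in a separate field, in which case this step is unnecessary. The helper name `split_thinking` is hypothetical.

```python
import re

# Assumption: the plan is wrapped as <think>...</think> inside the content string.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(content: str) -> tuple[str, str]:
    """Return (plan, answer); plan is empty when no <think> block is present."""
    match = THINK_RE.search(content)
    if match is None:
        return "", content.strip()
    plan = match.group(1).strip()
    # Everything outside the <think> block is treated as the answer.
    answer = (content[:match.start()] + content[match.end():]).strip()
    return plan, answer
```

The plan can be logged for debugging while only the answer is passed on to whatever executes the action.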

Usage

Holo is multimodal: send the screenshot and the instruction in the same user turn. Coordinates returned are integers in a normalized [0, 1000] space (origin top-left); scale them to real pixels with px = x / 1000 * image_width, py = y / 1000 * image_height.
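
The scaling formula above can be sketched in Python as follows. The function name `to_pixels`, the rounding to the nearest pixel, and the clamping to the last valid pixel index are assumptions added for illustration; the source only gives the two formulas.

```python
def to_pixels(x: int, y: int, image_width: int, image_height: int) -> tuple[int, int]:
    """Map a point from the normalized [0, 1000] space to pixel coordinates."""
    if not (0 <= x <= 1000 and 0 <= y <= 1000):
        raise ValueError(f"expected coordinates in [0, 1000], got ({x}, {y})")
    px = round(x / 1000 * image_width)
    py = round(y / 1000 * image_height)
    # Assumption: clamp so that a value of 1000 maps to the last valid pixel
    # index instead of one past the edge of the image.
    return min(px, image_width - 1), min(py, image_height - 1)


print(to_pixels(500, 250, 1920, 1080))  # (960, 270)
```

Pass the dimensions of the image that was actually sent to the model; if the screenshot was resized before upload, scale again to the real screen size.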

Recommended system prompt

You are Holo, a GUI grounding agent for computer-use automation. Given a
screenshot and a task, locate the correct UI element and call the appropriate
tool. Click coordinates must be integers in the [0, 1000] space, normalized to
the provided image with the origin at the top-left corner.

API example (chat with an image)

curl http://localhost:11434/api/chat -d '{
  "model": "ahmadwaqar/holo-3.1:4b",
  "stream": false,
  "messages": [
    { "role": "system", "content": "You are Holo, a GUI grounding agent..." },
    {
      "role": "user",
      "content": "Click the search box.",
      "images": ["<base64-encoded screenshot>"]
    }
  ]
}'
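
The `images` field in the request above takes the screenshot as a base64 string. A minimal sketch of building the same JSON body in Python, using only the standard library, might look like this; the helper name `build_chat_payload` is hypothetical, and the system prompt is passed in by the caller rather than hard-coded.

```python
import base64
import json


def build_chat_payload(image_bytes: bytes, instruction: str, system_prompt: str,
                       model: str = "ahmadwaqar/holo-3.1:4b") -> str:
    """Return a JSON body for POST /api/chat with one base64-encoded screenshot."""
    image_b64 = base64.b64encode(image_bytes).decode("ascii")
    payload = {
        "model": model,
        "stream": False,
        "messages": [
            {"role": "system", "content": system_prompt},
            # The screenshot and the instruction travel in the same user turn.
            {"role": "user", "content": instruction, "images": [image_b64]},
        ],
    }
    return json.dumps(payload)
```

Read the file in binary mode, for example `open("screenshot.png", "rb").read()`, and send the returned string as the request body to http://localhost:11434/api/chat.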

Python (ollama package)

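import ollama

resp = ollama.chat(
    model="ahmadwaqar/holo-3.1:4b",
    messages=[
        # Replace the truncated text with the full recommended system prompt.
        {"role": "system", "content": "You are Holo, a GUI grounding agent..."},
        # Screenshot and instruction go in the same user turn; the client
        # accepts a file path here and encodes the image itself.
        {"role": "user", "content": "Click the search box.", "images": ["screenshot.png"]},
    ],
)
# The grounding result (action or textual answer) is in the message content.
print(resp["message"]["content"])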

Defaults

| Parameter   | instruct tags | thinking tags |
|-------------|---------------|---------------|
| temperature | 0.0           | 0.6           |
| top_p       | 1.0           | 0.95          |
| top_k       |               | 20            |
| num_ctx     | 8192          | 16384         |

Override per request as needed (e.g. raise num_ctx for long agent histories).
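
As a rough illustration of a per-request override with the ollama package, the sketch below builds the keyword arguments for `ollama.chat`, adding an `options` dict only when something is overridden so the tag's built-in defaults otherwise apply. The helper name `chat_kwargs` and the value 32768 in the usage note are illustrative assumptions, not recommendations from the source.

```python
def chat_kwargs(tag: str, messages: list, **overrides) -> dict:
    """Build keyword arguments for ollama.chat with optional option overrides."""
    kwargs = {"model": f"ahmadwaqar/holo-3.1:{tag}", "messages": messages}
    if overrides:
        # Only the overridden keys are sent; everything else keeps the
        # defaults baked into the selected tag.
        kwargs["options"] = dict(overrides)
    return kwargs


def chat(tag: str, messages: list, **overrides):
    """Send the request; ollama is imported lazily so chat_kwargs stays dependency-free."""
    import ollama
    return ollama.chat(**chat_kwargs(tag, messages, **overrides))
```

For a long agent history, a call such as `chat("4b-thinking", messages, num_ctx=32768)` would raise the context window for that request only.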

Notes

  • Tool calling: Holo was trained with a custom XML function-call format (<tool_call><function=name><parameter=...>). The bundled chat template feeds tools and tool history correctly, but Ollama’s parser will not reliably surface that XML as structured tool_calls in the API response — parse the XML from the assistant content on the client side. Plain chat and vision grounding work normally.
  • Name casing: Ollama lowercases all repo names, so the pull name is holo-3.1 even though the model is styled “Holo-3.1”.
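
Client-side parsing of the XML tool calls mentioned above could be sketched as follows. The source shows only the opening tags, so the closing tags (`</parameter>`, `</function>`, `</tool_call>`) are an assumption here, as are the helper name `parse_tool_calls` and the `click` tool with `x` and `y` parameters used in the sample.

```python
import re

# Assumption: every opening tag has a matching closing tag.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)
FUNCTION_RE = re.compile(r"<function=([^>\s]+)>(.*?)</function>", re.DOTALL)
PARAMETER_RE = re.compile(r"<parameter=([^>\s]+)>(.*?)</parameter>", re.DOTALL)


def parse_tool_calls(content: str) -> list[dict]:
    """Extract tool calls from assistant content as name/arguments dicts."""
    calls = []
    for block in TOOL_CALL_RE.findall(content):
        for name, body in FUNCTION_RE.findall(block):
            # Parameter values are kept as strings; convert them per tool schema.
            arguments = {key: value.strip() for key, value in PARAMETER_RE.findall(body)}
            calls.append({"name": name, "arguments": arguments})
    return calls


# Hypothetical sample output; the tool and parameter names are illustrative.
sample = (
    "<tool_call>\n<function=click>\n"
    "<parameter=x>\n512\n</parameter>\n"
    "<parameter=y>\n88\n</parameter>\n"
    "</function>\n</tool_call>"
)
print(parse_tool_calls(sample))
```

Regular expressions are used rather than an XML parser because `<function=name>` is not well-formed XML.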

License & attribution

Apache 2.0. Original models and research by H Company — Holo3.1 family (0.8B / 4B / 9B / 35B-A3B), based on Qwen 3.5. These tags are GGUF conversions (instruct + thinking variants) of Holo3.1-0.8B and Holo3.1-4B.

@misc{hai2026holo31,
  title={Holo3.1: Fast \& Local Computer Use Agents},
  author={H Company},
  year={2026},
  url={https://huggingface.co/Hcompany/Holo3.1-35B-A3B}
}