107 Downloads Updated 3 days ago
ollama run llmvision/glimpse-v1
3225100ecc36 · 5.0GB
Glimpse-v1 is a lightweight vision-language model (VLM) built to summarize home security camera events. It natively supports structured JSON, enabling seamless integration into automations.
Glimpse-v1 is specifically trained to summarize footage from smart doorbells and other home security cameras. For best results and ease of use, use the official blueprint for LLM Vision with `llmvision/glimpse-v1` as the default model.
While we recommend running Glimpse-v1 together with LLM Vision, you can run this model in custom setups. Below are the recommended parameters and the original instructions the model was trained on.
Recommended Inference Parameters
| Parameter | Value |
|---|---|
| Temperature | 0.3 |
| Top P | 0.95 |
| Top K | 64 |
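For a custom setup, the parameters above map directly onto the `options` field of Ollama's `/api/generate` endpoint. The sketch below builds such a request payload; the prompt text and image placeholder are illustrative, and `format: "json"` asks Ollama to constrain the output to valid JSON, matching Glimpse-v1's structured-output training.

```python
import json

# Recommended sampling parameters for Glimpse-v1 (from the table above).
OPTIONS = {"temperature": 0.3, "top_p": 0.95, "top_k": 64}

def build_request(image_b64: str, prompt: str) -> dict:
    """Build a payload for Ollama's /api/generate endpoint."""
    return {
        "model": "llmvision/glimpse-v1",
        "prompt": prompt,
        "images": [image_b64],  # base64-encoded camera frame
        "format": "json",       # constrain output to valid JSON
        "options": OPTIONS,
        "stream": False,
    }

payload = build_request("<base64 image>", "Analyze the provided security camera image.")
print(json.dumps(payload, indent=2))
```

POST this payload to `http://localhost:11434/api/generate` (Ollama's default address) and parse the `response` field of the reply as JSON.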
Original Training Instructions
Task: Analyze the provided security camera image and generate a smart-home event notification.
Output:
Return a single valid JSON object with exactly two string fields:
- "title": a short summary (2-5 words)
- "description": a brief factual description of what is happening
Title Rules:
The "title" must:
- Be 2-5 words
- Be short and glanceable
- Avoid long phrases or full sentences
The title should summarize the event category and location.
All additional detail belongs in "description".
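In a custom integration, the title constraint above (2–5 words) is easy to check before forwarding a notification. This is a hypothetical helper, not part of LLM Vision:

```python
def valid_title(title: str) -> bool:
    """Check the title rule: 2-5 words, short and glanceable."""
    words = title.split()
    return 2 <= len(words) <= 5

# Examples:
assert valid_title("Package delivery")            # 2 words -> OK
assert not valid_title("Someone")                 # 1 word -> too short
assert not valid_title("A person is walking up the driveway")  # 7 words -> too long
```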
Delivery Inference Rules:
If a person is:
- Holding or placing a package or letters
- and wearing a delivery uniform
- or a delivery vehicle is visible
Then:
- the title must contain the word "delivery"
- Use a delivery-style title (2-5 words) (examples: "Package delivery", "Delivery at porch", "Courier delivery")
- Include the carrier name in the description if the carrier branding is visually identifiable (e.g. "Amazon delivery", "FedEx delivery")
Empty scene handling:
- If no clear activity or relevant objects (such as people, vehicles, or animals) are present, set:
- "title" to exactly: "No activity"
- "description" to a brief statement describing that nothing notable is seen
Description Rules:
- 1-2 short sentences
- Do not include explanations or reasoning
- Do not repeat the task or rules
- Use present tense
- Neutral and factual
- Describe what is happening
Do not mention camera angle, lighting quality, or image clarity.
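Taken together, a response that follows these instructions might look like this (illustrative values, not actual model output):

```json
{
  "title": "FedEx delivery",
  "description": "A courier in a FedEx uniform places a package at the front door."
}
```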
The following benchmark compares the semantic similarity (CLIP) between the model's response and a validation response.
Benchmark charts:
- Mean Score Overall (higher is better)
- Mean Score by Category (higher is better)
- Latency (lower is better)
- Latency vs Score (lower is better)