llmvision/glimpse-v1:q4_k

Details

Updated 1 month ago

ae173ab7b0ff · 3.3GB

model

arch	gemma3
parameters	3.88B
quantization	Q4_K_M

2.5GB

projector

arch	clip
parameters	420M
quantization	BF16

851MB

template

{{- range $i, $_ := .Messages }} {{- $last := eq (len (slice $.Messages $i)) 1 }} {{- if or (eq .Rol

360B

params

{ "num_ctx": 4096, "stop": [ "<start_of_turn>user", "<eos>" ], "temp

118B

Glimpse-v1 is a lightweight vision-language model (VLM) built to summarize home security camera events. It natively produces structured JSON output, which makes it easy to integrate into automations.

Get Started

1. Install Ollama
2. Set up LLM Vision for Home Assistant
3. Pull this model using the command at the top of this page
4. Set up the Ollama provider in LLM Vision
5. Choose llmvision/glimpse-v1 as the default model

Quantization

Glimpse-v1 is available in two quantizations to suit different hardware.

latest (Q8_0): We strongly recommend this variant if your hardware can run it.
q4_k_m: Smaller memory footprint, with moderate quality loss.

Usage

Glimpse-v1 is specifically trained to summarize footage from smart doorbells and other home security cameras.

Note: This is not a chat model. It takes one image and should be used with the prompt provided below (“Original Training Instructions”).
For best results and ease of use, use the official blueprint for LLM Vision.

Known limitations

We are currently aware of the following limitations:

Memory: The model hallucinates when LLM Vision's Memory feature is enabled. Workaround: disable Memory.
Vehicles: The model sometimes classifies cars as stationary, which results in a “No activity” notification.
Languages: Glimpse-v1 currently supports English only.

Usage without LLM Vision

While we recommend running Glimpse-v1 together with LLM Vision, you can run this model in custom setups. Below are the recommended parameters and the original instructions the model was trained on.

Recommended Parameters for inference

Parameter	Value
Temperature	0.3
Top P	0.95
Top K	64
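
As a rough illustration of how these values might be applied outside LLM Vision, the sketch below sends one image to a local Ollama server through its `/api/generate` endpoint. Several things here are assumptions and not part of this page: the server address (Ollama's default `http://localhost:11434`), the model tag in `MODEL`, the `"format": "json"` setting, and the helper names `build_request` and `summarize`, which are hypothetical. The `prompt` argument is meant to hold the training instructions text.

```python
import base64
import json
import urllib.request

# Assumed default address of a local Ollama server; adjust for your setup.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Assumed tag; use whichever tag you actually pulled.
MODEL = "llmvision/glimpse-v1"


def build_request(image_bytes: bytes, prompt: str) -> dict:
    """Build a single-image request body using the recommended parameters."""
    return {
        "model": MODEL,
        "prompt": prompt,
        # Ollama expects images as base64-encoded strings; the model takes one image.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "format": "json",   # assumption: ask Ollama to constrain output to JSON
        "stream": False,    # return one complete response instead of chunks
        "options": {"temperature": 0.3, "top_p": 0.95, "top_k": 64},
    }


def summarize(image_path: str, prompt: str) -> dict:
    """Send the image to Ollama and parse the model's JSON reply."""
    with open(image_path, "rb") as f:
        payload = build_request(f.read(), prompt)
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # With streaming disabled, the generated text is in the "response" field.
    return json.loads(body["response"])
```

Calling `summarize("snapshot.jpg", instructions)` with a running server would be expected to return a dictionary with `title` and `description` keys; the file name here is only a placeholder.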

Original Training Instructions

Task: Analyze the provided security camera image and generate a smart-home event notification.

Output:
Return a single valid JSON object with exactly two string fields:
- "title": a short summary (2-5 words)
- "description": a brief factual description of what is happening

Title Rules:
The "title" must:
- Be 2-5 words
- Be short and glanceable
- Avoid long phrases or full sentences
The title should summarize the event category and location.
All additional detail belongs in "description".

Delivery Inference Rules:
If a person is:
- Holding or placing a package or letters
- and wearing a delivery uniform
- or a delivery vehicle is visible
Then:
- the title must contain the word "delivery":
  - Use a delivery-style title (2-5 words) (examples: "Package delivery", "Delivery at porch", "Courier delivery")
  - Include the carrier name in the description if the carrier branding is visually identifiable (e.g. "Amazon delivery", "FedEx delivery")

Empty scene handling:
- If no clear activity or relevant objects (such as people, vehicles, or animals) are present, set:
  - "title" to exactly: "No activity"
  - "description" to a brief statement describing that nothing notable is seen

Description Rules:
- 1-2 short sentences
- Do not include explanations or reasoning
- Do not repeat the task or rules
- Use present tense
- Neutral and factual
- Describe what is happening

Do not mention camera angle, lighting quality, or image clarity.
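
In a custom setup it may be worth checking the model's reply against the output contract in these instructions before passing it on to an automation. The following is a minimal sketch of such a check, covering only the mechanically verifiable rules (exactly two string fields, a 2-5 word title, and the exact "No activity" title). The function names `validate_notification` and `is_no_activity` are hypothetical and not part of LLM Vision or Ollama.

```python
import json


def validate_notification(raw: str) -> dict:
    """Parse a model reply and check it against the stated output rules."""
    data = json.loads(raw)
    # The reply must be one JSON object with exactly these two fields.
    if not isinstance(data, dict) or set(data) != {"title", "description"}:
        raise ValueError("expected an object with exactly 'title' and 'description'")
    # Both fields must be strings.
    if not all(isinstance(data[key], str) for key in ("title", "description")):
        raise ValueError("'title' and 'description' must be strings")
    # The title must be 2-5 words.
    word_count = len(data["title"].split())
    if not 2 <= word_count <= 5:
        raise ValueError(f"title has {word_count} words, expected 2-5")
    return data


def is_no_activity(data: dict) -> bool:
    """True when the reply uses the exact empty-scene title."""
    return data["title"] == "No activity"


if __name__ == "__main__":
    reply = '{"title": "Package delivery", "description": "A courier places a box at the door."}'
    event = validate_notification(reply)
    print(event["title"], is_no_activity(event))
```

An automation could use `is_no_activity` to skip sending a notification for empty scenes, and treat a `ValueError` as a signal to retry or discard the reply.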