107 3 days ago

Private visual intelligence for your home.

vision
ollama run llmvision/glimpse-v1

Details


3225100ecc36 · 5.0GB

gemma3 · 3.88B · Q8_0
clip · 420M · BF16
{{- range $i, $_ := .Messages }} {{- $last := eq (len (slice $.Messages $i)) 1 }} {{- if or (eq .Rol
{ "num_ctx": 4096, "stop": [ "<start_of_turn>user", "<eos>" ], "temp

Readme


Glimpse-v1 is a lightweight vision-language model (VLM) built to summarize home security camera events. It natively produces structured JSON output, which makes it straightforward to integrate into automations.

Get Started

  1. Install Ollama
  2. Set up LLM Vision for Home Assistant
  3. Pull this model using the command at the top of this page (ollama run llmvision/glimpse-v1)
  4. Set up the Ollama provider in LLM Vision
  5. Choose llmvision/glimpse-v1 as the default model

Usage

Glimpse-v1 is specifically trained to summarize footage from smart doorbells and other home security cameras. For best results and ease of use, use the official blueprint for LLM Vision.

Usage without LLM Vision

While we recommend running Glimpse-v1 together with LLM Vision, you can run this model in custom setups. Below are the recommended parameters and the original instructions the model was trained on.

Recommended Parameters for Inference

Parameter     Value
Temperature   0.3
Top P         0.95
Top K         64
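For a custom setup, the recommended parameters can be passed through the `options` field of Ollama's chat endpoint. The following is a minimal sketch, not the way LLM Vision itself calls the model. It assumes Ollama is listening on its default local address, that the training instructions are sent as the user message alongside the image, and that requesting `"format": "json"` is appropriate here. The helper names `build_request` and `summarize` are hypothetical.

```python
import base64
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/chat"
MODEL = "llmvision/glimpse-v1"

# Recommended inference parameters from this page.
OPTIONS = {"temperature": 0.3, "top_p": 0.95, "top_k": 64}


def build_request(image_bytes: bytes, instructions: str) -> dict:
    """Build a non-streaming chat payload carrying one base64-encoded image."""
    return {
        "model": MODEL,
        "stream": False,
        "format": "json",  # assumption: ask Ollama to constrain output to JSON
        "options": OPTIONS,
        "messages": [
            {
                "role": "user",
                "content": instructions,
                "images": [base64.b64encode(image_bytes).decode("ascii")],
            }
        ],
    }


def summarize(image_path: str, instructions: str) -> dict:
    """Send one camera frame to the model and return the parsed JSON event."""
    with open(image_path, "rb") as f:
        payload = build_request(f.read(), instructions)
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # The model's reply is a JSON string inside the chat message content.
    return json.loads(body["message"]["content"])
```

A call such as `summarize("doorbell.jpg", instructions)` (with a hypothetical image path and the instruction text from the next section) would then return a dictionary with the `title` and `description` fields.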

Original Training Instructions

Task: Analyze the provided security camera image and generate a smart-home event notification.

Output:
Return a single valid JSON object with exactly two string fields:
- "title": a short summary (2-5 words)
- "description": a brief factual description of what is happening

Title Rules:
The "title" must:
- Be 2-5 words
- Be short and glanceable
- Avoid long phrases or full sentences
The title should summarize the event category and location.
All additional detail belongs in "description".

Delivery Inference Rules:
If a person is:
- Holding or placing a package or letters
- and wearing a delivery uniform
- or a delivery vehicle is visible
Then:
- the title must contain the word "delivery":
  - Use a delivery-style title (2-5 words) (examples: "Package delivery", "Delivery at porch", "Courier delivery")
  - Include the carrier name in the description if the carrier branding is visually identifiable (e.g. "Amazon delivery", "FedEx delivery")

Empty scene handling:
- If no clear activity or relevant objects (such as people, vehicles, or animals) are present, set:
  - "title" to exactly: "No activity"
  - "description" to a brief statement describing that nothing notable is seen

Description Rules:
- 1-2 short sentences
- Do not include explanations or reasoning
- Do not repeat the task or rules
- Use present tense
- Neutral and factual
- Describe what is happening

Do not mention camera angle, lighting quality, or image clarity.
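Because the instructions above fix the output shape, a custom integration can check each response before acting on it. The sketch below is one possible validator, not part of LLM Vision: it checks that the response is a JSON object with exactly the two string fields and that the title has 2-5 words. The function names `validate_event` and `is_delivery` are hypothetical.

```python
import json


def validate_event(raw: str) -> dict:
    """Parse a model response and check it against the output rules."""
    event = json.loads(raw)
    if not isinstance(event, dict) or set(event) != {"title", "description"}:
        raise ValueError("expected exactly the fields 'title' and 'description'")
    if not all(isinstance(event[key], str) for key in ("title", "description")):
        raise ValueError("both fields must be strings")
    # Titles must be 2-5 words; "No activity" (empty scene) also satisfies this.
    word_count = len(event["title"].split())
    if not 2 <= word_count <= 5:
        raise ValueError(f"title has {word_count} words, expected 2-5")
    return event


def is_delivery(event: dict) -> bool:
    """Delivery events are required to contain the word 'delivery' in the title."""
    return "delivery" in event["title"].lower()
```

An automation could, for example, skip notifications when the validated title equals "No activity" and route events for which `is_delivery` returns true to a separate channel.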

Benchmark

The following benchmark measures the semantic similarity (computed with CLIP) between a reference validation response and the model's response.
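This page does not state how the similarity score is calculated. A common choice for comparing two embeddings is cosine similarity, so the sketch below assumes the score is the cosine similarity between the embedding of the reference response and the embedding of the model's response. The embeddings here are plain lists of floats; producing them with a CLIP encoder is outside this sketch.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors (range -1 to 1)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)
```

Under this assumption, a score near 1 means the two responses point in nearly the same direction in embedding space, which matches the "higher is better" reading of the score charts.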

Mean Score Overall

[Figure: mean_score_overview.png] Higher is better.

Mean Score by Category

[Figure: mean_score_categories.png] Higher is better.

Latency

[Figure: latency.png] Lower is better.

Latency vs Score

[Figure: latency_vs_score.png] Lower is better.