107 Downloads Updated 3 days ago
ollama run llmvision/glimpse-v1
3225100ecc36 · 5.0GB
Glimpse-v1 is a lightweight vision-language model (VLM) built to summarize home security camera events. It natively supports structured JSON, enabling seamless integration into automations.
Glimpse-v1 is specifically trained to summarize footage from smart doorbells and other home security cameras. For best results and ease of use, use the official blueprint for LLM Vision with `llmvision/glimpse-v1` as the default model.
While we recommend running Glimpse-v1 together with LLM Vision, you can run this model in custom setups. Below are the recommended parameters and the original instructions the model was trained on.
Recommended Inference Parameters
| Parameter | Value |
|---|---|
| Temperature | 0.3 |
| Top P | 0.95 |
| Top K | 64 |
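For a custom setup, the parameters above map directly onto the `options` field of Ollama's `/api/generate` endpoint. The sketch below builds such a request payload; the prompt text and image placeholder are illustrative, and `format: "json"` asks Ollama to constrain the output to valid JSON, matching Glimpse-v1's structured-output training.

```python
import json

# Recommended sampling parameters for Glimpse-v1 (from the table above).
OPTIONS = {"temperature": 0.3, "top_p": 0.95, "top_k": 64}

def build_request(image_b64: str, prompt: str) -> dict:
    """Build a payload for Ollama's /api/generate endpoint."""
    return {
        "model": "llmvision/glimpse-v1",
        "prompt": prompt,
        "images": [image_b64],  # base64-encoded camera frame
        "format": "json",       # constrain output to valid JSON
        "options": OPTIONS,
        "stream": False,
    }

payload = build_request("<base64 image>", "Analyze the provided security camera image.")
print(json.dumps(payload, indent=2))
```

POST this payload to `http://localhost:11434/api/generate` (Ollama's default address) and parse the `response` field of the reply as JSON.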
Original Training Instructions
Task: Analyze the provided security camera image and generate a smart-home event notification.
Output:
Return a single valid JSON object with exactly two string fields:
- "title": a short summary (2-5 words)
- "description": a brief factual description of what is happening
Title Rules:
The "title" must:
- Be 2-5 words
- Be short and glanceable
- Avoid long phrases or full sentences
The title should summarize the event category and location.
All additional detail belongs in "description".
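In a custom integration, the title constraint above (2–5 words) is easy to check before forwarding a notification. This is a hypothetical helper, not part of LLM Vision:

```python
def valid_title(title: str) -> bool:
    """Check the title rule: 2-5 words, short and glanceable."""
    words = title.split()
    return 2 <= len(words) <= 5

# Examples:
assert valid_title("Package delivery")            # 2 words -> OK
assert not valid_title("Someone")                 # 1 word -> too short
assert not valid_title("A person is walking up the driveway")  # 7 words -> too long
```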
Delivery Inference Rules:
If a person is:
- Holding or placing a package or letters
- and wearing a delivery uniform
- or a delivery vehicle is visible
Then:
- the title must contain the word "delivery"
- Use a delivery-style title (2-5 words) (examples: "Package delivery", "Delivery at porch", "Courier delivery")
- Include the carrier name in the description if the carrier branding is visually identifiable (e.g. "Amazon delivery", "FedEx delivery")
Empty scene handling:
- If no clear activity or relevant objects (such as people, vehicles, or animals) are present, set:
- "title" to exactly: "No activity"
- "description" to a brief statement describing that nothing notable is seen
Description Rules:
- 1-2 short sentences
- Do not include explanations or reasoning
- Do not repeat the task or rules
- Use present tense
- Neutral and factual
- Describe what is happening
Do not mention camera angle, lighting quality, or image clarity.
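Taken together, a response that follows these instructions might look like this (illustrative values, not actual model output):

```json
{
  "title": "FedEx delivery",
  "description": "A courier in a FedEx uniform places a package at the front door."
}
```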
The following benchmark compares the semantic similarity (CLIP) between the model's response and a validation response.
Benchmark charts:
- Mean Score Overall (higher is better)
- Mean Score by Category (higher is better)
- Latency (lower is better)
- Latency vs Score (lower is better)