fredrezones55/ Jan-v2-VL:max-Q5_K_M


Jan-v2-VL is a family of 8B-parameter vision–language models for long-horizon, multi-step tasks in real software environments (e.g., browsers and desktop apps).

Capabilities: vision · tools · thinking
ollama run fredrezones55/Jan-v2-VL:max-Q5_K_M

Details

2 weeks ago

09b808a5c4a8 · 23GB

Architecture: qwen3vlmoe · Parameters: 31.1B · Quantization: Q5_K_M

License: Apache License, Version 2.0, January 2004 (http://www.apache.org/licenses/)

Parameters: { "presence_penalty": 1.5, "repeat_penalty": 1, "temperature": 1, "top_k": 20, "top_p": 0.95 }

Template: {{ .Prompt }}

Readme

Model Source: https://huggingface.co/janhq/Jan-v2-VL-high

Jan-v2-VL: Multimodal Agent for Long-Horizon Tasks

This version of the model has been packaged in the “right” way so that Ollama will accept it as a single text+mmproj GGUF blob. (It was a pain getting the vision function to see properly without it complaining about “blurry image” or hallucinating.) As with qwen3-vl, thinking could break under the Ollama engine; update 03-05-2026: thinking has been hard-coded to be always on, which resolves the issue where the qwen3-vl engine breaks. To create this port of the Jan-v2-VL family, Ollama’s native support for Qwen3-VL was painstakingly reverse engineered and applied to Janhq’s Jan-v2-VL models. (As an experiment I did this without looking at the model_vision.go file, only at the final model. I definitely did not forget or anything…)

These models can reason and run multiple tools per user request (when the tools are well defined or the model is prompted to do so).

  • update [March 3rd 2026]: to fix the issue where Jan-v2-VL could not think because it prematurely triggered a tool call instead of thinking (which broke it), a new template [source] was adopted and hard-coded to always think.

  • warning: tooling and thinking tend to break, as seen with its base model (qwen3-vl 8B) in the Ollama engine; some tweaks might be needed. (-max seems to be exempt from this.)

  • the other Jan-v2-VL models will be uploaded soon.

  • this model family has been converted so that it effectively mimics Ollama’s implementation of qwen3-vl, plus some fixes to ensure it works.

  • Did you know?

    • Ollama applies mixed quantization to Qwen3-VL’s vision projection weights to optimize VRAM usage.
    • NumPy and the GGML writer can, for some fun reason, flip tensor shapes. 🤣
  • Notes:

    • the /no_think flag does not work except with -old-think; this corrects a bug that affects qwen3-vl thinking models under Ollama.

The different levels of reasoning effort

  1. -low (Default)
  2. -med
  3. -high
  4. -max (a different model based on Qwen3-VL 30B MoE, which can rival DeepSeek-R1 and Gemini-2.5-Pro in long single operations, with up to 134 tool calls). While the largest, this model seems the most reliable of the family when running in Ollama.

These levels reflect the different amounts of reasoning effort the Jan team trained into each model; apart from -max, the variants are the same size but trained slightly differently.

Overview

Jan-v2-VL is an 8B-parameter vision–language model for long-horizon, multi-step tasks in real software environments (e.g., browsers and desktop apps). It combines language reasoning with visual perception to follow complex instructions, maintain intermediate state, and recover from minor execution errors.

We recognize the importance of long-horizon execution for real-world tasks, where small per-step gains compound into much longer successful chains—so Jan-v2-VL is built for stable, many-step execution. For evaluation, we use The Illusion of Diminishing Returns: Measuring Long-Horizon Execution in LLMs, which measures execution length. This benchmark aligns with public consensus on what makes a strong coding model—steady, low-drift step execution—suggesting that robust long-horizon ability closely tracks better user experience.

Variants

  • Jan-v2-VL-low — efficiency-oriented, lower latency
  • Jan-v2-VL-med — balanced latency/quality
  • Jan-v2-VL-high — deeper reasoning; higher think time

Intended Use

Tasks where the plan and/or knowledge can be provided up front, and success hinges on stable, many-step execution with minimal drift:

  • Agentic automation & UI control: Stepwise operation in browsers/desktop apps with screenshot grounding and tool calls (e.g., BrowserMCP).
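For stepwise UI control, the model expects tools in the standard OpenAI function-calling format. The sketch below shows one way to assemble such a request; the `browser_click` tool name and its parameters are hypothetical illustrations, not BrowserMCP's actual schema.

```python
# Hypothetical tool definition in the OpenAI function-calling format,
# which both Ollama's tool support and vLLM's hermes parser accept.
# The tool name and parameter schema are illustrative only.
click_tool = {
    "type": "function",
    "function": {
        "name": "browser_click",
        "description": "Click a UI element located on the latest screenshot.",
        "parameters": {
            "type": "object",
            "properties": {
                "x": {"type": "integer", "description": "Pixel x-coordinate"},
                "y": {"type": "integer", "description": "Pixel y-coordinate"},
            },
            "required": ["x", "y"],
        },
    },
}

# A chat request pairing the user instruction with the tool list.
payload = {
    "model": "Jan-v2-VL-high",
    "messages": [{"role": "user", "content": "Open the settings menu."}],
    "tools": [click_tool],
}
```

The model then replies with either thinking text or a tool call naming `browser_click` with concrete coordinates, which the agent loop executes before sending back a fresh screenshot.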

Model Performance

[figure: benchmark comparison with Qwen-3-VL-8B-Thinking]

Compared with its base (Qwen-3-VL-8B-Thinking), Jan-v2-VL shows no degradation on standard text-only and vision tasks—and is slightly better on several—while delivering stronger long-horizon execution on the Illusion of Diminishing Returns benchmark.


Deployment

Integration with Jan App

Jan-v2-VL is optimized for direct integration with the Jan App. Simply select the model from the Jan App interface for immediate access to its full capabilities.

Local Deployment

Using vLLM:

vllm serve Menlo/Jan-v2-VL-high \
    --host 0.0.0.0 \
    --port 1234 \
    --enable-auto-tool-choice \
    --tool-call-parser hermes \
    --reasoning-parser qwen3 
    

Using llama.cpp:

llama-server --model Jan-v2-VL-high-Q8_0.gguf \
    --mmproj mmproj-Jan-v2-VL-high.gguf \
    --host 0.0.0.0 \
    --port 1234 \
    --jinja \
    --no-context-shift
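Both servers above expose an OpenAI-compatible endpoint on port 1234. The sketch below shows how a multimodal request body for `POST /v1/chat/completions` can be assembled, with the screenshot passed as a base64 data URL; the image bytes here are dummy placeholders, not a valid PNG.

```python
import base64

# Placeholder bytes standing in for a real screenshot PNG.
fake_png = b"\x89PNG\r\n\x1a\n"
data_url = "data:image/png;base64," + base64.b64encode(fake_png).decode()

# OpenAI-style multimodal chat body; send it as JSON to
# http://localhost:1234/v1/chat/completions on the server started above.
body = {
    "model": "Jan-v2-VL-high",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What button should I click next?"},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }
    ],
    "temperature": 1.0,
    "top_p": 0.95,
}
```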

Recommended Parameters

For optimal performance in agentic and general tasks, we recommend the following inference parameters:

temperature: 1.0
top_p: 0.95
top_k: 20
repetition_penalty: 1.0
presence_penalty: 1.5
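For Ollama specifically, these parameters map onto the `options` object of an `/api/chat` request, with one naming difference: Ollama calls the repetition penalty `repeat_penalty`. A minimal request body might look like:

```python
# The recommended sampling parameters, expressed as Ollama "options".
options = {
    "temperature": 1.0,
    "top_p": 0.95,
    "top_k": 20,
    "repeat_penalty": 1.0,   # Ollama's name for repetition_penalty
    "presence_penalty": 1.5,
}

# Full /api/chat request body for this model tag.
request = {
    "model": "fredrezones55/Jan-v2-VL:max-Q5_K_M",
    "messages": [{"role": "user", "content": "Plan the next UI step."}],
    "options": options,
}
```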

🤝 Community & Support

📄 Citation

Updated Soon