57 Downloads Updated 2 weeks ago
ollama run fredrezones55/Jan-v2-VL
ollama launch claude --model fredrezones55/Jan-v2-VL
ollama launch codex --model fredrezones55/Jan-v2-VL
ollama launch opencode --model fredrezones55/Jan-v2-VL
ollama launch openclaw --model fredrezones55/Jan-v2-VL
10 models
Jan-v2-VL:latest
6.1GB · 256K context window · Text, Image · 3 weeks ago
Jan-v2-VL:high
6.1GB · 256K context window · Text, Image · 3 weeks ago
Jan-v2-VL:low
latest · 6.1GB · 256K context window · Text, Image · 2 weeks ago
Jan-v2-VL:med
6.1GB · 256K context window · Text, Image · 2 weeks ago
Jan-v2-VL:high-old-think
6.1GB · 256K context window · Text, Image · 3 weeks ago
Jan-v2-VL:high-Q5_K_M
7.0GB · 256K context window · Text, Image · 2 weeks ago
Jan-v2-VL:low-q5
7.0GB · 256K context window · Text, Image · 2 weeks ago
Jan-v2-VL:max-Q8
34GB · 256K context window · Text, Image · 2 weeks ago
Jan-v2-VL:max-Q5_K_M
23GB · 256K context window · Text, Image · 2 weeks ago
Jan-v2-VL:max-Q6_K
26GB · 256K context window · Text, Image · 2 weeks ago
Model Source: https://huggingface.co/janhq/Jan-v2-VL-high
This version of the model has been packaged the "right" way so that Ollama accepts it. [It was a pain getting the vision function to see properly without the model complaining about a 'blurry image' or hallucinating.] [As with qwen3-vl, thinking could break and needed work; update 03-05-2026: thinking has been hard-coded to always be on, which resolves the issue where the qwen3-vl engine breaks.] To create this port of the Jan-v2-VL family, Ollama's native support for Qwen3-VL was painstakingly reverse engineered and ported over to Janhq's Jan-v2-VL models, producing a single text+mmproj GGUF blob model file. (As an experiment I did this without looking at the model_vision.go file, working only from the final model [I definitely did not forget or anything…].)
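For anyone who wants to try the same single-blob approach locally, the general shape is an Ollama Modelfile pointing at one combined GGUF. This is an illustrative sketch only; the filename and parameter values here are assumptions, not the exact ones used for this upload:

```
# Illustrative Modelfile: register a combined text+mmproj GGUF blob with Ollama
FROM ./Jan-v2-VL-high.gguf

# sampling defaults (see the recommended inference parameters further down)
PARAMETER temperature 1.0
PARAMETER top_p 0.95
PARAMETER top_k 20
```

Running `ollama create my-jan-v2-vl -f Modelfile` would then register the blob as a local model.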
These models can reason and can run multiple tools per user request (provided the tools are well described, or the model is well prompted to use them).
Update [March 3rd 2026]: to correct the issue where Jan-v2-VL was unable to think because it prematurely tried to trigger a tool call instead of thinking [which broke it], a new template was adopted [source] and hard-coded to always think.
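To illustrate what "hard-coded to always think" means at the template level: in an Ollama Modelfile TEMPLATE (Go template syntax), the assistant turn can open the reasoning block unconditionally, so the model starts reasoning before it ever has the chance to emit a tool call. This is a hedged sketch of the idea, not the actual template used for this upload:

```
{{- /* assistant turn: always emit the opening think tag, so reasoning
      begins before any tool call can be generated */ -}}
<|im_start|>assistant
<think>
```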
Warning: tooling and thinking tend to break, as seen with its base model [qwen3-vl 8B] in the Ollama engine; some tweaks might be needed. [The -max variants seem to be exempt from this.]
The other Jan-v2-VL models will be uploaded soon.
This model family has been converted so that it effectively mimics the architecture of Ollama's implementation of qwen3-vl, plus some fixes [to ensure it works].
Notes:
The :low/:med/:high levels are different reasoning levels that the Jan team ingrained in each model; the models are the same size but trained slightly differently.
Jan-v2-VL is an 8B-parameter vision–language model for long-horizon, multi-step tasks in real software environments (e.g., browsers and desktop apps). It combines language reasoning with visual perception to follow complex instructions, maintain intermediate state, and recover from minor execution errors.
We recognize the importance of long-horizon execution for real-world tasks, where small per-step gains compound into much longer successful chains—so Jan-v2-VL is built for stable, many-step execution. For evaluation, we use The Illusion of Diminishing Returns: Measuring Long-Horizon Execution in LLMs, which measures execution length. This benchmark aligns with public consensus on what makes a strong coding model—steady, low-drift step execution—suggesting that robust long-horizon ability closely tracks better user experience.
Variants
Jan-v2-VL is aimed at tasks where the plan and/or knowledge can be provided up front, and success hinges on stable, many-step execution with minimal drift.

Compared with its base (Qwen-3-VL-8B-Thinking), Jan-v2-VL shows no degradation on standard text-only and vision tasks—and is slightly better on several—while delivering stronger long-horizon execution on the Illusion of Diminishing Returns benchmark.



Jan-v2-VL is optimized for direct integration with the Jan App. Simply select the model from the Jan App interface for immediate access to its full capabilities.
Using vLLM:
vllm serve Menlo/Jan-v2-VL-high \
--host 0.0.0.0 \
--port 1234 \
--enable-auto-tool-choice \
--tool-call-parser hermes \
--reasoning-parser qwen3
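Once the vLLM server is up, it exposes an OpenAI-compatible endpoint. A minimal sketch of building a vision request follows; the helper function and placeholder image bytes are illustrative, and the resulting JSON would be POSTed to http://localhost:1234/v1/chat/completions:

```python
import base64
import json

# Build an OpenAI-compatible chat payload that attaches an image as a
# base64 data URL; the model name matches the `vllm serve` command above.
def build_vision_request(prompt: str, image_bytes: bytes,
                         model: str = "Menlo/Jan-v2-VL-high") -> dict:
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    }

payload = build_vision_request("Describe this screenshot.", b"<png bytes here>")
print(json.dumps(payload)[:80])  # POST this body to /v1/chat/completions
```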
Using llama.cpp:
llama-server --model Jan-v2-VL-high-Q8_0.gguf \
--mmproj mmproj-Jan-v2-VL-high.gguf \
--host 0.0.0.0 \
--port 1234 \
--jinja \
--no-context-shift
For optimal performance in agentic and general tasks, we recommend the following inference parameters:
temperature: 1.0
top_p: 0.95
top_k: 20
repetition_penalty: 1.0
presence_penalty: 1.5
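These parameters map directly onto the chat-completions request body accepted by the servers above. A sketch, with the caveat that top_k and repetition_penalty are server-side extensions (vLLM; llama.cpp uses similar but not identically named fields) rather than core OpenAI schema, and the model name is illustrative:

```python
import json

# Recommended sampling parameters expressed as a chat-completions request body.
recommended = {
    "temperature": 1.0,
    "top_p": 0.95,
    "top_k": 20,                 # server-side extension
    "repetition_penalty": 1.0,   # server-side extension; naming varies by server
    "presence_penalty": 1.5,
}

body = {
    "model": "Menlo/Jan-v2-VL-high",
    "messages": [{"role": "user", "content": "Plan the next browser action."}],
    **recommended,
}
print(json.dumps(body, sort_keys=True)[:60])
```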
More updates soon.