Kimi K2.5 is an open-source, natively multimodal agentic model that integrates vision and language understanding with advanced agentic capabilities, supporting both instant and thinking modes as well as conversational and agentic paradigms.
32.5K Pulls 1 Tag Updated 1 week ago
The most powerful vision-language model in the Qwen model family to date.
1.3M Pulls 59 Tags Updated 3 months ago
DeepSeek-OCR is a vision-language model that can perform token-efficient OCR.
150.1K Pulls 3 Tags Updated 2 months ago
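For context, here is a minimal sketch of how an OCR-capable model like the one above might be driven from the official `ollama` Python client (`pip install ollama`). The model tag `deepseek-ocr`, the prompt, and the image path are illustrative assumptions; check the model's page for the exact tag and any recommended prompt.

```python
# Minimal sketch: OCR via the ollama Python client.
# Assumes the Ollama server is running locally and the model has been pulled;
# "deepseek-ocr" and "./scanned_page.png" are placeholder assumptions.
import ollama

response = ollama.chat(
    model="deepseek-ocr",
    messages=[{
        "role": "user",
        "content": "Extract all text from this document image.",
        # Local image paths are read and encoded by the client.
        "images": ["./scanned_page.png"],
    }],
)
print(response["message"]["content"])
```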
Qwen's flagship vision-language model, and a significant leap from the previous Qwen2-VL.
1.2M Pulls 17 Tags Updated 8 months ago
A compact and efficient vision-language model, specifically designed for visual document understanding, enabling automated content extraction from tables, charts, infographics, plots, diagrams, and more.
710.9K Pulls 5 Tags Updated 11 months ago
Building upon Mistral Small 3, Mistral Small 3.1 (2503) adds state-of-the-art vision understanding and enhances long context capabilities up to 128k tokens without compromising text performance.
571.5K Pulls 5 Tags Updated 10 months ago
🌋 LLaVA is a novel end-to-end trained large multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding. Updated to version 1.6.
12.7M Pulls 98 Tags Updated 2 years ago
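As a usage note, a minimal sketch of querying a multimodal model such as LLaVA through Ollama's local REST API. It assumes the default server address (http://localhost:11434), that the model has already been pulled, and a hypothetical local image file `photo.jpg`.

```python
# Minimal sketch: image Q&A against Ollama's /api/generate endpoint.
# The endpoint accepts base64-encoded images in the "images" field.
import base64
import requests

with open("photo.jpg", "rb") as f:  # placeholder image path
    image_b64 = base64.b64encode(f.read()).decode()

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llava",
        "prompt": "Describe this image in one sentence.",
        "images": [image_b64],
        "stream": False,  # return a single JSON response instead of a stream
    },
)
print(resp.json()["response"])
```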
A series of multimodal LLMs (MLLMs) designed for vision-language understanding.
4.5M Pulls 17 Tags Updated 1 year ago
Llama 3.2 Vision is a collection of instruction-tuned image reasoning generative models in 11B and 90B sizes.
3.7M Pulls 9 Tags Updated 8 months ago
moondream2 is a small vision language model designed to run efficiently on edge devices.
590.3K Pulls 18 Tags Updated 1 year ago
A family of open-source models trained on a wide variety of data, surpassing ChatGPT on various benchmarks. Updated to version 3.5-0106.
411K Pulls 50 Tags Updated 2 years ago
A Gemini 2.5 Flash-level MLLM for vision, speech, and full-duplex multimodal live streaming on your phone.
283 Pulls 12 Tags Updated yesterday
A highly specialized vision model with more than 2B parameters.
146.9K Pulls 1 Tag Updated 2 months ago
Ultra-compact 256M vision-language model for video/image understanding. Supports visual QA, captioning, OCR, video analysis. Only 1.38GB VRAM. Built on SigLIP + SmolLM2. Available in Q8 and FP16. Apache 2.0 license.
112 Pulls 2 Tags Updated 1 week ago
A high-quality vision instruct model.
78 Pulls 1 Tag Updated 1 week ago
The most powerful vision-language model in the Qwen3 model family to date.
45.5K Pulls 54 Tags Updated 2 months ago
State-of-the-art OCR (Optical Character Recognition) vision language model based on [allenai/olmOCR-2-7B-1025](https://huggingface.co/allenai/olmOCR-2-7B-1025).
2,301 Pulls 1 Tag Updated 3 months ago
GLM 4.6V Flash 9B model with vision, tools, and hybrid thinking enabled. Uses a custom template to align it with Ollama and applies the recommended sampling settings by default. Built from Unsloth quants at Q4_K_M.
760 Pulls 1 Tag Updated 1 month ago