vision · Ollama Search
Search for models on Ollama.
  • llava

    🌋 LLaVA is a novel end-to-end trained large multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding. Updated to version 1.6.

    vision 7b 13b 34b

9.1M Pulls · 98 Tags · Updated 1 year ago

  • minicpm-v

    A series of multimodal LLMs (MLLMs) designed for vision-language understanding.

    vision 8b

3M Pulls · 17 Tags · Updated 9 months ago

  • llama3.2-vision

    Llama 3.2 Vision is a collection of instruction-tuned image reasoning generative models in 11B and 90B sizes.

    vision 11b 90b

2.4M Pulls · 9 Tags · Updated 3 months ago

  • qwen2.5vl

The flagship vision-language model of the Qwen family, and a significant leap from the previous Qwen2-VL.

    vision 3b 7b 32b 72b

522.6K Pulls · 17 Tags · Updated 3 months ago

  • granite3.2-vision

    A compact and efficient vision-language model, specifically designed for visual document understanding, enabling automated content extraction from tables, charts, infographics, plots, diagrams, and more.

    vision tools 2b

267.5K Pulls · 5 Tags · Updated 6 months ago

  • mistral-small3.1

    Building upon Mistral Small 3, Mistral Small 3.1 (2503) adds state-of-the-art vision understanding and enhances long context capabilities up to 128k tokens without compromising text performance.

    vision tools 24b

248.7K Pulls · 5 Tags · Updated 4 months ago

  • moondream

    moondream2 is a small vision language model designed to run efficiently on edge devices.

    vision 1.8b

246.1K Pulls · 18 Tags · Updated 1 year ago

  • VisionVTAI/Aria-sama

    tools

2 Pulls · 1 Tag · Updated yesterday

  • openchat

    A family of open-source models trained on a wide variety of data, surpassing ChatGPT on various benchmarks. Updated to version 3.5-0106.

    7b

194.5K Pulls · 50 Tags · Updated 1 year ago

  • Drews54/llama3.2-vision-abliterated

    From huihui-ai/Llama-3.2-11B-Vision-Instruct-abliterated

    vision 11b

68.3K Pulls · 2 Tags · Updated 7 months ago

  • openbmb/minicpm-o2.6

    A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone

    vision 8b

15.4K Pulls · 13 Tags · Updated 2 months ago

  • benzie/llava-phi-3

    A lightweight vision model

    vision

5,345 Pulls · 1 Tag · Updated 1 year ago

  • jyan1/paligemma-mix-224

    PaliGemma is a versatile and lightweight vision-language model based on open components such as the SigLIP vision model and the Gemma language model.

    vision

5,266 Pulls · 1 Tag · Updated 1 year ago

  • erwan2/DeepSeek-Janus-Pro-7B-Vision-Encoder

Vision encoder for Janus Pro 7B. This model is under testing.

    vision

4,411 Pulls · 1 Tag · Updated 6 months ago

  • huihui_ai/granite3.2-vision-abliterated

    A compact and efficient vision-language model, specifically designed for visual document understanding, enabling automated content extraction from tables, charts, infographics, plots, diagrams, and more.

    vision tools 2b

4,099 Pulls · 5 Tags · Updated 6 months ago

  • mskimomadto/chat-gph-vision

    GPH Vision LLM: Transforming Industries through Intelligent Solutions

    vision

3,537 Pulls · 1 Tag · Updated 1 year ago

  • knoopx/llava-phi-2

    Lightweight and fast vision model, does a decent job describing photos.

    vision

2,468 Pulls · 2 Tags · Updated 1 year ago

  • knoopx/mobile-vlm

    Lightweight and fast vision model, does a decent job captioning photos.

    vision

2,270 Pulls · 1 Tag · Updated 1 year ago

  • ingu627/Qwen2.5-VL-7B-Instruct-Q5_K_M

Qwen2.5-VL-7B-Instruct-Q5_K_M is a 7-billion-parameter vision-language model from Alibaba Cloud, designed for processing text and visual inputs and quantized with Q5_K_M for efficient deployment in Ollama.

2,054 Pulls · 1 Tag · Updated 5 months ago

  • cnjack/mistral-samll-3.1

    Building upon Mistral Small 3 (2501), Mistral Small 3.1 (2503) adds state-of-the-art vision understanding and enhances long context capabilities up to 128k tokens without compromising text performance.

1,288 Pulls · 1 Tag · Updated 5 months ago
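Any of the vision models listed above can be queried through Ollama's HTTP API, which accepts images as base64-encoded strings in the `images` field of a `/api/generate` request. A minimal sketch, assuming a local Ollama server on the default port 11434 and a model such as `llava` already pulled (the image bytes here are a placeholder for a real file):

```python
import base64
import json


def build_vision_request(model: str, prompt: str, image_bytes: bytes) -> str:
    """Build the JSON body for Ollama's /api/generate endpoint.

    Vision models such as llava take images as base64 strings in the
    "images" array alongside the text prompt.
    """
    payload = {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # return one complete response instead of a stream
    }
    return json.dumps(payload)


# Placeholder bytes stand in for a real image,
# e.g. open("photo.jpg", "rb").read()
body = build_vision_request("llava", "What is in this picture?", b"\x89PNG...")
print(json.loads(body)["model"])  # prints "llava"
```

To actually send the request, POST `body` to `http://localhost:11434/api/generate` (for example with `requests.post(url, data=body)`); the response JSON carries the model's answer in its `response` field.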

© 2025 Ollama Inc.