Ollama
Vision models · Ollama Search
Search for Vision models on Ollama.
  • gemma3

    The current, most capable model that runs on a single GPU.

    vision 1b 4b 12b 27b

    4.9M Pulls · 21 Tags · Updated 1 month ago

  • llama4

    Meta's latest collection of multimodal models.

    vision tools

    337.4K Pulls · 9 Tags · Updated 2 weeks ago

  • qwen2.5vl

    Qwen's flagship vision-language model and a significant leap from the previous Qwen2-VL.

    vision 3b 7b 32b 72b

    73.7K Pulls · 17 Tags · Updated 5 days ago

  • mistral-small3.1

    Building upon Mistral Small 3, Mistral Small 3.1 (2503) adds state-of-the-art vision understanding and enhances long context capabilities up to 128k tokens without compromising text performance.

    vision tools 24b

    99.2K Pulls · 5 Tags · Updated 1 month ago

  • llava

    🌋 LLaVA is a novel end-to-end trained large multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding. Updated to version 1.6.

    vision 7b 13b 34b

    5.8M Pulls · 98 Tags · Updated 1 year ago

  • llama3.2-vision

    Llama 3.2 Vision is a collection of instruction-tuned image reasoning generative models in 11B and 90B sizes.

    vision 11b 90b

    2.1M Pulls · 9 Tags · Updated 4 days ago

  • minicpm-v

    A series of multimodal LLMs (MLLMs) designed for vision-language understanding.

    vision 8b

    1.5M Pulls · 17 Tags · Updated 6 months ago

  • llava-llama3

    A LLaVA model fine-tuned from Llama 3 Instruct with better scores in several benchmarks.

    vision 8b

    844.4K Pulls · 4 Tags · Updated 1 year ago

  • moondream

    moondream2 is a small vision language model designed to run efficiently on edge devices.

    vision 1.8b

    192.5K Pulls · 18 Tags · Updated 1 year ago

  • bakllava

    BakLLaVA is a multimodal model consisting of the Mistral 7B base model augmented with the LLaVA architecture.

    vision 7b

    117.5K Pulls · 17 Tags · Updated 1 year ago

  • llava-phi3

    A new small LLaVA model fine-tuned from Phi 3 Mini.

    vision 3.8b

    90.4K Pulls · 4 Tags · Updated 1 year ago

  • granite3.2-vision

    A compact and efficient vision-language model, specifically designed for visual document understanding, enabling automated content extraction from tables, charts, infographics, plots, diagrams, and more.

    vision tools 2b

    71.5K Pulls · 5 Tags · Updated 2 months ago
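
Any model listed above can be addressed by its name plus one of its size tags (for example, gemma3 with the 4b tag becomes gemma3:4b). As a minimal sketch, the Python below builds the JSON body for a single-image request to a locally running Ollama server. It assumes the server's /api/generate endpoint at its default address, http://localhost:11434, and that gemma3:4b has already been pulled. The helper names model_ref and build_generate_payload, the prompt, and the placeholder image bytes are hypothetical and chosen only for illustration.

```python
import base64
import json
from typing import Optional


def model_ref(name: str, size: Optional[str] = None) -> str:
    """Join a model name and an optional size tag: 'gemma3' + '4b' -> 'gemma3:4b'."""
    return f"{name}:{size}" if size else name


def build_generate_payload(model: str, prompt: str, image_bytes: bytes) -> str:
    """Return a JSON request body carrying one base64-encoded image.

    Assumption: the target is Ollama's /api/generate endpoint, which takes
    images as a list of base64 strings in the "images" field.
    """
    payload = {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for a single JSON response instead of a stream
    }
    return json.dumps(payload)


# Placeholder bytes stand in for real image data read from a file.
body = build_generate_payload(
    model_ref("gemma3", "4b"),
    "Describe this image.",
    b"placeholder image bytes",
)
```

The resulting body would be sent as an HTTP POST with a Content-Type of application/json. Building the payload separately from sending it keeps the example runnable without a server.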

© 2025 Ollama