Vision · Ollama
Search for models on Ollama.
  • llama3.2-vision

    Llama 3.2 Vision is a collection of instruction-tuned image reasoning generative models in 11B and 90B sizes.

    vision 11b 90b

    3.7M Pulls · 9 Tags · Updated 8 months ago

  • kimi-k2.5

    Kimi K2.5 is an open-source, native multimodal agentic model that seamlessly integrates vision and language understanding with advanced agentic capabilities, supporting both instant and thinking modes as well as conversational and agentic paradigms.

    cloud

    36.1K Pulls · 1 Tag · Updated 1 week ago

  • qwen3-vl

    The most powerful vision-language model in the Qwen model family to date.

    vision tools thinking cloud 2b 4b 8b 30b 32b 235b

    1.4M Pulls · 59 Tags · Updated 3 months ago

  • deepseek-ocr

    DeepSeek-OCR is a vision-language model that can perform token-efficient OCR.

    vision 3b

    153K Pulls · 3 Tags · Updated 2 months ago

  • qwen2.5vl

    Qwen's flagship vision-language model, and a significant leap over the previous Qwen2-VL.

    vision 3b 7b 32b 72b

    1.2M Pulls · 17 Tags · Updated 8 months ago

  • granite3.2-vision

    A compact and efficient vision-language model, specifically designed for visual document understanding, enabling automated content extraction from tables, charts, infographics, plots, diagrams, and more.

    vision tools 2b

    714.6K Pulls · 5 Tags · Updated 11 months ago

  • mistral-small3.1

    Building upon Mistral Small 3, Mistral Small 3.1 (2503) adds state-of-the-art vision understanding and enhances long context capabilities up to 128k tokens without compromising text performance.

    vision tools 24b

    574K Pulls · 5 Tags · Updated 10 months ago

  • llava

    🌋 LLaVA is a novel end-to-end trained large multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding. Updated to version 1.6.

    vision 7b 13b 34b

    12.7M Pulls · 98 Tags · Updated 2 years ago

  • minicpm-v

    A series of multimodal LLMs (MLLMs) designed for vision-language understanding.

    vision 8b

    4.5M Pulls · 17 Tags · Updated 1 year ago

  • moondream

    moondream2 is a small vision language model designed to run efficiently on edge devices.

    vision 1.8b

    593.6K Pulls · 18 Tags · Updated 1 year ago

  • VisionVTAI/Aria-sama

    tools

    14 Pulls · 1 Tag · Updated 5 months ago

  • vitali87/shell-commands-qwen2-1.5b

    A model fine-tuned on the Linux Command Library (https://linuxcommandlibrary.com/basic/oneliners).

    327 Pulls · 1 Tag · Updated 1 year ago

  • vishalraj/dark-champion-21b

    tools

    3 Pulls · 1 Tag · Updated 3 days ago

  • deepseek-v3.2

    DeepSeek-V3.2 is a model that harmonizes high computational efficiency with superior reasoning and agent performance.

    cloud

    26K Pulls · 1 Tag · Updated 1 month ago

  • visharxd/coupon-generator

    tools

    2 Pulls · 1 Tag · Updated 1 year ago

  • ViperAI/viper-coder.v.01

    ViperCoder is an advanced developer-focused AI built on a modern code model and optimized for real-world software engineering.

    tools

    22 Pulls · 1 Tag · Updated yesterday

  • vanilj/reflection-70b-iq2_xxs

    Reflection Llama-3.1 70B is (currently) the world's top open-source LLM, trained with a new technique called Reflection-Tuning that teaches an LLM to detect mistakes in its reasoning and correct course.

    336 Pulls · 1 Tag · Updated 1 year ago

  • openbmb/minicpm-o4.5

    A Gemini 2.5 Flash-level MLLM for vision, speech, and full-duplex multimodal live streaming on your phone.

    vision 8b

    596 Pulls · 12 Tags · Updated 2 days ago

  • aeline/opan

    A highly specialized vision model with more than 2B parameters.

    vision tools

    146.9K Pulls · 1 Tag · Updated 2 months ago

  • theoistic/Qwen-3-VL-30B-A3B-Instruct

    A high-quality vision instruct model.

    vision

    89 Pulls · 1 Tag · Updated 1 week ago
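
Any of the locally runnable models above can be pulled and queried through Ollama's CLI or its official Python client. Below is a minimal sketch, assuming the ollama Python package (pip install ollama) and a running local Ollama server; ./photo.jpg is a placeholder path, and the model and size tag are taken from the llama3.2-vision entry above.

    import ollama

    # Download the 11B variant once; the size tags listed above map to
    # model:tag names (e.g. llama3.2-vision:90b, qwen3-vl:8b).
    ollama.pull('llama3.2-vision:11b')

    # Vision-tagged models accept an 'images' list alongside the text
    # prompt; entries may be local file paths or raw bytes.
    response = ollama.chat(
        model='llama3.2-vision:11b',
        messages=[{
            'role': 'user',
            'content': 'Describe this image in one sentence.',
            'images': ['./photo.jpg'],  # placeholder path, not from this page
        }],
    )
    print(response['message']['content'])

Models tagged cloud (for example kimi-k2.5 and deepseek-v3.2) are served from Ollama's hosted infrastructure rather than local hardware, so they are addressed the same way but without a local pull.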
