Ollama
Vision · Ollama
  • llama3.2-vision

    Llama 3.2 Vision is a collection of instruction-tuned image reasoning generative models in 11B and 90B sizes.

    vision 11b 90b

    4.5M Pulls · 9 Tags · Updated 1 year ago

  • kimi-k2.5

    Kimi K2.5 is an open-source, native multimodal agentic model that seamlessly integrates vision and language understanding with advanced agentic capabilities, instant and thinking modes, as well as conversational and agentic paradigms.

    vision tools thinking cloud

    292.4K Pulls · 1 Tag · Updated 3 months ago

  • qwen3-vl

    The most powerful vision-language model in the Qwen model family to date.

    vision tools thinking cloud 2b 4b 8b 30b 32b 235b

    3.9M Pulls · 59 Tags · Updated 6 months ago

  • deepseek-ocr

    DeepSeek-OCR is a vision-language model that can perform token-efficient OCR.

    vision 3b

    449.7K Pulls · 3 Tags · Updated 6 months ago

  • llava

    🌋 LLaVA is a novel end-to-end trained large multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding. Updated to version 1.6.

    vision 7b 13b 34b

    14.1M Pulls · 98 Tags · Updated 2 years ago

  • minicpm-v

    A series of multimodal LLMs (MLLMs) designed for vision-language understanding.

    vision 8b

    5.2M Pulls · 17 Tags · Updated 1 year ago

  • qwen2.5vl

    Flagship vision-language model of Qwen and also a significant leap from the previous Qwen2-VL.

    vision 3b 7b 32b 72b

    2M Pulls · 17 Tags · Updated 1 year ago

  • granite3.2-vision

    A compact and efficient vision-language model, specifically designed for visual document understanding, enabling automated content extraction from tables, charts, infographics, plots, diagrams, and more.

    vision tools 2b

    913.6K Pulls · 5 Tags · Updated 1 year ago

  • mistral-small3.1

    Building upon Mistral Small 3, Mistral Small 3.1 (2503) adds state-of-the-art vision understanding and enhances long context capabilities up to 128k tokens without compromising text performance.

    vision tools 24b

    743.9K Pulls · 5 Tags · Updated 1 year ago

  • moondream

    moondream2 is a small vision language model designed to run efficiently on edge devices.

    vision 1.8b

    1.2M Pulls · 18 Tags · Updated 2 years ago

  • nemotron-cascade-2

    An open 30B MoE model from NVIDIA with 3B activated parameters that delivers strong reasoning and agentic capabilities.

    tools thinking 30b

    117.4K Pulls · 3 Tags · Updated 2 months ago

  • VisionVTAI/Aria-sama

    tools

    22 Pulls · 1 Tag · Updated 8 months ago

  • vitali87/shell-commands-qwen2-1.5b

    Model fine-tuned on the Linux Command Library (https://linuxcommandlibrary.com/basic/oneliners).

    374 Pulls · 1 Tag · Updated 1 year ago

  • deepseek-v4-pro

    DeepSeek-V4-Pro is a frontier Mixture-of-Experts model with a 1M-token context window and three reasoning modes.

    tools thinking cloud

    70.8K Pulls · 1 Tag · Updated 3 weeks ago

  • deepseek-v3.2

    DeepSeek-V3.2, a model that harmonizes high computational efficiency with superior reasoning and agent performance.

    tools thinking cloud

    2.2M Pulls · 1 Tag · Updated 5 months ago

  • villassvj/Caco

    A reasoning-focused refinement of gemma4:31B, optimized for epistemic honesty, safety, and human-aligned decision making. Designed to prioritize factual accuracy over creative embellishment and maintain transparency in uncertain contexts.

    cloud

    54 Pulls · 1 Tag · Updated 3 weeks ago

  • vickiovikthompson/uncensored-qwen

    Uncensored version of Qwen 2.5.

    tools

    199 Pulls · 1 Tag · Updated 2 months ago

  • vishalraj/dark-champion-21b

    tools

    57 Pulls · 1 Tag · Updated 3 months ago

  • vijayavp/medreason-qwen25-shortcot-exp2

    2 Pulls · 1 Tag · Updated 1 month ago

  • visharxd/coupon-generator

    tools

    2 Pulls · 1 Tag · Updated 1 year ago

© 2026 Ollama