vision · Ollama
Search results for vision-capable models on Ollama. A minimal usage sketch follows the list.
  • kimi-k2.5

    Kimi K2.5 is an open-source, native multimodal agentic model that seamlessly integrates vision and language understanding with advanced agentic capabilities, offering both instant and thinking modes and supporting both conversational and agentic paradigms.

    cloud

    32.5K Pulls · 1 Tag · Updated 1 week ago

  • qwen3-vl

    The most powerful vision-language model in the Qwen model family to date.

    vision tools thinking cloud 2b 4b 8b 30b 32b 235b

    1.3M Pulls · 59 Tags · Updated 3 months ago

  • deepseek-ocr

    DeepSeek-OCR is a vision-language model that can perform token-efficient OCR.

    vision 3b

    150.1K Pulls · 3 Tags · Updated 2 months ago

  • qwen2.5vl

    Qwen's flagship vision-language model, and a significant leap over the previous Qwen2-VL.

    vision 3b 7b 32b 72b

    1.2M Pulls · 17 Tags · Updated 8 months ago

  • granite3.2-vision

    A compact and efficient vision-language model, specifically designed for visual document understanding, enabling automated content extraction from tables, charts, infographics, plots, diagrams, and more.

    vision tools 2b

    710.9K Pulls · 5 Tags · Updated 11 months ago

  • mistral-small3.1

    Building upon Mistral Small 3, Mistral Small 3.1 (2503) adds state-of-the-art vision understanding and enhances long context capabilities up to 128k tokens without compromising text performance.

    vision tools 24b

    571.5K Pulls · 5 Tags · Updated 10 months ago

  • llava

    🌋 LLaVA is a novel end-to-end trained large multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding. Updated to version 1.6.

    vision 7b 13b 34b

    12.7M Pulls · 98 Tags · Updated 2 years ago

  • minicpm-v

    A series of multimodal LLMs (MLLMs) designed for vision-language understanding.

    vision 8b

    4.5M Pulls · 17 Tags · Updated 1 year ago

  • llama3.2-vision

    Llama 3.2 Vision is a collection of instruction-tuned image reasoning generative models in 11B and 90B sizes.

    vision 11b 90b

    3.7M Pulls · 9 Tags · Updated 8 months ago

  • moondream

    moondream2 is a small vision language model designed to run efficiently on edge devices.

    vision 1.8b

    590.3K Pulls · 18 Tags · Updated 1 year ago

  • VisionVTAI/Aria-sama

    tools

    14 Pulls · 1 Tag · Updated 5 months ago

  • openchat

    A family of open-source models trained on a wide variety of data, surpassing ChatGPT on various benchmarks. Updated to version 3.5-0106.

    7b

    411K Pulls · 50 Tags · Updated 2 years ago

  • openbmb/minicpm-o4.5

    A Gemini 2.5 Flash-level MLLM for vision, speech, and full-duplex multimodal live streaming on your phone.

    vision 8b

    283 Pulls · 12 Tags · Updated yesterday

  • aeline/opan

    A highly specialized vision model with more than 2B parameters.

    vision tools

    146.9K Pulls · 1 Tag · Updated 2 months ago

  • ahmadwaqar/smolvlm2-256m-video

    Ultra-compact 256M vision-language model for video/image understanding. Supports visual QA, captioning, OCR, video analysis. Only 1.38GB VRAM. Built on SigLIP + SmolLM2. Available in Q8 and FP16. Apache 2.0 license.

    vision

    112 Pulls · 2 Tags · Updated 1 week ago

  • theoistic/Qwen-3-VL-30B-A3B-Instruct

    A high-quality vision instruct model.

    vision

    78 Pulls · 1 Tag · Updated 1 week ago

  • huihui_ai/qwen3-vl-abliterated

    The most powerful vision-language model in the Qwen3 model family to date.

    vision tools 2b 4b 8b 30b 32b

    45.5K Pulls · 54 Tags · Updated 2 months ago

  • huihui_ai/qwen2.5-vl-abliterated

    Qwen's flagship vision-language model, and a significant leap over the previous Qwen2-VL.

    vision 3b 7b 32b

    2,413 Pulls · 16 Tags · Updated 2 months ago

  • richardyoung/olmocr2

    State-of-the-art OCR (Optical Character Recognition) vision language model based on [allenai/olmOCR-2-7B-1025](https://huggingface.co/allenai/olmOCR-2-7B-1025).

    vision

    2,301 Pulls · 1 Tag · Updated 3 months ago

  • haervwe/GLM-4.6V-Flash-9B

    GLM 4.6V Flash 9B model with vision, tools, and hybrid thinking enabled. Uses a custom template to align it with Ollama, applies the recommended sampling settings by default, and uses Unsloth quants at Q4_K_M.

    vision tools thinking

    760 Pulls · 1 Tag · Updated 1 month ago
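
Every vision-tagged entry above can be exercised through the same interface once pulled. A minimal sketch, assuming the official `ollama` Python client (`pip install ollama`), a locally running Ollama server, and a model pulled beforehand (e.g. `ollama pull llava`); the image path is a placeholder:

    # Ask a locally pulled vision model to describe an image via the
    # official Ollama Python client.
    import ollama

    response = ollama.chat(
        model="llava",  # any vision-tagged model from the list above
        messages=[
            {
                "role": "user",
                "content": "Describe this image in one sentence.",
                "images": ["./example.jpg"],  # hypothetical local image file
            }
        ],
    )
    print(response["message"]["content"])

Swapping in another vision-tagged model name is the only change needed; entries tagged cloud, such as kimi-k2.5, run on Ollama's hosted service rather than on local hardware.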
