Ollama
vision · Ollama
  • kimi-k2.5

    Kimi K2.5 is an open-source, native multimodal agentic model that seamlessly integrates vision and language understanding with advanced agentic capabilities, instant and thinking modes, as well as conversational and agentic paradigms.

    vision tools thinking cloud

    338.4K Pulls · 1 Tag · Updated 5 months ago

  • qwen3-vl

    The most powerful vision-language model in the Qwen model family to date.

    vision tools thinking 2b 4b 8b 30b 32b 235b

    4.3M Pulls · 57 Tags · Updated 8 months ago

  • deepseek-ocr

    DeepSeek-OCR is a vision-language model that can perform token-efficient OCR.

    vision 3b

    478.2K Pulls · 3 Tags · Updated 7 months ago

  • llava

    🌋 LLaVA is a novel end-to-end trained large multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding. Updated to version 1.6.

    vision 7b 13b 34b

    14.2M Pulls · 98 Tags · Updated 2 years ago

  • qwen2.5vl

    Flagship vision-language model of Qwen and also a significant leap from the previous Qwen2-VL.

    vision 3b 7b 32b 72b

    2.8M Pulls · 17 Tags · Updated 1 year ago

  • llama3.2-vision

    Llama 3.2 Vision is a collection of instruction-tuned image reasoning generative models in 11B and 90B sizes.

    vision 11b 90b

    4.7M Pulls · 9 Tags · Updated 1 year ago

  • minicpm-v

    A series of multimodal LLMs (MLLMs) designed for vision-language understanding.

    vision 8b

    5.3M Pulls · 17 Tags · Updated 1 year ago

  • granite3.2-vision

    A compact and efficient vision-language model, specifically designed for visual document understanding, enabling automated content extraction from tables, charts, infographics, plots, diagrams, and more.

    vision tools 2b

    932.9K Pulls · 5 Tags · Updated 1 year ago

  • mistral-small3.1

    Building upon Mistral Small 3, Mistral Small 3.1 (2503) adds state-of-the-art vision understanding and enhances long context capabilities up to 128k tokens without compromising text performance.

    vision tools 24b

    756K Pulls · 5 Tags · Updated 1 year ago

  • moondream

    moondream2 is a small vision language model designed to run efficiently on edge devices.

    vision 1.8b

    1.3M Pulls · 18 Tags · Updated 2 years ago

  • medgemma

    MedGemma is a collection of Gemma 3 variants that are trained for performance on medical text and image comprehension.

    vision 4b 27b

    105.8K Pulls · 9 Tags · Updated 2 months ago

  • VisionVTAI/Aria-sama

    tools

    25 Pulls · 1 Tag · Updated 10 months ago

  • openchat

    A family of open-source models trained on a wide variety of data, surpassing ChatGPT on various benchmarks. Updated to version 3.5-0106.

    7b

    1.1M Pulls · 50 Tags · Updated 2 years ago

  • tinyrick/gemma-4-31B-it-uncensored-heretic-vision-llmfan46

    llmfan46/gemma-4-31B-it-uncensored-heretic-GGU with Vision

    vision tools thinking

    8,348 Pulls · 1 Tag · Updated 2 weeks ago

  • tinyrick/Qwen3.6-35B-A3B-uncensored-heretic-vision-llmfan46

    llmfan46/Qwen3.6-35B-A3B-uncensored-heretic-GGUF with Vision

    vision

    1,629 Pulls · 1 Tag · Updated 2 weeks ago

  • NJIR/njir.gen-2

    Next-Gen Sovereign AI Ecosystem - 7 specialized models for Coding, Vision, Reasoning, Edge & RAG. 131K context, 11 Stop Tokens. Native with OpenClaw, VSCode, Cursor, LangChain & 12+ platforms. Forged by NJIRLAH.

    vision embedding tools thinking

    2.3M Pulls · 7 Tags · Updated 1 month ago

  • zhamm/qwen3.6

    Qwen3.6-27B-MTP.GGUF model with multimodal vision projector support quantized at Q8.

    vision

    397 Pulls · 2 Tags · Updated 1 week ago

  • ahmadwaqar/holo-3.1

    Holo-3.1 vision-language computer-use agents by H Company. Locate UI elements and drive web, desktop & mobile automation from a screenshot — returns clicks in normalized [0,1000] coords. 0.8B & 4B, instruct & thinking variants, Q4_K_M/Q8_0. Apache 2.0.

    vision tools 0.8b 4b

    222 Pulls · 7 Tags · Updated 1 week ago

  • odytrice/qwen3.6

    Qwen 3.6 Ollama profiles for RTX 5090 across 27B dense and 35B-A3B MoE variants, with vision, thinking mode, and native tool calling.

    vision tools thinking

    392 Pulls · 3 Tags · Updated 2 weeks ago

  • tinyrick/Gemma-4-Harmonia-31B-uncensored-heretic-vision-llmfan46

    vision tools thinking

    311 Pulls · 1 Tag · Updated 2 weeks ago
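
The ahmadwaqar/holo-3.1 entry above says the model returns clicks in normalized [0,1000] coords. A minimal sketch of mapping such a point back to screenshot pixels might look like the following. The function name `to_pixels`, the (x, y) argument order, and the choice to clamp the upper edge to the last pixel are all assumptions made for illustration; the listing does not specify how the model's output should be decoded.

```python
def to_pixels(x_norm: float, y_norm: float, width: int, height: int,
              scale: int = 1000) -> tuple[int, int]:
    """Map a point on a normalized [0, scale] grid to pixel coordinates.

    Hypothetical helper: the [0, 1000] range comes from the listing, while
    the rounding and clamping rules here are assumptions.
    """
    if not (0 <= x_norm <= scale and 0 <= y_norm <= scale):
        raise ValueError(f"coordinates must lie in [0, {scale}]")
    # Scale each axis independently, then clamp so that a value of exactly
    # `scale` still lands on the last valid pixel rather than one past it.
    x = min(round(x_norm / scale * width), width - 1)
    y = min(round(y_norm / scale * height), height - 1)
    return x, y


if __name__ == "__main__":
    # Centre of a 1920x1080 screenshot.
    print(to_pixels(500, 500, 1920, 1080))
```

Because the grid is independent of resolution, the same model output can be replayed against screenshots of different sizes by changing only `width` and `height`.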

© 2026 Ollama