Ollama
vision · Ollama
Search for models on Ollama.
  • kimi-k2.5

    Kimi K2.5 is an open-source, native multimodal agentic model that seamlessly integrates vision and language understanding with advanced agentic capabilities, instant and thinking modes, as well as conversational and agentic paradigms.

    vision tools thinking cloud

    280.7K Pulls · 1 Tag · Updated 3 months ago

  • deepseek-ocr

    DeepSeek-OCR is a vision-language model that can perform token-efficient OCR.

    vision 3b

    442.9K Pulls · 3 Tags · Updated 5 months ago

  • qwen3-vl

    The most powerful vision-language model in the Qwen model family to date.

    vision tools thinking cloud 2b 4b 8b 30b 32b 235b

    3.7M Pulls · 59 Tags · Updated 6 months ago

  • llava

    🌋 LLaVA is a novel end-to-end trained large multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding. Updated to version 1.6.

    vision 7b 13b 34b

    14M Pulls · 98 Tags · Updated 2 years ago

  • llama3.2-vision

    Llama 3.2 Vision is a collection of instruction-tuned image reasoning generative models in 11B and 90B sizes.

    vision 11b 90b

    4.5M Pulls · 9 Tags · Updated 11 months ago

  • minicpm-v

    A series of multimodal LLMs (MLLMs) designed for vision-language understanding.

    vision 8b

    5.2M Pulls · 17 Tags · Updated 1 year ago

  • qwen2.5vl

    Flagship vision-language model of Qwen and also a significant leap from the previous Qwen2-VL.

    vision 3b 7b 32b 72b

    1.9M Pulls · 17 Tags · Updated 11 months ago

  • granite3.2-vision

    A compact and efficient vision-language model, specifically designed for visual document understanding, enabling automated content extraction from tables, charts, infographics, plots, diagrams, and more.

    vision tools 2b

    909K Pulls · 5 Tags · Updated 1 year ago

  • mistral-small3.1

    Building upon Mistral Small 3, Mistral Small 3.1 (2503) adds state-of-the-art vision understanding and enhances long context capabilities up to 128k tokens without compromising text performance.

    vision tools 24b

    739.8K Pulls · 5 Tags · Updated 1 year ago

  • moondream

    moondream2 is a small vision language model designed to run efficiently on edge devices.

    vision 1.8b

    1.2M Pulls · 18 Tags · Updated 2 years ago

  • medgemma

    MedGemma is a collection of Gemma 3 variants that are trained for performance on medical text and image comprehension.

    vision 4b 27b

    26.9K Pulls · 9 Tags · Updated 3 weeks ago

  • VisionVTAI/Aria-sama

    tools

    22 Pulls · 1 Tag · Updated 8 months ago

  • openchat

    A family of open-source models trained on a wide variety of data, surpassing ChatGPT on various benchmarks. Updated to version 3.5-0106.

    7b

    1.1M Pulls · 50 Tags · Updated 2 years ago

  • studiobrn/uncensoredmodAI

    Fully uncensored local AI for coding, automation, vision tasks, and direct final answers, built to reduce unnecessary thinking output and deliver complete responses.

    vision tools thinking

    276 Pulls · 1 Tag · Updated 6 days ago

  • fredrezones55/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive

    Qwen3.6-35B-A3B uncensored by HauhauCS. 0/465 refusals. Patched to have vision support. Fully functional, 100% of what the original authors intended, just without the refusals. These are meant to be the best lossless uncensored models out there.

    vision tools thinking

    18.9K Pulls · 5 Tags · Updated 3 weeks ago

  • Agen/gemma-4-26B-A4B-it-uncensored-heretic

    llmfan46/gemma-4-26B-A4B-it-uncensored-heretic - quantized to q4_K_M from HF with vision capability retained

    vision tools thinking

    2,085 Pulls · 1 Tag · Updated 3 weeks ago

  • Keyvan/german-ocr-3

    German vision OCR based on Qwen3.5. Compact, local, open source. From an image of a German invoice, letter, or form → strictly validated JSON. 100% JSON validity, 0% hallucination on 200+ real German invoices (anonymized).

    vision tools thinking

    591 Pulls · 1 Tag · Updated 3 weeks ago

  • robit/qwen3.5-9b-r7-research-vision

    Fine-tuned Qwen3.5-9B with distilled reasoning and full vision support. 883 tensors (427 text + 441 vision + 15 MTP) — vision tower preserved byte-for-byte from base via llama-export-lora merge.

    vision tools thinking

    397 Pulls · 1 Tag · Updated 4 weeks ago

  • Keyvan/german-ocr-3.1

    German vision OCR. Engineered + optimized. Local. Open source. From German…

    vision tools thinking

    128 Pulls · 1 Tag · Updated 2 weeks ago

  • aeline/opan

    This is a highly specialized vision model with more than 2B parameters.

    vision tools

    147.1K Pulls · 1 Tag · Updated 5 months ago

© 2026 Ollama