Ollama
Tools, Vision models · Ollama
  • minimax-m3

    MiniMax M3: Coding & Agentic Frontier. 1M context window. Native Multimodality.

    vision tools thinking cloud

    39.7K Pulls · 1 Tag · Updated 1 week ago

  • gemma4

    Gemma 4 models are designed to deliver frontier-level performance at each size. They are well-suited for reasoning, agentic workflows, coding, and multimodal understanding.

    vision tools thinking audio cloud e2b e4b 12b 26b 31b

    13.1M Pulls · 47 Tags · Updated 23 hours ago

  • qwen3.5

    Qwen 3.5 is a family of open-source multimodal models that delivers exceptional utility and performance.

    vision tools thinking cloud 0.8b 2b 4b 9b 27b 35b 122b

    13.4M Pulls · 64 Tags · Updated 2 weeks ago

  • qwen3.6

    Qwen3.6 delivers substantial upgrades in agentic coding and thinking preservation over previous Qwen models.

    vision tools thinking 27b 35b

    2.2M Pulls · 30 Tags · Updated 1 week ago

  • glm-ocr

    GLM-OCR is a multimodal OCR model for complex document understanding, built on the GLM-V encoder–decoder architecture.

    vision tools

    2.7M Pulls · 3 Tags · Updated 4 months ago

  • nemotron3

    NVIDIA Nemotron 3 Nano Omni is a multimodal large language model that unifies video, audio, image, and text understanding to support enterprise-grade Q&A, summarization, transcription, and document intelligence workflows.

    vision tools thinking audio 33b

    602.2K Pulls · 4 Tags · Updated 1 month ago

  • gemini-3-flash-preview

    Gemini 3 Flash offers frontier intelligence built for speed at a fraction of the cost.

    vision tools thinking cloud

    2.2M Pulls · 2 Tags · Updated 5 months ago

  • kimi-k2.6

    Kimi K2.6 is an open-source, native multimodal agentic model that advances practical capabilities in long-horizon coding, coding-driven design, proactive autonomous execution, and swarm-based task orchestration.

    vision tools thinking cloud

    294.1K Pulls · 1 Tag · Updated 1 month ago

  • kimi-k2.5

    Kimi K2.5 is an open-source, native multimodal agentic model that seamlessly integrates vision and language understanding with advanced agentic capabilities, instant and thinking modes, as well as conversational and agentic paradigms.

    vision tools thinking cloud

    316.4K Pulls · 1 Tag · Updated 4 months ago

  • mistral-medium-3.5

    Mistral Medium 3.5 is Mistral AI's first flagship model to merge instruction following, reasoning, and coding in a single set of 128B weights.

    vision tools thinking 128b

    33.1K Pulls · 5 Tags · Updated 1 month ago

  • qwen3-vl

    The most powerful vision-language model in the Qwen model family to date.

    vision tools thinking cloud 2b 4b 8b 30b 32b 235b

    4.1M Pulls · 59 Tags · Updated 7 months ago

  • ministral-3

    The Ministral 3 family is designed for edge deployment, capable of running on a wide range of hardware.

    vision tools cloud 3b 8b 14b

    1.2M Pulls · 16 Tags · Updated 5 months ago

  • mistral-small3.2

    An update to Mistral Small that improves function calling and instruction following and produces fewer repetition errors.

    vision tools 24b

    2.3M Pulls · 5 Tags · Updated 11 months ago

  • devstral-small-2

    A 24B model that excels at using tools to explore codebases, edit multiple files, and power software engineering agents.

    vision tools cloud 24b

    862.3K Pulls · 6 Tags · Updated 5 months ago

  • mistral-large-3

    A general-purpose multimodal mixture-of-experts model for production-grade tasks and enterprise workloads.

    vision tools cloud

    63.3K Pulls · 1 Tag · Updated 6 months ago

  • llama4

    Meta's latest collection of multimodal models.

    vision tools 16x17b 128x17b

    1.7M Pulls · 11 Tags · Updated 11 months ago

  • granite3.2-vision

    A compact and efficient vision-language model, specifically designed for visual document understanding, enabling automated content extraction from tables, charts, infographics, plots, diagrams, and more.

    vision tools 2b

    922.9K Pulls · 5 Tags · Updated 1 year ago

  • mistral-small3.1

    Building on Mistral Small 3, Mistral Small 3.1 (2503) adds state-of-the-art vision understanding and extends long-context support to 128k tokens without compromising text performance.

    vision tools 24b

    749.2K Pulls · 5 Tags · Updated 1 year ago

© 2026 Ollama