The most powerful vision-language model in the Qwen model family to date.
2.6M Pulls 59 Tags Updated 5 months ago
Kimi K2.5 is an open-source, native multimodal agentic model that seamlessly integrates vision and language understanding with advanced agentic capabilities, instant and thinking modes, as well as conversational and agentic paradigms.
195.9K Pulls 1 Tag Updated 2 months ago
DeepSeek-OCR is a vision-language model that can perform token-efficient OCR.
366.7K Pulls 3 Tags Updated 4 months ago
Qwen's flagship vision-language model and a significant leap from the previous Qwen2-VL.
1.6M Pulls 17 Tags Updated 10 months ago
Building upon Mistral Small 3, Mistral Small 3.1 (2503) adds state-of-the-art vision understanding and enhances long context capabilities up to 128k tokens without compromising text performance.
675.1K Pulls 5 Tags Updated 11 months ago
🌋 LLaVA is a novel end-to-end trained large multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding. Updated to version 1.6.
13.5M Pulls 98 Tags Updated 2 years ago
Llama 3.2 Vision is a collection of instruction-tuned image reasoning generative models in 11B and 90B sizes.
4.2M Pulls 9 Tags Updated 10 months ago
A series of multimodal LLMs (MLLMs) designed for vision-language understanding.
4.9M Pulls 17 Tags Updated 1 year ago
A compact and efficient vision-language model, specifically designed for visual document understanding, enabling automated content extraction from tables, charts, infographics, plots, diagrams, and more.
844.7K Pulls 5 Tags Updated 1 year ago
moondream2 is a small vision language model designed to run efficiently on edge devices.
888.6K Pulls 18 Tags Updated 1 year ago
20 Pulls 1 Tag Updated 6 months ago
A family of open-source models trained on a wide variety of data, surpassing ChatGPT on various benchmarks. Updated to version 3.5-0106.
797.7K Pulls 50 Tags Updated 2 years ago
9,806 Pulls 1 Tag Updated 3 weeks ago
Q8_0 Non-thinking Uncensored Non-Vision
2,630 Pulls 4 Tags Updated 2 weeks ago
Qwen3.5-Claude-4.6-Opus-Reasoning-Distilled-v2 (https://huggingface.co/Jackrong/), with vision properly merged and efficiently quantized.
1,904 Pulls 4 Tags Updated 3 days ago
Coding-optimized variants of the official Qwen3.5 MoE models — full vision capability retained, tuned for precise code generation via lower temperature. Based on Alibaba's Qwen3.5 distributed through the Ollama registry.
1,532 Pulls 2 Tags Updated 1 week ago
A text-only, thinking-capable variant of Qwen3.5-35B-A3B — leaner and faster by removing the CLIP vision projector. Based on Unsloth's Q4_K_M quantization of Alibaba's Qwen3.5-35B-A3B.
737 Pulls 2 Tags Updated 1 week ago
981 Pulls 1 Tag Updated 3 weeks ago
Q8_0 Uncensored Non-Vision
730 Pulls 4 Tags Updated 2 weeks ago
A fork of coder3101/Cydonia-24B-v4.3-heretic-v3 (mradermacher's quant at Q4_K_M), with vision mmproj from bartowski/mistralai_Mistral-Small-3.2-24B-Instruct-2506-GGUF.
243 Pulls 1 Tag Updated 2 weeks ago