Kimi K2.5 is an open-source, native multimodal agentic model that integrates vision and language understanding with advanced agentic capabilities, supporting both instant and thinking modes as well as conversational and agentic paradigms.
338.4K Pulls 1 Tag Updated 5 months ago
The most powerful vision-language model in the Qwen model family to date.
4.3M Pulls 57 Tags Updated 8 months ago
DeepSeek-OCR is a vision-language model that can perform token-efficient OCR.
478.2K Pulls 3 Tags Updated 7 months ago
🌋 LLaVA is a novel end-to-end trained large multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding. Updated to version 1.6.
14.2M Pulls 98 Tags Updated 2 years ago
The flagship vision-language model of the Qwen family, and a significant leap over the previous Qwen2-VL.
2.8M Pulls 17 Tags Updated 1 year ago
Llama 3.2 Vision is a collection of instruction-tuned image reasoning generative models in 11B and 90B sizes.
4.7M Pulls 9 Tags Updated 1 year ago
A series of multimodal LLMs (MLLMs) designed for vision-language understanding.
5.3M Pulls 17 Tags Updated 1 year ago
A compact and efficient vision-language model, specifically designed for visual document understanding, enabling automated content extraction from tables, charts, infographics, plots, diagrams, and more.
932.9K Pulls 5 Tags Updated 1 year ago
Building upon Mistral Small 3, Mistral Small 3.1 (2503) adds state-of-the-art vision understanding and enhances long context capabilities up to 128k tokens without compromising text performance.
756K Pulls 5 Tags Updated 1 year ago
moondream2 is a small vision language model designed to run efficiently on edge devices.
1.3M Pulls 18 Tags Updated 2 years ago
MedGemma is a collection of Gemma 3 variants that are trained for performance on medical text and image comprehension.
105.8K Pulls 9 Tags Updated 2 months ago
25 Pulls 1 Tag Updated 10 months ago
A family of open-source models trained on a wide variety of data, surpassing ChatGPT on various benchmarks. Updated to version 3.5-0106.
1.1M Pulls 50 Tags Updated 2 years ago
llmfan46/gemma-4-31B-it-uncensored-heretic-GGU with Vision
8,348 Pulls 1 Tag Updated 2 weeks ago
llmfan46/Qwen3.6-35B-A3B-uncensored-heretic-GGUF with Vision
1,629 Pulls 1 Tag Updated 2 weeks ago
Next-Gen Sovereign AI Ecosystem: 7 specialized models for Coding, Vision, Reasoning, Edge & RAG. 131K context, 11 stop tokens. Native integration with OpenClaw, VSCode, Cursor, LangChain & 12+ other platforms. Forged by NJIRLAH.
2.3M Pulls 7 Tags Updated 1 month ago
Qwen3.6-27B-MTP.GGUF model with multimodal vision projector support, quantized at Q8.
397 Pulls 2 Tags Updated 1 week ago
Holo-3.1 vision-language computer-use agents by H Company. They locate UI elements and drive web, desktop & mobile automation from a screenshot, returning clicks in normalized [0, 1000] coordinates. Available in 0.8B & 4B sizes, instruct & thinking variants, Q4_K_M/Q8_0 quantizations. Apache 2.0.
222 Pulls 7 Tags Updated 1 week ago
Qwen 3.6 Ollama profiles for RTX 5090 across 27B dense and 35B-A3B MoE variants, with vision, thinking mode, and native tool calling.
392 Pulls 3 Tags Updated 2 weeks ago
311 Pulls 1 Tag Updated 2 weeks ago