Kimi K2.7 Code is Moonshot AI's coding-focused agentic model built upon Kimi K2.6, with substantial improvements on real-world long-horizon coding tasks and roughly 30% lower thinking-token usage.
21.4K Pulls 1 Tag Updated 1 week ago
MiniMax M3: Coding & Agentic Frontier. 1M context window. Native Multimodality.
61.5K Pulls 1 Tag Updated 2 weeks ago
A GPT-4o Level MLLM for Single Image, Multi Image and High-FPS Video Understanding on Your Phone
4,267 Pulls 13 Tags Updated 2 weeks ago
A Pocket-Sized MLLM for Ultra-Efficient Image and Video Understanding on Your Phone
6,470 Pulls 13 Tags Updated 2 weeks ago
Mistral Medium 3.5 is Mistral AI's first flagship model to merge instruction-following, reasoning, and coding in a single set of 128B weights.
86.3K Pulls 5 Tags Updated 1 month ago
NVIDIA Nemotron 3 Nano Omni is a multimodal large language model that unifies video, audio, image, and text understanding to support enterprise-grade Q&A, summarization, transcription, and document intelligence workflows.
608.9K Pulls 4 Tags Updated 1 month ago
Qwen3.6 delivers substantial upgrades in agentic coding and thinking preservation over previous Qwen models.
2.8M Pulls 30 Tags Updated 2 weeks ago
MedGemma 1.5 4B is an updated version of the MedGemma 4B model.
39.9K Pulls 5 Tags Updated 2 months ago
MedGemma is a collection of Gemma 3 variants that are trained for performance on medical text and image comprehension.
80.9K Pulls 9 Tags Updated 2 months ago
Kimi K2.6 is an open-source, native multimodal agentic model that advances practical capabilities in long-horizon coding, coding-driven design, proactive autonomous execution, and swarm-based task orchestration.
310.2K Pulls 1 Tag Updated 2 months ago
Gemma 4 models are designed to deliver frontier-level performance at each size. They are well-suited for reasoning, agentic workflows, coding, and multimodal understanding.
15M Pulls 48 Tags Updated 1 week ago
Qwen 3.5 is a family of open-source multimodal models that delivers exceptional utility and performance.
14M Pulls 64 Tags Updated 1 month ago
GLM-OCR is a multimodal OCR model for complex document understanding, built on the GLM-V encoder–decoder architecture.
4.8M Pulls 3 Tags Updated 4 months ago
Kimi K2.5 is an open-source, native multimodal agentic model that seamlessly integrates vision and language understanding with advanced agentic capabilities, instant and thinking modes, as well as conversational and agentic paradigms.
328.4K Pulls 1 Tag Updated 4 months ago
A new collection of open translation models built on Gemma 3, helping people communicate across 55 languages.
1.7M Pulls 13 Tags Updated 5 months ago
Gemini 3 Flash offers frontier intelligence built for speed at a fraction of the cost.
2.2M Pulls 2 Tags Updated 6 months ago
A 24B model that excels at using tools to explore codebases, editing multiple files, and powering software engineering agents.
879K Pulls 6 Tags Updated 6 months ago
The Ministral 3 family is designed for edge deployment, capable of running on a wide range of hardware.
1.2M Pulls 16 Tags Updated 6 months ago
A general-purpose multimodal mixture-of-experts model for production-grade tasks and enterprise workloads.
66K Pulls 1 Tag Updated 6 months ago
DeepSeek-OCR is a vision-language model that can perform token-efficient OCR.
474.2K Pulls 3 Tags Updated 7 months ago