This model extends Llama 3 8B's context length from 8K to over 1M tokens.
307.7K Pulls 35 Tags Updated 1 year ago
Llama 3.1 is a new state-of-the-art model from Meta available in 8B, 70B and 405B parameter sizes.
109.6M Pulls 93 Tags Updated 1 year ago
A new state-of-the-art 70B model. Llama 3.3 70B offers performance similar to the Llama 3.1 405B model.
3.2M Pulls 14 Tags Updated 1 year ago
LFM2.5 is a new family of hybrid models designed for on-device deployment.
27.7K Pulls 5 Tags Updated 1 week ago
The most powerful vision-language model in the Qwen model family to date.
1.3M Pulls 59 Tags Updated 3 months ago
Olmo is a series of open language models designed to enable the science of language models. These models are pretrained on the Dolma 3 dataset and post-trained on the Dolci datasets.
114.5K Pulls 15 Tags Updated 1 month ago
DeepSeek-OCR is a vision-language model that can perform token-efficient OCR.
143.1K Pulls 3 Tags Updated 2 months ago
MiniMax M2 is a high-efficiency large language model built for coding and agentic workflows.
47.6K Pulls 1 Tag Updated 3 months ago
A state-of-the-art mixture-of-experts (MoE) language model. Kimi K2-Instruct-0905 demonstrates significant improvements in performance on public benchmarks and real-world coding agent tasks.
34.6K Pulls 1 Tag Updated 4 months ago
A general-purpose multimodal mixture-of-experts model for production-grade tasks and enterprise workloads.
14.4K Pulls 1 Tag Updated 2 months ago
Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models.
18.4M Pulls 58 Tags Updated 3 months ago
Qwen's flagship vision-language model, and a significant leap from the previous Qwen2-VL.
1.2M Pulls 17 Tags Updated 8 months ago
Phi-4-reasoning and Phi-4-reasoning-plus are 14-billion-parameter open-weight reasoning models that rival much larger models on complex reasoning tasks.
1.1M Pulls 9 Tags Updated 9 months ago
Meta's latest collection of multimodal models.
1.1M Pulls 11 Tags Updated 7 months ago
IBM Granite 2B and 8B models are language models with a 128K-token context length, fine-tuned for improved reasoning and instruction-following capabilities.
862.1K Pulls 3 Tags Updated 9 months ago
A compact and efficient vision-language model, specifically designed for visual document understanding, enabling automated content extraction from tables, charts, infographics, plots, diagrams, and more.
702.9K Pulls 5 Tags Updated 11 months ago
Meta's Llama 3.2 goes small with 1B and 3B models.
55.3M Pulls 63 Tags Updated 1 year ago
Qwen2.5 models are pretrained on Alibaba's latest large-scale dataset, encompassing up to 18 trillion tokens. The models support context lengths of up to 128K tokens and offer multilingual support.
20M Pulls 133 Tags Updated 1 year ago
The latest series of code-specific Qwen models, with significant improvements in code generation, code reasoning, and code fixing.
10.7M Pulls 199 Tags Updated 8 months ago