This model extends Llama 3 8B's context length from 8K to over 1M tokens.
628.8K Pulls 35 Tags Updated 1 year ago
LFM2 is a family of hybrid models designed for on-device deployment. LFM2-24B-A2B is the largest model in the family, scaling the architecture to 24 billion parameters while keeping inference efficient.
987.6K Pulls 6 Tags Updated 4 weeks ago
Llama 3.1 is a new state-of-the-art model from Meta available in 8B, 70B and 405B parameter sizes.
111.9M Pulls 93 Tags Updated 1 year ago
New state-of-the-art 70B model. Llama 3.3 70B offers performance similar to the Llama 3.1 405B model.
3.5M Pulls 14 Tags Updated 1 year ago
Qwen3-Coder-Next is a coding-focused language model from Alibaba's Qwen team, optimized for agentic coding workflows and local development.
901.5K Pulls 4 Tags Updated 1 month ago
LFM2.5 is a new family of hybrid models designed for on-device deployment.
1M Pulls 5 Tags Updated 2 months ago
The most powerful vision-language model in the Qwen model family to date.
2.4M Pulls 59 Tags Updated 4 months ago
MiniMax-M2.5 is a state-of-the-art large language model designed for real-world productivity and coding tasks.
142.9K Pulls 1 Tag Updated 1 month ago
Olmo is a series of open language models designed to enable the science of language models. These models are pre-trained on the Dolma 3 dataset and post-trained on the Dolci datasets.
280.6K Pulls 15 Tags Updated 3 months ago
DeepSeek-OCR is a vision-language model that can perform token-efficient OCR.
356.8K Pulls 3 Tags Updated 4 months ago
176.1K Pulls 10 Tags Updated 3 months ago
MiniMax M2 is a high-efficiency large language model built for coding and agentic workflows.
88K Pulls 1 Tag Updated 4 months ago
A general-purpose multimodal mixture-of-experts model for production-grade tasks and enterprise workloads.
31.5K Pulls 1 Tag Updated 3 months ago
Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models.
24.8M Pulls 58 Tags Updated 5 months ago
Qwen's flagship vision-language model, and a significant leap over the previous Qwen2-VL.
1.5M Pulls 17 Tags Updated 10 months ago
Phi-4-reasoning and Phi-4-reasoning-plus are 14-billion-parameter open-weight reasoning models that rival much larger models on complex reasoning tasks.
1.3M Pulls 9 Tags Updated 10 months ago
Meta's latest collection of multimodal models.
1.4M Pulls 11 Tags Updated 9 months ago
IBM Granite 2B and 8B models are 128K context length language models that have been fine-tuned for improved reasoning and instruction-following capabilities.
950.3K Pulls 3 Tags Updated 11 months ago
A state-of-the-art mixture-of-experts (MoE) language model. Kimi K2-Instruct-0905 demonstrates significant improvements in performance on public benchmarks and real-world coding agent tasks.
50.5K Pulls 1 Tag Updated 5 months ago
Meta's Llama 3.2 goes small with 1B and 3B models.
62.2M Pulls 63 Tags Updated 1 year ago