QwQ is the reasoning model of the Qwen series.
1.1M Pulls 8 Tags Updated 2 weeks ago
A new state-of-the-art 70B model. Llama 3.3 70B offers performance similar to that of the Llama 3.1 405B model.
1.6M Pulls 14 Tags Updated 3 months ago
Meta's Llama 3.2 goes small with 1B and 3B models.
11.6M Pulls 63 Tags Updated 6 months ago
Llama 3.1 is a new state-of-the-art model from Meta available in 8B, 70B and 405B parameter sizes.
63M Pulls 93 Tags Updated 3 months ago
The 7B model released by Mistral AI, updated to version 0.3.
10.9M Pulls 84 Tags Updated 8 months ago
Qwen2.5 models are pretrained on Alibaba's latest large-scale dataset, encompassing up to 18 trillion tokens. The models support a context of up to 128K tokens and offer multilingual support.
5.8M Pulls 133 Tags Updated 6 months ago
The latest series of code-specific Qwen models, with significant improvements in code generation, code reasoning, and code fixing.
4.8M Pulls 196 Tags Updated 4 months ago
Qwen2 is a new series of large language models from Alibaba Group.
4.2M Pulls 97 Tags Updated 6 months ago
A state-of-the-art 12B model with 128k context length, built by Mistral AI in collaboration with NVIDIA.
1.4M Pulls 17 Tags Updated 7 months ago
A set of Mixture of Experts (MoE) models with open weights by Mistral AI, available in 8x7B and 8x22B parameter sizes.
597.2K Pulls 70 Tags Updated 3 months ago
SmolLM2 is a family of compact language models available in three sizes: 135M, 360M, and 1.7B parameters.
448.4K Pulls 49 Tags Updated 4 months ago
Mistral Small 3 sets a new benchmark in the “small” large language model category, below 70B parameters.
385.4K Pulls 21 Tags Updated 8 weeks ago
Command R is a Large Language Model optimized for conversational interaction and long context tasks.
286.6K Pulls 32 Tags Updated 7 months ago
Hermes 3 is the latest version of the flagship Hermes series of LLMs by Nous Research.
269.4K Pulls 65 Tags Updated 3 months ago
Mistral Large 2 is Mistral's new flagship model. It is significantly more capable in code generation, mathematics, and reasoning, with a 128k context window and support for dozens of languages.
128.1K Pulls 32 Tags Updated 4 months ago
Command R+ is a powerful, scalable large language model purpose-built to excel at real-world enterprise use cases.
120.6K Pulls 21 Tags Updated 6 months ago
The IBM Granite 2B and 8B models are text-only dense LLMs trained on over 12 trillion tokens of data. In IBM’s initial testing, they demonstrated significant improvements over their predecessors in performance and speed.
86K Pulls 33 Tags Updated 2 months ago
Phi-4-mini brings significant enhancements in multilingual support, reasoning, and mathematics, and it finally adds the long-awaited function calling feature.
85.4K Pulls 5 Tags Updated 4 weeks ago
Athene-V2 is a 72B parameter model which excels at code completion, mathematics, and log extraction tasks.
77.7K Pulls 17 Tags Updated 4 months ago
A commercial-friendly small language model by NVIDIA optimized for roleplay, RAG QA, and function calling.
72.7K Pulls 17 Tags Updated 6 months ago
Llama-3.1-Nemotron-70B-Instruct is a large language model customized by NVIDIA to improve the helpfulness of LLM-generated responses to user queries.
68.1K Pulls 17 Tags Updated 5 months ago
The IBM Granite 2B and 8B models are designed to support tool-based use cases and retrieval-augmented generation (RAG), streamlining code generation, translation, and bug fixing.
64.3K Pulls 33 Tags Updated 4 months ago
A series of models from Groq that represent a significant advancement in open-source AI capabilities for tool use and function calling.
58.7K Pulls 33 Tags Updated 8 months ago
Granite-3.2 is a family of long-context AI models from IBM Granite fine-tuned for thinking capabilities.
52.4K Pulls 9 Tags Updated 4 weeks ago
Cohere For AI's language models trained to perform well across 23 different languages.
48.8K Pulls 33 Tags Updated 5 months ago
The IBM Granite 1B and 3B models are the first mixture of experts (MoE) Granite models from IBM designed for low latency usage.
45.9K Pulls 33 Tags Updated 4 months ago
The IBM Granite 1B and 3B models are long-context mixture of experts (MoE) Granite models from IBM designed for low latency usage.
37.8K Pulls 33 Tags Updated 2 months ago
A compact and efficient vision-language model, specifically designed for visual document understanding, enabling automated content extraction from tables, charts, infographics, plots, diagrams, and more.
31K Pulls 5 Tags Updated 4 weeks ago
The smallest model in Cohere's R series delivers top-tier speed, efficiency, and quality to build powerful AI applications on commodity GPUs and edge devices.
27.7K Pulls 5 Tags Updated 2 months ago
An open weights function calling model based on Llama 3, competitive with GPT-4o function calling capabilities.
19.6K Pulls 17 Tags Updated 8 months ago
A 111-billion-parameter model optimized for demanding enterprises that require fast, secure, and high-quality AI.
4,353 Pulls 5 Tags Updated 2 weeks ago
A new state-of-the-art version of the lightweight Command R7B model that excels in advanced Arabic language capabilities for enterprises in the Middle East and Northern Africa.
4,156 Pulls 5 Tags Updated 4 weeks ago