IBM Granite 2B and 8B models are 128K context length language models that have been fine-tuned for improved reasoning and instruction-following capabilities.
21.8K Pulls 3 Tags Updated 10 days ago
Building upon Mistral Small 3, Mistral Small 3.1 (2503) adds state-of-the-art vision understanding and enhances long context capabilities up to 128k tokens without compromising text performance.
56.2K Pulls 5 Tags Updated 2 weeks ago
Cogito v1 Preview is a family of hybrid reasoning models by Deep Cogito that outperform the best available open models of the same size, including counterparts from LLaMA, DeepSeek, and Qwen across most standard benchmarks.
48.7K Pulls 20 Tags Updated 2 weeks ago
A 111-billion-parameter model optimized for demanding enterprises that require fast, secure, and high-quality AI.
8,894 Pulls 5 Tags Updated 6 weeks ago
A new state-of-the-art version of the lightweight Command R7B model with advanced Arabic-language capabilities for enterprises in the Middle East and Northern Africa.
5,289 Pulls 5 Tags Updated 8 weeks ago
A compact and efficient vision-language model, specifically designed for visual document understanding, enabling automated content extraction from tables, charts, infographics, plots, diagrams, and more.
53.3K Pulls 5 Tags Updated 8 weeks ago
Phi-4-mini brings significant enhancements in multilingual support, reasoning, and mathematics, and now supports the long-awaited function calling feature.
116.2K Pulls 5 Tags Updated 8 weeks ago
Granite-3.2 is a family of long-context AI models from IBM Granite fine-tuned for thinking capabilities.
89.4K Pulls 9 Tags Updated 2 months ago
The smallest model in Cohere's R series delivers top-tier speed, efficiency, and quality to build powerful AI applications on commodity GPUs and edge devices.
31.2K Pulls 5 Tags Updated 3 months ago
The IBM Granite 2B and 8B models are text-only dense LLMs trained on over 12 trillion tokens of data that demonstrated significant improvements over their predecessors in performance and speed in IBM’s initial testing.
91.7K Pulls 33 Tags Updated 3 months ago
The IBM Granite 1B and 3B models are long-context mixture of experts (MoE) Granite models from IBM designed for low latency usage.
41.1K Pulls 33 Tags Updated 3 months ago
A new state-of-the-art 70B model. Llama 3.3 70B offers performance similar to that of the Llama 3.1 405B model.
1.8M Pulls 14 Tags Updated 4 months ago
QwQ is the reasoning model of the Qwen series.
1.4M Pulls 8 Tags Updated 6 weeks ago
Athene-V2 is a 72B parameter model which excels at code completion, mathematics, and log extraction tasks.
79.8K Pulls 17 Tags Updated 5 months ago
SmolLM2 is a family of compact language models available in three sizes: 135M, 360M, and 1.7B parameters.
652.5K Pulls 49 Tags Updated 5 months ago
Cohere For AI's language models trained to perform well across 23 different languages.
59.9K Pulls 33 Tags Updated 6 months ago
The IBM Granite 2B and 8B models are designed to support tool-based use cases and retrieval-augmented generation (RAG), streamlining code generation, translation, and bug fixing.
89.9K Pulls 33 Tags Updated 5 months ago
The IBM Granite 1B and 3B models are the first mixture of experts (MoE) Granite models from IBM designed for low latency usage.
47.9K Pulls 33 Tags Updated 5 months ago
Llama-3.1-Nemotron-70B-Instruct is a large language model customized by NVIDIA to improve the helpfulness of LLM-generated responses to user queries.
70.7K Pulls 17 Tags Updated 6 months ago
Meta's Llama 3.2 goes small with 1B and 3B models.
14.6M Pulls 63 Tags Updated 7 months ago
The latest series of code-specific Qwen models, with significant improvements in code generation, code reasoning, and code fixing.
5.1M Pulls 196 Tags Updated 5 months ago
A commercial-friendly small language model by NVIDIA optimized for roleplay, RAG QA, and function calling.
78.2K Pulls 17 Tags Updated 7 months ago
Qwen2.5 models are pretrained on Alibaba's latest large-scale dataset, encompassing up to 18 trillion tokens. The model supports up to 128K tokens and has multilingual support.
7.2M Pulls 133 Tags Updated 7 months ago
Mistral Small 3 sets a new benchmark in the “small” large language model category (below 70B parameters).
614.1K Pulls 21 Tags Updated 2 months ago
Hermes 3 is the latest version of the flagship Hermes series of LLMs by Nous Research.
277.2K Pulls 65 Tags Updated 4 months ago
Mistral Large 2 is Mistral's new flagship model. It is significantly more capable in code generation, mathematics, and reasoning, with a 128k context window and support for dozens of languages.
132.6K Pulls 32 Tags Updated 5 months ago
Llama 3.1 is a new state-of-the-art model from Meta available in 8B, 70B and 405B parameter sizes.
91.2M Pulls 93 Tags Updated 4 months ago
A state-of-the-art 12B model with 128k context length, built by Mistral AI in collaboration with NVIDIA.
1.6M Pulls 17 Tags Updated 8 months ago
An open weights function calling model based on Llama 3, competitive with GPT-4o function calling capabilities.
20.6K Pulls 17 Tags Updated 9 months ago
A series of models from Groq that represent a significant advancement in open-source AI capabilities for tool use/function calling.
62.1K Pulls 33 Tags Updated 9 months ago
Qwen2 is a new series of large language models from Alibaba Group.
4.2M Pulls 97 Tags Updated 7 months ago
Command R+ is a powerful, scalable large language model purpose-built to excel at real-world enterprise use cases.
123.8K Pulls 21 Tags Updated 7 months ago
Command R is a large language model optimized for conversational interaction and long-context tasks.
292K Pulls 32 Tags Updated 7 months ago
A set of mixture-of-experts (MoE) models with open weights by Mistral AI, available in 8x7B and 8x22B parameter sizes.
982K Pulls 70 Tags Updated 4 months ago
The 7B model released by Mistral AI, updated to version 0.3.
12.4M Pulls 84 Tags Updated 9 months ago