Olmo is a series of Open language models designed to enable the science of language models. These models are pre-trained on the Dolma 3 dataset and post-trained on the Dolci datasets.
208.1K Pulls 15 Tags Updated 2 months ago
127.9K Pulls 10 Tags Updated 2 months ago
123B model that excels at using tools to explore codebases, editing multiple files, and powering software engineering agents.
112.2K Pulls 6 Tags Updated 3 months ago
nomic-embed-text-v2-moe is a multilingual MoE text embedding model that excels at multilingual retrieval.
100.9K Pulls 1 Tag Updated 3 months ago
FunctionGemma is a specialized version of Google's Gemma 3 270M model fine-tuned explicitly for function calling.
88K Pulls 4 Tags Updated 2 months ago
Gemini 3 Flash offers frontier intelligence built for speed at a fraction of the cost.
80.9K Pulls 2 Tags Updated 2 months ago
Advancing the Coding Capability
64.8K Pulls 1 Tag Updated 2 months ago
The Cogito v2.1 LLMs are instruction tuned generative models. All models are released under MIT license for commercial use.
92.3K Pulls 6 Tags Updated 3 months ago
gpt-oss-safeguard-20b and gpt-oss-safeguard-120b are safety reasoning models built upon gpt-oss.
87.2K Pulls 3 Tags Updated 4 months ago
MiniMax M2 is a high-efficiency large language model built for coding and agentic workflows.
79K Pulls 1 Tag Updated 4 months ago
Advanced agentic, reasoning and coding capabilities.
84.7K Pulls 1 Tag Updated 5 months ago
DeepSeek-V3.2, a model that harmonizes high computational efficiency with superior reasoning and agent performance.
46.9K Pulls 1 Tag Updated 2 months ago
Exceptional multilingual capabilities to elevate code engineering
24K Pulls 1 Tag Updated 2 months ago
Kimi K2 Thinking, Moonshot AI's best open-source thinking model.
37K Pulls 1 Tag Updated 4 months ago
A state-of-the-art mixture-of-experts (MoE) language model. Kimi K2-Instruct-0905 demonstrates significant improvements in performance on public benchmarks and real-world coding agent tasks.
44.7K Pulls 1 Tag Updated 5 months ago
A general-purpose multimodal mixture-of-experts model for production-grade tasks and enterprise workloads.
25K Pulls 1 Tag Updated 3 months ago
Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models.
23.7M Pulls 58 Tags Updated 5 months ago
OpenAI’s open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases.
7.7M Pulls 5 Tags Updated 5 months ago
Alibaba's performant long context models for agentic and coding tasks.
3.5M Pulls 10 Tags Updated 5 months ago
An update to Mistral Small that improves function calling and instruction following and produces fewer repetition errors.
1.4M Pulls 5 Tags Updated 8 months ago