-
Qwen3-Coder
Qwen3-Coder is available in multiple sizes. Today, we’re excited to introduce Qwen3-Coder-30B-A3B-Instruct, a streamlined model that maintains impressive performance and efficiency.
tools 2,195 Pulls 1 Tag Updated 5 months ago
-
Hunyuan-MT-Chimera-7B
The Hunyuan Translation Model comprises a translation model, Hunyuan-MT-7B, and an ensemble model, Hunyuan-MT-Chimera. The translation model is used to translate source text into the target language, while the ensemble model integrates multiple translation outputs to produce a higher-quality translation.
1,823 Pulls 1 Tag Updated 4 months ago
-
Mistral-Small-3.1
Mistral Small 3.1 mainly focuses on local deployment. Like Gemma 3 27B, it is a mid-sized multimodal model with billions of parameters; because it is lightweight, it can run on a single NVIDIA RTX 4090.
tools 815 Pulls 1 Tag Updated 10 months ago
-
qwen3
This repo contains the Q4_K_XL versions of both Qwen3-30B-A3B-Instruct-2507 and Qwen3-30B-A3B-Thinking-2507.
tools thinking 774 Pulls 3 Tags Updated 6 months ago
-
gpt-oss
OpenAI’s open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. This update builds on the 20b model, applying additional customizations. The default value of `num_ctx` is now set to 32K.
tools thinking 568 Pulls 1 Tag Updated 5 months ago
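The 32K `num_ctx` default mentioned above can be overridden per model. A minimal sketch of an Ollama Modelfile that raises the context window — the tag name and parameter value here are illustrative, not taken from the listing:

```
# Hypothetical Modelfile: derive from the gpt-oss model and raise the context window
FROM gpt-oss
PARAMETER num_ctx 65536
```

Building this with `ollama create my-gpt-oss -f Modelfile` would then serve the model with the larger context window by default.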
-
Devstral-Small
Devstral is an agentic LLM for software engineering tasks. Devstral 2 models excel at using tools to explore codebases, editing multiple files, and powering software engineering agents. The model achieves remarkable performance on SWE-Bench.
vision tools 345 Pulls 3 Tags Updated yesterday
-
llama-3-taiwan-8b-instruct-dpo
Llama-3-Taiwan-8B-Instruct-DPO is a large language model finetuned for Traditional Mandarin and English users. It has strong capabilities in language understanding, generation, reasoning, and multi-turn dialogue.
265 Pulls 2 Tags Updated 1 year ago
-
gemma3
The Google Gemma 3 models are multimodal—processing text and images—and feature a 128K context window with support for over 140 languages.
vision 240 Pulls 4 Tags Updated 9 months ago
-
Mistral-Small-3.2
Building on its predecessor, Mistral Small 3.2 (2506) adds state-of-the-art vision understanding and enhances long-context capabilities up to 128k tokens without compromising text performance. With 24 billion parameters, this model achieves top-tier performance.
vision tools 168 Pulls 1 Tag Updated 6 months ago
-
GLM-4.7-Flash
This model is based on unsloth/GLM-4.7-Flash and was trained on a small reasoning dataset generated by Claude Opus 4.5 with reasoning effort set to High.
tools thinking 143 Pulls 4 Tags Updated yesterday
-
deepcoder
DeepCoder-14B-Preview is a code reasoning model fine-tuned from DeepSeek-R1-Distill-Qwen-14B via distributed RL.
110 Pulls 1 Tag Updated 9 months ago
-
gemma-3
The Google Gemma 3 models are multimodal—processing text and images—and feature a 128K context window with support for over 140 languages.
vision 55 Pulls 2 Tags Updated 5 months ago