
-
Mistral-Small-3.1
Mistral Small 3.1 focuses on local deployment. Along with Gemma 3 27B, it is a mid-sized multimodal model with billions of parameters; because it is relatively lightweight, it can run on a single NVIDIA RTX 4090.
tools · 810 Pulls · 1 Tag · Updated 5 months ago
-
Qwen3-Coder
Qwen3-Coder is available in multiple sizes. Today, we're excited to introduce Qwen3-Coder-30B-A3B-Instruct, a streamlined model that maintains impressive performance and efficiency…
tools · 679 Pulls · 1 Tag · Updated 1 month ago
-
qwen3
This repo contains the Q4_K_XL versions of both Qwen3-30B-A3B-Instruct-2507 and Qwen3-30B-A3B-Thinking-2507.
tools · thinking · 394 Pulls · 3 Tags · Updated 1 month ago
-
llama-3-taiwan-8b-instruct-dpo
Llama-3-Taiwan-8B-Instruct-DPO is a large language model finetuned for Traditional Mandarin and English users. It has strong capabilities in language understanding, generation, reasoning, and multi-turn dialogue.
239 Pulls · 2 Tags · Updated 1 year ago
-
gemma3
The Google Gemma 3 models are multimodal—processing text and images—and feature a 128K context window with support for over 140 languages…
vision · 161 Pulls · 4 Tags · Updated 4 months ago
-
Devstral-Small
Devstral is finetuned from Mistral-Small-3.1, so it inherits a long context window of up to 128k tokens. As a coding agent, Devstral is text-only; the vision encoder was removed from Mistral-Small-3.1 before fine-tuning.
vision · tools · 123 Pulls · 1 Tag · Updated 1 month ago
-
gpt-oss
OpenAI’s open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. This update builds on the 20b model, applying additional customizations. The default value of `num_ctx` is now set to 32K.
tools · thinking · 97 Pulls · 1 Tag · Updated 4 weeks ago
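Since the entry above notes that `num_ctx` now defaults to 32K, the context size can also be set explicitly when deriving a local variant. A minimal Ollama Modelfile sketch, assuming the model is pulled under the tag `gpt-oss:20b` (the exact tag on this registry may differ):

```
# Hypothetical base tag — substitute the tag this registry actually serves
FROM gpt-oss:20b
# Pin the context window to 32K tokens (the new default mentioned above)
PARAMETER num_ctx 32768
```

Building and running the variant would then look like `ollama create my-gpt-oss -f Modelfile` followed by `ollama run my-gpt-oss`.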
-
Mistral-Small-3.2
Mistral Small 3.2 (2506) builds upon Mistral Small 3.1, adding state-of-the-art vision understanding and enhancing long context capabilities up to 128k tokens without compromising text performance. With 24 billion parameters, this model achieves top-t…
vision · tools · 90 Pulls · 1 Tag · Updated 1 month ago
-
deepcoder
DeepCoder-14B-Preview is a code reasoning model finetuned from DeepSeek-R1-Distill-Qwen-14B via distributed RL.
81 Pulls · 1 Tag · Updated 4 months ago
-
Hunyuan-MT-Chimera-7B
The Hunyuan Translation Model comprises a translation model, Hunyuan-MT-7B, and an ensemble model, Hunyuan-MT-Chimera. The translation model is used to translate source text into the target language, while the ensemble model integrates multiple translations…
80 Pulls · 1 Tag · Updated 3 days ago
-
gemma-3
The Google Gemma 3 models are multimodal—processing text and images—and feature a 128K context window with support for over 140 languages…
vision · 25 Pulls · 2 Tags · Updated 1 month ago