
-
Mistral-Small-3.1
Mistral Small 3.1 focuses on local deployment. Along with Gemma 3 27B, it is a mid-sized multimodal model with billions of parameters; because it is relatively lightweight, it can run on a single NVIDIA RTX 4090.
tools · 810 Pulls · 1 Tag · Updated 5 months ago
-
Qwen3-Coder
Qwen3-Coder is available in multiple sizes. Today, we're excited to introduce Qwen3-Coder-30B-A3B-Instruct, a streamlined model that maintains impressive performance and efficiency…
tools · 679 Pulls · 1 Tag · Updated 1 month ago
-
qwen3
This repo contains the Q4_K_XL versions of both Qwen3-30B-A3B-Instruct-2507 and Qwen3-30B-A3B-Thinking-2507.
tools · thinking · 394 Pulls · 3 Tags · Updated 1 month ago
-
llama-3-taiwan-8b-instruct-dpo
Llama-3-Taiwan-8B-Instruct-DPO is a large language model finetuned for Traditional Mandarin and English users. It has strong capabilities in language understanding, generation, reasoning, and multi-turn dialogue.
239 Pulls · 2 Tags · Updated 1 year ago
-
gemma3
The Google Gemma 3 models are multimodal—processing text and images—and feature a 128K context window with support for over 140 languages…
vision · 161 Pulls · 4 Tags · Updated 4 months ago
-
Devstral-Small
Devstral is finetuned from Mistral-Small-3.1, so it inherits a long context window of up to 128k tokens. As a coding agent, Devstral is text-only; the vision encoder was removed from Mistral-Small-3.1 before fine-tuning.
vision · tools · 123 Pulls · 1 Tag · Updated 1 month ago
-
gpt-oss
OpenAI’s open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. This update builds on the 20b model, applying additional customizations. The default value of `num_ctx` is now set to 32K.
tools · thinking · 97 Pulls · 1 Tag · Updated 4 weeks ago
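Since the entry above notes that `num_ctx` now defaults to 32K, the context size can also be set explicitly when deriving a local variant. A minimal Ollama Modelfile sketch, assuming the model is pulled under the tag `gpt-oss:20b` (the exact tag on this registry may differ):

```
# Hypothetical base tag — substitute the tag this registry actually serves
FROM gpt-oss:20b
# Pin the context window to 32K tokens (the new default mentioned above)
PARAMETER num_ctx 32768
```

Building and running the variant would then look like `ollama create my-gpt-oss -f Modelfile` followed by `ollama run my-gpt-oss`.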
-
Mistral-Small-3.2
Mistral Small 3.2 (2506) builds upon Mistral Small 3.1, adding state-of-the-art vision understanding and enhancing long context capabilities up to 128k tokens without compromising text performance. With 24 billion parameters, this model achieves top-t…
vision · tools · 90 Pulls · 1 Tag · Updated 1 month ago
-
deepcoder
DeepCoder-14B-Preview is a code reasoning model finetuned from DeepSeek-R1-Distill-Qwen-14B via distributed RL.
81 Pulls · 1 Tag · Updated 4 months ago
-
Hunyuan-MT-Chimera-7B
The Hunyuan Translation Model comprises a translation model, Hunyuan-MT-7B, and an ensemble model, Hunyuan-MT-Chimera. The translation model is used to translate source text into the target language, while the ensemble model integrates multiple translations…
80 Pulls · 1 Tag · Updated 3 days ago
-
gemma-3
The Google Gemma 3 models are multimodal—processing text and images—and feature a 128K context window with support for over 140 languages…
vision · 25 Pulls · 2 Tags · Updated 1 month ago