
- DeepSeek-R1-Distill-Qwen-32B
  DeepSeek-R1-Distill models are fine-tuned from open-source base models on samples generated by DeepSeek-R1. We slightly changed their configs and tokenizers; please use our settings to run these models.
  4,238 Pulls · 2 Tags · Updated 2 months ago
- watt-tool-8B
  watt-tool-8B is a fine-tuned language model based on LLaMa-3.1-8B-Instruct, optimized for tool usage and multi-turn dialogue. It achieves state-of-the-art performance on the Berkeley Function-Calling Leaderboard (BFCL).
  999 Pulls · 1 Tag · Updated 2 months ago
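The tool-usage workflow these models target can be sketched as follows: the model emits a structured function call, which the client parses and dispatches to real code. This is a minimal illustration only; the JSON shape, tool names, and `dispatch` helper below are assumptions for the sketch, not watt-tool's actual chat template or output format.

```python
import json

# Hypothetical registry of callable tools (names and signatures are illustrative).
TOOLS = {
    "get_weather": lambda city: f"Sunny in {city}",
}

def dispatch(model_output: str) -> str:
    """Parse a JSON tool call emitted by the model and invoke the matching tool.

    Assumed call shape (not watt-tool's documented format):
    {"name": "get_weather", "arguments": {"city": "Berlin"}}
    """
    call = json.loads(model_output)
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

# Simulate a model emitting a tool call, then execute it client-side.
result = dispatch('{"name": "get_weather", "arguments": {"city": "Berlin"}}')
print(result)
```

In a real multi-turn loop, the tool's return value would be appended to the conversation and sent back to the model for the next turn; benchmarks like BFCL score how reliably the model produces valid, correctly-argued calls of this kind.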
- DeepSeek-R1-Distill-Qwen-14B
  DeepSeek-R1-Distill models are fine-tuned from open-source base models on samples generated by DeepSeek-R1. We slightly changed their configs and tokenizers; please use our settings to run these models.
  535 Pulls · 3 Tags · Updated 2 months ago
- watt-tool-70B
  watt-tool-70B is a fine-tuned language model based on LLaMa-3.3-70B-Instruct, optimized for tool usage and multi-turn dialogue. It achieves state-of-the-art performance on the Berkeley Function-Calling Leaderboard (BFCL).
  362 Pulls · 1 Tag · Updated 2 months ago
- Sky-T1-32B-Preview
  This is a 32B reasoning model trained from Qwen2.5-32B-Instruct on 17K training samples. Its performance is on par with the o1-preview model on both math and coding.
  236 Pulls · 2 Tags · Updated 2 months ago