
- DeepSeek-R1-Distill-Qwen-32B
  DeepSeek-R1-Distill models are fine-tuned from open-source base models on samples generated by DeepSeek-R1. We slightly changed their configs and tokenizers; please use our settings to run these models.
  4,238 Pulls · 2 Tags · Updated 2 months ago
- watt-tool-8B
  watt-tool-8B is a fine-tuned language model based on LLaMa-3.1-8B-Instruct, optimized for tool usage and multi-turn dialogue. It achieves state-of-the-art performance on the Berkeley Function-Calling Leaderboard (BFCL).
  999 Pulls · 1 Tag · Updated 2 months ago
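The tool-usage workflow these models target can be sketched as follows: the model emits a structured function call, which the client parses and dispatches to real code. This is a minimal illustration only; the JSON shape, tool names, and `dispatch` helper below are assumptions for the sketch, not watt-tool's actual chat template or output format.

```python
import json

# Hypothetical registry of callable tools (names and signatures are illustrative).
TOOLS = {
    "get_weather": lambda city: f"Sunny in {city}",
}

def dispatch(model_output: str) -> str:
    """Parse a JSON tool call emitted by the model and invoke the matching tool.

    Assumed call shape (not watt-tool's documented format):
    {"name": "get_weather", "arguments": {"city": "Berlin"}}
    """
    call = json.loads(model_output)
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

# Simulate a model emitting a tool call, then execute it client-side.
result = dispatch('{"name": "get_weather", "arguments": {"city": "Berlin"}}')
print(result)
```

In a real multi-turn loop, the tool's return value would be appended to the conversation and sent back to the model for the next turn; benchmarks like BFCL score how reliably the model produces valid, correctly-argued calls of this kind.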
- DeepSeek-R1-Distill-Qwen-14B
  DeepSeek-R1-Distill models are fine-tuned from open-source base models on samples generated by DeepSeek-R1. We slightly changed their configs and tokenizers; please use our settings to run these models.
  535 Pulls · 3 Tags · Updated 2 months ago
- watt-tool-70B
  watt-tool-70B is a fine-tuned language model based on LLaMa-3.3-70B-Instruct, optimized for tool usage and multi-turn dialogue. It achieves state-of-the-art performance on the Berkeley Function-Calling Leaderboard (BFCL).
  362 Pulls · 1 Tag · Updated 2 months ago
- Sky-T1-32B-Preview
  This is a 32B reasoning model trained from Qwen2.5-32B-Instruct on 17K training samples. Its performance is on par with the o1-preview model on both math and coding.
  236 Pulls · 2 Tags · Updated 2 months ago