
A 32B reasoning model trained from Qwen2.5-32B-Instruct on 17K training examples, with performance on par with o1-preview.

Capabilities: tools

Model: qwen2 · 32.8B parameters · Q4_K_M quantization · 20GB (blob 80155ae4b31c)

System prompt: You are Sky-T1, created by NovaSky. You are a helpful assistant.

License: Apache License, Version 2.0

Readme

This is a 32B reasoning model trained from Qwen2.5-32B-Instruct on 17K training examples. Its performance is on par with o1-preview on both math and coding benchmarks. Please see our blog post for more details.

  • Developed by: the NovaSky Team from the Sky Computing Lab at UC Berkeley.
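Since this is packaged as an Ollama model, it can be queried through Ollama's standard chat endpoint. Below is a minimal sketch, assuming a local Ollama server on the default port (11434) and that the model has been pulled under the tag `sky-t1`, which is a placeholder; substitute the tag you actually pulled:

```python
# Minimal sketch: query the model via Ollama's HTTP chat API.
# Assumes a local Ollama server on the default port and that the model
# was pulled under the tag "sky-t1" (placeholder; use your actual tag).
import json
import urllib.request

payload = {
    "model": "sky-t1",  # hypothetical tag; replace with the real one
    "messages": [
        {"role": "user", "content": "How many primes are there below 100?"}
    ],
    "stream": False,  # ask for a single JSON response, not a token stream
}

req = urllib.request.Request(
    "http://localhost:11434/api/chat",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    reply = json.loads(resp.read())

# The assistant's reasoning and final answer are in the message content.
print(reply["message"]["content"])
```

The system prompt shown above is built into the model's Modelfile, so it applies automatically unless you override it with your own system message.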

Evaluation

Benchmark             Sky-T1-32B-Preview   Qwen2.5-32B-Instruct   QwQ    o1-preview
Math500               82.4                 76.2                   85.4   81.4
AIME2024              43.3                 16.7                   50.0   40.0
LiveCodeBench-Easy    86.3                 84.6                   90.7   92.9
LiveCodeBench-Medium  56.8                 40.8                   56.3   54.9
LiveCodeBench-Hard    17.9                  9.8                   17.1   16.3
GPQA-Diamond          56.8                 45.5                   52.5   75.2

Acknowledgement

We would like to thank Lambda Labs and Anyscale for the compute resources, and the Still-2 Team and Junyang Lin from the Qwen Team for their academic feedback and support.

References

Hugging Face

Blog Post