s1 is a reasoning model fine-tuned from Qwen2.5-32B-Instruct on just 1,000 examples. It matches o1-preview and exhibits test-time scaling via budget forcing.
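
Budget forcing, as described for s1, controls how long the model thinks at decode time. If the thinking trace reaches a maximum budget, the end-of-thinking delimiter is forced so the model moves on to its answer. If the model tries to stop before a minimum budget, the delimiter is suppressed and "Wait" is appended to encourage further reasoning. The sketch below illustrates that control loop under stated assumptions: `next_token` is a hypothetical stand-in for a real decoder, `<|end_think|>` is a placeholder rather than the model's actual delimiter, and budgets are counted in string tokens.

```python
from typing import Callable, List

# Hypothetical stand-in for one decoding step: given the tokens so far,
# return the next token. A real setup would call an inference engine here.
NextToken = Callable[[List[str]], str]

# Placeholder end-of-thinking delimiter (assumption, not s1's real token).
END_THINK = "<|end_think|>"


def budget_force(next_token: NextToken, prompt: List[str],
                 min_tokens: int, max_tokens: int) -> List[str]:
    """Steer the thinking-trace length toward [min_tokens, max_tokens]."""
    thinking: List[str] = []
    while True:
        if len(thinking) >= max_tokens:
            break  # maximum budget reached: force thinking to end
        tok = next_token(prompt + thinking)
        if tok == END_THINK:
            if len(thinking) < min_tokens:
                # Suppress the early stop and nudge the model to keep reasoning.
                thinking.append("Wait")
                continue
            break  # the model stopped within budget
        thinking.append(tok)
    # Close the thinking segment explicitly so answer generation can begin.
    return thinking + [END_THINK]
```

Because every suppressed stop appends one token, the loop always terminates once either budget is reached.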

Tags: tools, 32b

Default system prompt: "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."