MHKetbi/s1.1-32B:q6_K

80 Downloads · Updated 10 months ago

A reasoning model finetuned from Qwen2.5-32B-Instruct on just 1,000 examples. It matches o1-preview and exhibits test-time scaling via budget forcing. Available in F16, q8_0, q6_K, and q4_K_S.

tools


0e29f83b6eda · 27GB · arch qwen2 · parameters 32.8B · quantization Q6_K

{{- /* System message and tools handling */}} {{- if .Tools}} <|im_start|>system {{- if and (gt (len

2.3kB

You are s1.1, created by simplescaling. You are a helpful assistant that can think before reaching f

112B

Apache license 2.0

18B

{ "num_ctx": 131072, "repeat_penalty": 1.1, "stop": [ "<|im_end|>" ], "t

113B

Readme


[![Support Me - Donate](https://img.shields.io/badge/Support_Me-Donate-9626ff?style=for-the-badge&logo=https%3A%2F%2Fimgur.com%2FvwC39JY)](https://pay.ziina.com/MubarakHAlketbi)

---
pipeline_tag: text-generation

inference: true

license: apache-2.0

datasets:
- simplescaling/s1K-1.1

base_model:
- Qwen/Qwen2.5-32B-Instruct

library_name: transformers

---

# Model Summary

> s1.1 is our successor to [s1](https://huggingface.co/simplescaling/s1-32B), with better reasoning performance from leveraging reasoning traces from r1 instead of Gemini.

- **Logs:** https://wandb.ai/hashimoto-group/o1/runs/m1ilia77/overview
- **Repository:** [simplescaling/s1](https://github.com/simplescaling/s1)
- **Paper:** https://arxiv.org/abs/2501.19393

This model is a successor of [s1-32B](https://huggingface.co/simplescaling/s1-32B) with slightly better performance. Thanks to [Ryan Marten](https://huggingface.co/ryanmarten) for helping generate r1 traces for s1K.

# Use

The model usage is documented [here](https://github.com/simplescaling/s1?tab=readme-ov-file#inference).
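For this quantized build specifically, a minimal sketch of a local call might look like the following. It assumes the model is served by a local Ollama instance on its default port (`http://localhost:11434`) and has already been pulled under the tag shown on this page; the helper names `build_request` and `ask` are illustrative and not part of any upstream code.

```python
import json
import urllib.request

MODEL = "MHKetbi/s1.1-32B:q6_K"  # tag as listed on this page
OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local endpoint


def build_request(prompt: str, model: str = MODEL) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())["response"]
```

Calling `ask("How many positive divisors does 360 have?")` would then return the model's full output as a single string, reasoning trace included, provided the server is running and the model fits in memory.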

# Evaluation

| Metric | s1-32B | s1.1-32B | o1-preview | o1 | DeepSeek-R1 | DeepSeek-R1-Distill-Qwen-32B |
|---|---|---|---|---|---|---|
| # examples | 1K | 1K | ? | ? | >800K | 800K |
| AIME2024 | 56.7 | _56.7_ | 40.0 | 74.4 | **79.8** | 72.6 |
| AIME2025 I | 26.7 | _60.0_ | 37.5 | ? | **65.0** | 46.1 |
| MATH500 | 93.0 | _95.4_ | 81.4 | 94.8 | **97.3** | 94.3 |
| GPQA-Diamond | 59.6 | _63.6_ | 75.2 | **77.3** | 71.5 | 62.1 |

Note that s1-32B and s1.1-32B use budget forcing in this table: the model's end-of-thinking is ignored and "Wait" is appended once or twice, so that it keeps reasoning before answering.
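The budget forcing loop described in the note can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: `generate` is a hypothetical callable that returns the model's continuation of a text and stops right after the end-of-thinking delimiter, and the delimiter string used here is a placeholder rather than the model's real token.

```python
from typing import Callable


def budget_force(
    generate: Callable[[str], str],
    prompt: str,
    num_ignore: int = 2,
    end_of_thinking: str = "<END_OF_THINKING>",  # placeholder, not the real delimiter
    nudge: str = "Wait",
) -> str:
    """Return the thinking trace after ignoring end-of-thinking `num_ignore` times.

    `generate(text)` is a hypothetical callable that continues `text` and stops
    right after emitting `end_of_thinking` (or at its own token limit).
    """
    trace = ""
    for attempt in range(num_ignore + 1):
        continuation = generate(prompt + trace)
        # Drop the delimiter so the model never sees its own attempt to stop.
        continuation = continuation.split(end_of_thinking)[0]
        trace += continuation
        if attempt < num_ignore:
            # Ignore the stop and nudge the model to keep reasoning.
            trace += nudge
    return trace
```

With `num_ignore=1` or `num_ignore=2` this corresponds to appending "Wait" once or twice; the final answer would then be generated from `prompt + trace` in a separate step.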
