Length Control for Reasoning Language Models with just a Prompt!
L1-Qwen-1.5B-Max constrains its output to be no longer than the target length given in the prompt, allowing flexibility in response length while respecting the upper bound. The model is converted from l3lab/L1-Qwen-1.5B-Max.
```
ollama pull devkit/L1-Qwen-1.5B-Max
```

Example prompt:

```
User: A is B’s father, C is D’s mother, and D and A are brothers. What is the relationship between B and C? Think for maximum 1024 tokens.
```
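The same prompt pattern can be used programmatically. Below is a minimal sketch that sends the example question to a local Ollama server over its REST API; it assumes the server is running on the default port (11434) and that the model has already been pulled. The prompt text and 1024-token budget are taken from the example above and are purely illustrative.

```python
import requests

# Assumes a local Ollama server on the default port and that the model
# has been pulled with: ollama pull devkit/L1-Qwen-1.5B-Max
OLLAMA_URL = "http://localhost:11434/api/chat"
MODEL = "devkit/L1-Qwen-1.5B-Max"

# The length budget is stated directly in the prompt; L1-Max treats it
# as an upper bound on how long the model reasons.
prompt = (
    "A is B's father, C is D's mother, and D and A are brothers. "
    "What is the relationship between B and C? Think for maximum 1024 tokens."
)

response = requests.post(
    OLLAMA_URL,
    json={
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    },
    timeout=600,
)
response.raise_for_status()
print(response.json()["message"]["content"])
```

To use a different token budget, change the number in the "Think for maximum N tokens." instruction; the model will keep its reasoning at or below that limit.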
For more info about this model, refer to this blog.
```bibtex
@misc{aggarwal2025l1controllinglongreasoning,
      title={L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning},
      author={Pranjal Aggarwal and Sean Welleck},
      year={2025},
      eprint={2503.04697},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2503.04697},
}
```