
L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning


Length Control for Reasoning Language Models with just a Prompt!

L1-Qwen-1.5B-Max constrains the output to be no longer than the target length, giving the model flexibility while respecting an upper bound. The model is converted from l3lab/L1-Qwen-1.5B-Max.

How to use

  1. Download with ollama: ollama pull devkit/L1-Qwen-1.5B-Max
  2. Use it as you normally would, appending Think for maximum [REPLACE_ME] tokens to your prompt, where [REPLACE_ME] is your desired token budget.
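
The two steps above amount to appending a fixed instruction suffix to each prompt. A minimal sketch in Python: the helper name with_token_budget is illustrative, and the commented-out ollama.chat call assumes the ollama Python client and a running local server; neither is part of this model's official tooling.

```python
def with_token_budget(prompt: str, max_tokens: int) -> str:
    """Append the L1-Max length-control instruction to a prompt."""
    return f"{prompt} Think for maximum {max_tokens} tokens."

# Build a length-controlled prompt from the example question below.
question = (
    "A is B's father, C is D's mother, and D and A are brothers. "
    "What is the relationship between B and C?"
)
print(with_token_budget(question, 1024))

# To actually query the model (requires ollama and a pulled model):
# import ollama
# reply = ollama.chat(
#     model="devkit/L1-Qwen-1.5B-Max",
#     messages=[{"role": "user", "content": with_token_budget(question, 1024)}],
# )
# print(reply["message"]["content"])
```

Any integer budget can be substituted for 1024; per the model description, the generated reasoning should stay within that bound.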

Examples

User: A is B’s father, C is D’s mother, and D and A are brothers. What is the relationship between B and C? Think for maximum 1024 tokens.

For more information about this model, refer to this blog.

@misc{aggarwal2025l1controllinglongreasoning,
      title={L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning},
      author={Pranjal Aggarwal and Sean Welleck},
      year={2025},
      eprint={2503.04697},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2503.04697},
}