Controlling How Long A Reasoning Model Thinks With Reinforcement Learning