Note from gurubot: I have modified this model's template to remove the thinking section and force more consistent output, since by default it did not return consistent output (see the discussion of this problem at https://huggingface.co/THU-KEG/LongWriter-Zero-32B/discussions/2).
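For readers who want the same effect in their own pipeline, the sketch below shows one way to post-process raw output into just the answer text. It is a minimal illustration, not the exact template change described above; the tag names come from the model's <think>…</think><answer>…</answer> output format documented further down.

```python
import re

def extract_answer(raw_output: str) -> str:
    """Strip the <think>...</think> section and return the <answer> body.

    Falls back to the full text minus any think block when no <answer>
    tags are present, since raw outputs are not always consistently
    formatted.
    """
    # Prefer the explicit <answer>...</answer> section if it exists.
    match = re.search(r"<answer>(.*?)</answer>", raw_output, re.DOTALL)
    if match:
        return match.group(1).strip()
    # Otherwise just drop the thinking section.
    return re.sub(r"<think>.*?</think>", "", raw_output, flags=re.DOTALL).strip()
```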

LongWriter-Zero ✍️ — Mastering Ultra-Long Text Generation via Reinforcement Learning

🤗 HF Dataset • 📃 Paper


🚀 LongWriter-Zero

LongWriter-Zero is a purely reinforcement learning (RL)-based large language model capable of generating coherent passages exceeding 10,000 tokens.

The model is built upon Qwen2.5-32B-Base; its training process includes:

  • Continual pretraining on 30 billion tokens of long-form books and technical reports to strengthen fundamental writing capabilities;
  • Application of Group Relative Policy Optimization (GRPO) with a composite reward function (a minimal sketch follows this list):
    • a Length Reward Model (RM) enforces the desired output length,
    • a Writing RM scores fluency, coherence, and helpfulness, and
    • a Format RM enforces strict adherence to the <think>…</think><answer>…</answer> structure and penalizes repeated content to avoid redundancy;
  • A dedicated prompting strategy that encourages the model to explicitly reflect before answering, improving structural planning and fine-grained length control.
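The sketch below shows how such a composite reward might be combined into a single scalar for GRPO. The component scorers, equal weighting, and tag-checking logic are illustrative assumptions, not the paper's implementation; only the three reward types and the <think>/<answer> format come from the description above.

```python
import re

def format_reward(text: str) -> float:
    """Format RM (illustrative): 1.0 iff the output is exactly
    <think>...</think><answer>...</answer>, else 0.0."""
    pattern = r"<think>.*?</think>\s*<answer>.*?</answer>"
    return 1.0 if re.fullmatch(pattern, text.strip(), re.DOTALL) else 0.0

def length_reward(text: str, target_tokens: int) -> float:
    """Length RM (illustrative): penalize deviation from the requested
    length; whitespace splitting stands in for a real tokenizer."""
    n = len(text.split())
    return max(0.0, 1.0 - abs(n - target_tokens) / target_tokens)

def composite_reward(text: str, target_tokens: int,
                     writing_score: float) -> float:
    """Combine the three signals into one scalar reward.

    writing_score would come from a learned writing reward model,
    assumed here to return a value in [0, 1]. GRPO then computes
    group-relative advantages from these scalar rewards.
    """
    return (length_reward(text, target_tokens)
            + writing_score
            + format_reward(text)) / 3.0
```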

The resulting model, LongWriter-Zero-32B, matches or surpasses the performance of 100B-scale models in ultra-long-form generation.

source: https://huggingface.co/mradermacher/LongWriter-Zero-32B-GGUF
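For local inference with the GGUF weights linked above, something like the following llama-cpp-python sketch should work; the filename, context size, and generation parameters are assumptions, so adjust them to the quantization you download.

```python
from llama_cpp import Llama

# Path and quantization level are assumptions; use any file from the
# GGUF repo linked above.
llm = Llama(model_path="LongWriter-Zero-32B.Q4_K_M.gguf",
            n_ctx=16384)  # a long context window is needed for 10k+ token outputs

prompt = "Write a 5,000-word report on renewable energy policy."
out = llm(prompt, max_tokens=12000, temperature=0.7)
print(out["choices"][0]["text"])
```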