Note from gurubot: I have modified this model's template to remove the thinking section and force more consistent output, since by default it did not return consistent output (see the discussion of this problem at https://huggingface.co/THU-KEG/LongWriter-Zero-32B/discussions/2).
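For readers who want the same effect in their own pipeline, the sketch below shows one way to post-process raw output into just the answer text. It is a minimal illustration, not the exact template change described above; the tag names come from the model's <think>…</think><answer>…</answer> output format documented further down.

```python
import re

def extract_answer(raw_output: str) -> str:
    """Strip the <think>...</think> section and return the <answer> body.

    Falls back to the full text minus any think block when no <answer>
    tags are present, since raw outputs are not always consistently
    formatted.
    """
    # Prefer the explicit <answer>...</answer> section if it exists.
    match = re.search(r"<answer>(.*?)</answer>", raw_output, re.DOTALL)
    if match:
        return match.group(1).strip()
    # Otherwise just drop the thinking section.
    return re.sub(r"<think>.*?</think>", "", raw_output, flags=re.DOTALL).strip()
```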

LongWriter-Zero ✍️ — Mastering Ultra-Long Text Generation via Reinforcement Learning

🤗 HF Dataset • 📃 Paper


🚀 LongWriter-Zero

LongWriter-Zero is a purely reinforcement learning (RL)-based large language model capable of generating coherent passages exceeding 10,000 tokens.

The model is built upon Qwen2.5-32B-Base; its training process includes:

  • Continual pretraining on 30 billion tokens of long-form books and technical reports to strengthen fundamental writing capabilities;
  • Application of Group Relative Policy Optimization (GRPO) with a composite reward function (a minimal sketch follows this list):
    • a Length Reward Model (RM) enforces the desired output length,
    • a Writing RM scores fluency, coherence, and helpfulness, and
    • a Format RM enforces strict adherence to the <think>…</think><answer>…</answer> structure and penalizes repeated content to avoid redundancy;
  • A dedicated prompting strategy that encourages the model to explicitly reflect before answering, improving structural planning and fine-grained length control.
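The sketch below shows how such a composite reward might be combined into a single scalar for GRPO. The component scorers, equal weighting, and tag-checking logic are illustrative assumptions, not the paper's implementation; only the three reward types and the <think>/<answer> format come from the description above.

```python
import re

def format_reward(text: str) -> float:
    """Format RM (illustrative): 1.0 iff the output is exactly
    <think>...</think><answer>...</answer>, else 0.0."""
    pattern = r"<think>.*?</think>\s*<answer>.*?</answer>"
    return 1.0 if re.fullmatch(pattern, text.strip(), re.DOTALL) else 0.0

def length_reward(text: str, target_tokens: int) -> float:
    """Length RM (illustrative): penalize deviation from the requested
    length; whitespace splitting stands in for a real tokenizer."""
    n = len(text.split())
    return max(0.0, 1.0 - abs(n - target_tokens) / target_tokens)

def composite_reward(text: str, target_tokens: int,
                     writing_score: float) -> float:
    """Combine the three signals into one scalar reward.

    writing_score would come from a learned writing reward model,
    assumed here to return a value in [0, 1]. GRPO then computes
    group-relative advantages from these scalar rewards.
    """
    return (length_reward(text, target_tokens)
            + writing_score
            + format_reward(text)) / 3.0
```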

The resulting model, LongWriter-Zero-32B, matches or surpasses the performance of 100B-scale models in ultra-long-form generation.

source: https://huggingface.co/mradermacher/LongWriter-Zero-32B-GGUF
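For local inference with the GGUF weights linked above, something like the following llama-cpp-python sketch should work; the filename, context size, and generation parameters are assumptions, so adjust them to the quantization you download.

```python
from llama_cpp import Llama

# Path and quantization level are assumptions; use any file from the
# GGUF repo linked above.
llm = Llama(model_path="LongWriter-Zero-32B.Q4_K_M.gguf",
            n_ctx=16384)  # a long context window is needed for 10k+ token outputs

prompt = "Write a 5,000-word report on renewable energy policy."
out = llm(prompt, max_tokens=12000, temperature=0.7)
print(out["choices"][0]["text"])
```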