616 Downloads Updated 4 hours ago
ollama run glm-5.2:cloud
ollama launch claude --model glm-5.2:cloud
ollama launch codex-app --model glm-5.2:cloud
ollama launch openclaw --model glm-5.2:cloud
ollama launch hermes --model glm-5.2:cloud
ollama launch codex --model glm-5.2:cloud
ollama launch opencode --model glm-5.2:cloud
GLM-5.2 is Z.ai’s flagship model for the era of long-horizon tasks.
With a truly usable 1M-token context window, it can handle project-level engineering context, execute long-running tasks more reliably, follow engineering standards more consistently, and complete the full development workflow from requirements to multi-platform deployment in a single task.
Supporting long-horizon tasks starts with making long context engineering-usable: the model must maintain quality across long, messy coding-agent trajectories, not just accept more tokens. A 1M context is easy to claim but much harder to keep reliable under real engineering pressure. To this end, Z.AI substantially expanded 1M-context training for coding-agent scenarios — large-scale implementation, automated research, performance optimization, and complex debugging — producing a long-context system that is both wide in scope and solid in execution.
This shows up on three long-horizon coding benchmarks. On FrontierSWE, which measures whether an agent can complete open-ended technical projects at the scale of hours to tens of hours, GLM-5.2 trails Opus 4.8 by only 1% while edging out GPT-5.5 by 1% and Opus 4.7 by 11%. On PostTrainBench, where each agent is given an H100 GPU and judged by how much it improves small models through post-training, GLM-5.2 outperforms both Opus 4.7 and GPT-5.5, ranking second only to Opus 4.8. On SWE-Marathon, an ultra-long-horizon benchmark covering tasks like building compilers, optimizing kernels, and developing production-grade services, GLM-5.2 trails Opus 4.8 by 13% while remaining second only to the Opus series. Across all three, GLM-5.2 is the highest-ranked open-source model.
On standard coding benchmarks, GLM-5.2 is the strongest open-source model, improving on GLM-5.1 by a wide margin: 81.0 vs. 62.0 on Terminal-Bench 2.1 and 62.1 vs. 58.4 on SWE-bench Pro. It also closes much of the gap to the closed-source frontier — on Terminal-Bench 2.1 (81.0) it lands within a few points of Claude Opus 4.8 (85.0) — while staying ahead of Gemini 3.1 Pro.
GLM-5.2 also introduces effort level control, letting users explicitly balance capability against speed and cost. At comparable token budgets, GLM-5.2 delivers substantially stronger agentic coding than GLM-5.1, with capability roughly positioned between Claude Opus 4.7 and Claude Opus 4.8 under similar token consumption. The Max effort level allocates additional computation when higher performance is needed on challenging tasks.