ollama run glm-5:cloud
GLM-5 is a mixture-of-experts model from Z.ai with 744B total parameters and 40B active parameters. It scales up from GLM-4.5’s 355B parameters and is designed for complex reasoning, coding, and agentic tasks.
The model uses DeepSeek Sparse Attention (DSA) to reduce deployment costs while preserving long-context capacity, and was post-trained using a novel asynchronous RL infrastructure for improved training efficiency.
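For programmatic use, here is a minimal sketch of querying the model through a local Ollama server with the official `ollama` Python client (`pip install ollama`). The model tag matches the `ollama run` command above; the prompt is illustrative only.

```python
# Minimal sketch: chat with glm-5:cloud via a running Ollama server,
# using the official `ollama` Python client.
import ollama

response = ollama.chat(
    model="glm-5:cloud",  # same tag as the `ollama run` command above
    messages=[
        {"role": "user", "content": "Explain mixture-of-experts in two sentences."}
    ],
)
print(response["message"]["content"])
```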
Evaluation settings vary by benchmark:

- HLE: temperature=1.0, top_p=0.95, max_new_tokens=131072. By default, we report the text-only subset; results marked with * are from the full set. We use GPT-5.2 (medium) as the judge model. For HLE-with-tools, we use a maximum context length of 202,752 tokens.
- temperature=0.7, top_p=0.95, max_new_tokens=16384, with a 200K context window.
- timeout=2h, temperature=0.7, top_p=1.0, max_new_tokens=8192, with a 128K context window. Resource limits are capped at 16 CPUs and 32 GB RAM.
- Terminal-Bench 2.0: temperature=1.0, top_p=0.95, max_new_tokens=65536. We remove wall-clock time limits due to generation speed, while preserving per-task CPU and memory constraints. Scores are averaged over 5 runs. We fix environment issues introduced by Claude Code and also report results on a verified Terminal-Bench 2.0 dataset that resolves ambiguous instructions (see: https://huggingface.co/datasets/zai-org/terminal-bench-2-verified).
- temperature=1.0, top_p=1.0, max_new_tokens=32000, with a 250-minute timeout per task. Results are single-run Pass@1 over 1,507 tasks.
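As a rough illustration, the sampling parameters above map onto standard Ollama request options (temperature, top_p, num_predict, num_ctx). The sketch below uses the HLE-style values from the first bullet; whether these options exactly reproduce Z.ai's evaluation harness is an assumption.

```python
# Sketch: applying one of the evaluation sampling configurations
# (the HLE-style settings above) as Ollama request options.
# Option names are standard Ollama parameters; matching the official
# benchmark harness exactly is NOT guaranteed.
import ollama

options = {
    "temperature": 1.0,     # sampling temperature from the HLE setting
    "top_p": 0.95,          # nucleus sampling threshold
    "num_predict": 131072,  # Ollama's equivalent of max_new_tokens
    "num_ctx": 202752,      # context length reported for HLE-with-tools
}

response = ollama.generate(
    model="glm-5:cloud",
    prompt="...",  # a benchmark task prompt would go here
    options=options,
)
print(response["response"])
```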