GLM-5.1 Sets the Pace on SWE-Bench Pro

Z.ai, previously known as Zhipu AI, has introduced GLM-5.1, an open-source flagship model designed for agentic engineering. The model can work autonomously on a single coding task for as long as eight hours, managing planning, execution, testing, and repeated optimization in one continuous cycle.

On the SWE-Bench Pro benchmark, GLM-5.1 posted a score of 58.4. That result put it ahead of GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro, giving it the highest score among all models evaluated on that test.

GLM-5.1 Builds on the GLM-5 Foundation

Same Core Architecture With Post-Training Refinement

GLM-5.1 is a post-training refinement of GLM-5, which Z.ai introduced in February. GLM-5 was presented as a 744-billion-parameter Mixture-of-Experts model with about 40 billion active parameters per token.

The newer release keeps that same architecture in place while focusing on stronger coding and agentic performance through post-training improvements.

Progressive Alignment Techniques Improve Coding and Agentic Skills

Z.ai says GLM-5.1 sharpens its capabilities through progressive alignment methods. These include multi-task supervised fine-tuning and reinforcement learning stages, both aimed at improving how the model handles coding-heavy and autonomous engineering workflows.

What GLM-5.1 Can Do in Long-Running Coding Workflows

Eight-Hour Autonomous Execution

According to Z.ai's developer documentation, GLM-5.1 is among the few models able to sustain eight hours of autonomous execution. During that span, it can complete what the company describes as a full experiment-analyze-optimize loop.

The model is designed to keep working on a coding task without stopping after just one try. It can plan, take action, check the results, and keep improving its output over several rounds.
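The plan-act-check-improve cycle described above can be sketched as a simple control loop. This is purely illustrative: `run_model`, `run_tests`, and the round budget are hypothetical stand-ins, not Z.ai's actual agent harness.

```python
def agent_loop(task, run_model, run_tests, max_rounds=5):
    """Iterate on a coding task until checks pass or the round budget runs out.

    run_model(task, previous=...) -> candidate solution (plan + act)
    run_tests(solution) -> (passed: bool, feedback: str)  (check)
    """
    solution = None
    for round_num in range(1, max_rounds + 1):
        solution = run_model(task, previous=solution)   # plan and act
        passed, feedback = run_tests(solution)          # check the result
        if passed:
            return solution, round_num
        # improve: feed the test feedback back into the next round
        task = f"{task}\nTest feedback: {feedback}"
    return solution, max_rounds
```

Long-horizon agents add persistence, tool calls, and budget accounting on top of this skeleton, but the core loop is the same repeated plan-act-check-improve pattern.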

Demonstrated Iteration and Optimization

In representative demonstrations, GLM-5.1 built a complete Linux desktop system from scratch within eight hours. During that process, it carried out 655 iterations autonomously.

In another reported result, the model increased vector database query throughput to 6.9 times that of the initial production version. These examples highlight the kind of repeated optimization work GLM-5.1 is built to handle.

Context Window and Workflow Support

GLM-5.1 includes a 200,000-token context window and supports up to 128,000 output tokens. It has also been optimized for agentic coding workflows built on tools such as Claude Code and OpenClaw.

On the KernelBench Level 3 optimization benchmark, the model achieved a 3.6x geometric mean speedup on real machine learning workloads.
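A geometric mean is the conventional way to aggregate per-workload speedups, since it weights relative gains evenly instead of letting one large outlier dominate. A minimal sketch of the computation (the function name is ours, not part of KernelBench):

```python
import math

def geometric_mean_speedup(speedups):
    """Geometric mean of per-workload speedup ratios (all values must be > 0)."""
    # exp(mean(log(x))) is numerically safer than multiplying many ratios
    return math.exp(sum(math.log(s) for s in speedups) / len(speedups))

# Example: speedups of 2x and 8x average to 4x geometrically (not 5x)
print(geometric_mean_speedup([2.0, 8.0]))  # → 4.0
```

An arithmetic mean of the same two ratios would report 5x, which is why benchmark suites prefer the geometric mean for ratio-valued results.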

GLM-5.1 in the Open-Source Coding Model Race

Immediate Availability and Open Weights

GLM-5.1 is available now to all GLM Coding Plan subscribers. Its weights have also been released under an MIT license, reinforcing its position in the open-source coding model landscape.

API Pricing and Market Position

Z.ai is pricing API access at $1.00 per million input tokens and $3.20 per million output tokens.
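At those rates, per-request cost is simple to estimate. The helper below is an illustrative sketch using the announced prices, not part of any Z.ai SDK:

```python
def api_cost(input_tokens, output_tokens,
             input_rate=1.00, output_rate=3.20):
    """Estimate cost in USD; rates are per million tokens (announced pricing)."""
    return (input_tokens / 1e6) * input_rate + (output_tokens / 1e6) * output_rate

# Example: a session with 500k input tokens and 100k output tokens
print(round(api_cost(500_000, 100_000), 2))  # → 0.82
```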

The launch adds pressure to the open-source coding model market, where GLM-5.1 now leads SWE-Bench Pro ahead of closed-source competitors.

Capability Positioning Against Claude, GPT, and Gemini

Z.ai's own documentation says the model's overall capability is aligned with Claude Opus 4.6. At the same time, independent evaluations indicate that GLM-5.1 reaches about 94.6 percent of Opus 4.6's broader coding score.

Those same evaluations also suggest there are still gaps in general reasoning and creative tasks, even with GLM-5.1's strong coding performance.

Training and Hardware Background

GLM-5, the base model behind GLM-5.1, was trained entirely on Huawei Ascend chips and did not use Nvidia hardware. GLM-5.1 keeps that underlying architecture while focusing its gains on coding and agentic engineering performance through additional alignment work.