Why Real-Time Ray Tracing Hasn’t Landed Like People Expected
Real-time ray tracing showed up with big promises back in 2018, when NVIDIA’s GeForce 20-series GPUs made it feel like “movie lighting” might finally become normal in live games. But the reality has been more uneven.
One major reason is simple: this console generation launched with limited ray-tracing capability, and on PC it’s mostly been higher-end systems that can use it well. So even when ray tracing is available, it hasn’t always been practical to turn on—especially when performance drops are steep.
Now Microsoft is outlining a roadmap that points to a more fundamental shift in how ray tracing works, with the goal of making it more efficient and easier to use at scale.
The End of Triangle-by-Triangle Ray Tracing
Why “Tracing Rays” Isn’t Always the Hard Part
There’s an irony here: the ray calculations themselves aren’t always what slows things down most. Before lighting can happen, the system has to prepare a large amount of scene data so ray tracing can run efficiently.
That preparation relies on a data structure used to determine whether a ray intersects objects in a scene. Up to now, the common approach has been to build that structure from the scene’s geometry—specifically, from triangles.
That works fine at small scale. But modern scenes contain enormous amounts of geometry, and the situation gets worse when the geometry is dynamic.
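To make the cost concrete: the structure in question is typically a hierarchy of bounding volumes, and its whole job is to let a ray reject large groups of triangles at once. Below is a minimal self-contained sketch (not the DXR API) of the classic "slab" test a traversal uses to decide whether a ray can possibly hit anything inside an axis-aligned box; if the box is missed, every triangle inside it is skipped in one step.

```cpp
#include <cassert>

// Minimal illustration, not the DXR API: an axis-aligned bounding box and
// the "slab" intersection test. `invDir` holds the reciprocals of the ray
// direction components, precomputed once per ray.
struct Vec3 { float x, y, z; };
struct Aabb { Vec3 min, max; };

bool rayHitsAabb(const Vec3& origin, const Vec3& invDir, const Aabb& box) {
    float tmin = 0.0f, tmax = 1e30f;
    const float o[3]  = { origin.x, origin.y, origin.z };
    const float d[3]  = { invDir.x, invDir.y, invDir.z };
    const float lo[3] = { box.min.x, box.min.y, box.min.z };
    const float hi[3] = { box.max.x, box.max.y, box.max.z };
    for (int axis = 0; axis < 3; ++axis) {
        // Entry and exit distances for this axis's pair of slab planes.
        float t0 = (lo[axis] - o[axis]) * d[axis];
        float t1 = (hi[axis] - o[axis]) * d[axis];
        if (t0 > t1) { float tmp = t0; t0 = t1; t1 = tmp; }
        tmin = t0 > tmin ? t0 : tmin;
        tmax = t1 < tmax ? t1 : tmax;
        if (tmin > tmax) return false;  // intervals don't overlap: ray misses
    }
    return true;  // the box (and everything inside it) is a candidate
}
```

The point of the preparation work the article describes is to arrange the scene so that most of these box tests fail early, instead of testing every triangle against every ray.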
Modern Game Scenes Are Too Dynamic for Constant Full Rebuilds
Modern graphics pipelines don’t just render static worlds. They constantly reshape what’s in memory and what needs to be represented for lighting:
- Mesh shaders let geometry be generated or expanded on the GPU, so what a scene contains can change from frame to frame.
- Asset streaming constantly brings new geometry into memory, and that geometry must be incorporated into the structure.
- Geometry that leaves memory shouldn’t stay represented in the structure.
The result is ugly: a large chunk of time spent rendering a ray-traced scene can be wasted rebuilding acceleration structures, even when that work doesn’t meaningfully improve what you see on screen.
Microsoft’s response to this is a new approach called Clustered Geometry.
Clustered Geometry: Ray Tracing Built From Reusable Chunks
What Clustered Geometry Changes
Instead of treating each triangle as its own unit, Clustered Geometry groups nearby triangles into compact clusters. Those clusters become the building blocks used to assemble acceleration structures.
It’s a change in how scene data is organized for ray tracing: less “triangle-by-triangle,” more “pre-grouped components” that can be handled more efficiently.
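A toy sketch of that pre-grouping idea (names and the cluster size are illustrative choices, not Microsoft's API): triangles are sliced into fixed-size clusters, and each cluster precomputes its own bounds once, so a builder can later assemble structures from a handful of clusters instead of thousands of individual triangles.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Conceptual sketch only, not the real API. A cluster is a contiguous range
// of triangles plus precomputed bounds; the bounds are computed once and
// reused every time the cluster participates in a build.
struct Triangle { float v[9]; };            // three xyz vertices, flattened
struct Bounds   { float min[3], max[3]; };

struct Cluster {
    std::size_t first, count;               // range into the triangle pool
    Bounds bounds;                          // computed once, reused per build
};

Bounds computeBounds(const Triangle* tris, std::size_t count) {
    Bounds b{{1e30f, 1e30f, 1e30f}, {-1e30f, -1e30f, -1e30f}};
    for (std::size_t t = 0; t < count; ++t)
        for (int vtx = 0; vtx < 3; ++vtx)
            for (int axis = 0; axis < 3; ++axis) {
                float c = tris[t].v[vtx * 3 + axis];
                if (c < b.min[axis]) b.min[axis] = c;
                if (c > b.max[axis]) b.max[axis] = c;
            }
    return b;
}

// Slice a triangle list into clusters of at most `clusterSize` triangles.
// The size of 64 is an arbitrary illustrative choice.
std::vector<Cluster> buildClusters(const std::vector<Triangle>& tris,
                                   std::size_t clusterSize = 64) {
    std::vector<Cluster> out;
    for (std::size_t i = 0; i < tris.size(); i += clusterSize) {
        std::size_t n = std::min(clusterSize, tris.size() - i);
        out.push_back({i, n, computeBounds(tris.data() + i, n)});
    }
    return out;
}
```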
Practical Takeaways for Performance and Scalability
Clustered Geometry leads to a few concrete outcomes:
- Acceleration structures can be built faster because they’re assembled from pre-grouped chunks
- Memory usage improves because clusters can be reused across multiple structures
- Work can be spread across frames instead of being forced into one heavy rebuild
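The memory point above can be pictured with a toy model (again, illustrative only, not the actual API): built clusters live once in a shared pool, and each acceleration structure stores indices into that pool rather than its own copy of the data.

```cpp
#include <cstddef>
#include <vector>

// Toy illustration of cluster reuse, not the real API. A cluster shared by
// two structures costs one cluster's worth of memory, not two.
struct ClusterPool {
    // Each entry stands in for the memory footprint of one built cluster.
    std::vector<std::size_t> clusterBytes;
};

struct AccelStructure {
    std::vector<std::size_t> clusterIds;    // references into the pool
};

// Memory actually consumed: each pooled cluster is stored exactly once,
// however many structures reference it.
std::size_t pooledMemory(const ClusterPool& pool) {
    std::size_t total = 0;
    for (std::size_t b : pool.clusterBytes) total += b;
    return total;
}

// Memory if every structure duplicated its clusters instead of sharing.
std::size_t duplicatedMemory(const ClusterPool& pool,
                             const std::vector<AccelStructure>& structures) {
    std::size_t total = 0;
    for (const AccelStructure& as : structures)
        for (std::size_t id : as.clusterIds) total += pool.clusterBytes[id];
    return total;
}
```

The gap between the two totals grows with every structure that shares a cluster, which is where the memory win comes from.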
The broader idea isn’t brand-new in computer graphics (it’s been used in concepts like Nanite Virtualized Geometry), but this approach brings the concept into a more mainstream ray-tracing workflow.
PTLAS: Partitioned TLAS Adds a New Level of Manageability
BLAS vs TLAS (And Where the Bottleneck Shows Up)
Ray tracing relies on a hierarchy of structures to stay fast:
- BLAS (Bottom-Level Acceleration Structure): the geometry of an individual object
- TLAS (Top-Level Acceleration Structure): the instances that place those objects in the scene
In large, dynamic scenes, TLAS can become a bottleneck—because treating “the entire scene” as one top-level structure doesn’t play nicely with constant streaming, animation, and scene changes.
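The two-level split is worth making concrete with a toy layout (it mirrors the BLAS/TLAS division, but these are not the DXR structs): a BLAS holds an object's geometry once, while the TLAS holds lightweight instances, each pairing a transform with a BLAS reference. Animating an object means rewriting one small instance record, not rebuilding the object's geometry.

```cpp
#include <cstddef>
#include <vector>

// Toy two-level layout, not the actual D3D12 structures.
struct Blas {
    std::size_t triangleCount;    // stand-in for an object's geometry
};

struct Instance {
    float transform[12];          // 3x4 object-to-world matrix, row-major
    std::size_t blasIndex;        // which object this instance places
};

struct Tlas {
    std::vector<Instance> instances;
};

// Moving an object only rewrites its instance's translation column;
// the referenced BLAS is untouched.
void moveInstance(Tlas& tlas, std::size_t i, float tx, float ty, float tz) {
    Instance& inst = tlas.instances[i];
    inst.transform[3]  = tx;
    inst.transform[7]  = ty;
    inst.transform[11] = tz;
}
```

The bottleneck the section describes comes from the top level: even though each instance is small, a monolithic TLAS still has to account for all of them whenever anything changes.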
How PTLAS Makes Scene Updates Modular
Partitioned TLAS (PTLAS) shifts the approach: instead of one monolithic TLAS, the top-level structure is split into smaller pieces.
This helps in scenarios that modern games rely on heavily, including:
- Geometry streaming from disk
- Scenes with lots of animation
- Dynamic level-of-detail systems
The key advantage is that engines only need to update the parts of the scene that actually changed. If a full rebuild isn’t necessary, the system can skip that work. Updates and rebuilds become modular rather than all-or-nothing.
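The update-only-what-changed behavior can be sketched with a dirty-flag model (the names here are illustrative, not the PTLAS API): the top level is split into partitions, changes mark only the partition they touch, and the per-frame update rebuilds only marked partitions.

```cpp
#include <cstddef>
#include <vector>

// Toy sketch of the partitioned-TLAS idea; names are illustrative.
struct Partition {
    std::vector<std::size_t> instanceIds;
    bool dirty = false;
};

struct PartitionedTlas {
    std::vector<Partition> partitions;
};

// Streaming in, animating, or swapping LOD for an instance marks only the
// partition that contains it.
void touchPartition(PartitionedTlas& tlas, std::size_t p) {
    tlas.partitions[p].dirty = true;
}

// Per-frame update: returns how many partitions actually needed rebuilding.
std::size_t updateFrame(PartitionedTlas& tlas) {
    std::size_t rebuilt = 0;
    for (Partition& p : tlas.partitions) {
        if (!p.dirty) continue;   // unchanged: skip the work entirely
        // ... rebuild this partition's slice of the top level here ...
        p.dirty = false;
        ++rebuilt;
    }
    return rebuilt;
}
```

With a monolithic top level, every one of those touches would have forced work proportional to the whole scene; here the cost tracks only what changed.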
GPU-Driven Acceleration Structures: Cutting CPU Coordination
Why CPUs Have Been Taking a Hit With Ray Tracing
A lot of people don't realize how much coordination work the CPU does for ray tracing: up to now, the CPU has been responsible for orchestrating much of the acceleration-structure process.
That matters because turning on ray tracing can cause frame-rate drops that are CPU-limited, not just GPU-limited.
Moving Toward More Autonomous GPU Handling
DXR has already allowed acceleration structures to be built on the GPU, but this direction goes further, pushing toward a model where the GPU handles more of the process on its own.
With less CPU work involved, overall performance—especially minimum performance—should improve.
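One way to picture the shift, as a deliberately simplified cost model (not D3D12 code): in a CPU-driven scheme the CPU records one command per structure, so its cost scales with the number of builds; in a GPU-driven scheme the build arguments live in GPU-visible memory and the CPU submits a single batch, so its cost stays flat.

```cpp
#include <cstddef>
#include <vector>

// Illustrative cost model only, not real D3D12 calls.
struct BuildArgs { std::size_t geometryId; };

// CPU-driven: the CPU records one command per acceleration structure,
// so CPU work grows linearly with the number of builds.
std::size_t cpuCommandsPerFrame(const std::vector<BuildArgs>& builds) {
    std::size_t commands = 0;
    for (std::size_t i = 0; i < builds.size(); ++i)
        ++commands;               // one recorded command per structure
    return commands;
}

// GPU-driven: build arguments stay in GPU memory and the GPU consumes them
// itself; the CPU records a single submission regardless of build count.
std::size_t gpuCommandsPerFrame(const std::vector<BuildArgs>& builds) {
    (void)builds;                 // arguments are read on the GPU side
    return 1;
}
```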
Clustered Geometry + PTLAS: Scalable Ray Tracing That Doesn’t Fall Off a Cliff
A More Graceful Performance Curve
When Clustered Geometry and PTLAS are combined, ray tracing becomes more scalable. In practical terms, that means:
- More systems can handle ray tracing
- Performance requirements rise more gracefully as resolution increases or scenes get more detailed
Instead of ray tracing feeling like a sudden performance cliff, the goal is something closer to a steady climb.
What to Expect Next
Even with these shifts in direction, it’ll take time before these ideas show up in shipping games. Still, the trajectory is clear: ray tracing is being reworked to better fit how modern games actually build and stream worlds—without burning huge amounts of time rebuilding structures unnecessarily.
And all of this is happening even as the DirectX version number itself seems like it’s staying put.

