OpenAI Publishes Multipath Reliable Connection for AI Training Networks

OpenAI has published a new networking protocol built to make large-scale AI training clusters faster, more efficient, and more resilient. The protocol, called Multipath Reliable Connection, or MRC, has been released as an open specification through the Open Compute Project.

MRC was developed in collaboration with Microsoft, AMD, Broadcom, Nvidia, and Intel. Its main focus is straightforward: help massive GPU clusters handle two problems that become harder as AI infrastructure grows — network congestion and hardware failures.

The protocol is already running inside OpenAI and Microsoft’s largest training facilities, including the Oracle site in Abilene, Texas, and Microsoft’s Fairwater supercomputers. It has also been used to train GPT-5.5 and other models.

How MRC Helps Large GPU Clusters Avoid Network Congestion

MRC uses a method called packet spraying, which spreads data across hundreds of network paths at the same time. Instead of pinning a flow to a single route, where congestion on that route stalls everything behind it, the protocol scatters packets across every available path.

That matters because large AI training clusters depend on constant, high-volume communication between GPUs. When one network path becomes overloaded, the whole job can slow down. MRC is designed to stop that kind of bottleneck from forming in the first place.
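The spraying idea can be illustrated with a toy model. This is a hypothetical sketch of load distribution, not the MRC wire format, which the open specification defines:

```python
import itertools
from collections import Counter

def spray_packets(num_packets, paths):
    """Assign each packet to the next path round-robin, so no single
    path carries the whole flow. Illustrative only: a real sender
    would also track per-path congestion signals."""
    cycle = itertools.cycle(paths)
    assignment = [next(cycle) for _ in range(num_packets)]
    return Counter(assignment)

# Spread 1,000 packets over 8 paths: each path carries exactly 125,
# where a single-path flow would put all 1,000 on one link.
load = spray_packets(1000, [f"path-{i}" for i in range(8)])
print(load["path-0"])  # 125
```

The point of the sketch is the shape of the load: spraying keeps every path equally busy, so no one link becomes the job-wide bottleneck.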

According to OpenAI networking lead Mark Handley, this packet-spraying approach creates flatter networks that use less power and compute.

MRC Reroutes Traffic in Microseconds When Paths Fail

MRC is also designed for resilience. When a network path fails, the protocol can detect the failure and reroute traffic in microseconds.

That fast response allows training jobs to keep running without interruption. In large-scale AI training, even small failures can create serious delays, so the ability to route around problems quickly is a core part of the protocol’s value.
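The failover behavior can be sketched in miniature. This is a hypothetical model of the idea, not MRC's actual failure-detection or retransmission logic, which the specification defines:

```python
class PathSet:
    """Toy model of fast failover: traffic is sprayed over a set of
    healthy paths, and a failed path is removed so subsequent packets
    route around it without stopping the flow."""

    def __init__(self, paths):
        self.healthy = list(paths)
        self.next_idx = 0

    def pick(self):
        # Round-robin over whatever paths are currently healthy.
        path = self.healthy[self.next_idx % len(self.healthy)]
        self.next_idx += 1
        return path

    def mark_failed(self, path):
        # Drop the dead path; later picks reroute automatically.
        self.healthy.remove(path)

ps = PathSet(["a", "b", "c", "d"])
ps.mark_failed("b")
# All later traffic uses only the surviving paths.
assert "b" not in {ps.pick() for _ in range(100)}
```

Because the sender already spreads traffic across many paths, losing one path just shrinks the set it sprays over; nothing has to pause while a single route is repaired.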

MRC and SRv6 Reduce Switch-Level Routing Work

MRC also works with SRv6, or Segment Routing over IPv6. SRv6 lets the sender set the exact data path through the network instead of forcing switches to make routing decisions along the way.

That can reduce energy use at the switch level because switches have less routing work to do. The data path is already prescribed, so the network can move traffic more directly and efficiently.
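The source-routing idea behind SRv6 can be sketched as follows. This is a simplified model: real SRv6 encodes the path in an IPv6 Segment Routing Header (RFC 8754), and the hop names here are hypothetical:

```python
def build_srv6_packet(payload, segments):
    """The sender encodes the full path as an ordered segment list,
    so transit nodes never compute a route of their own."""
    return {"segments": list(segments), "payload": payload}

def forward(packet):
    """Each hop simply consumes the next segment from the list;
    no per-switch routing lookup is needed."""
    hops = []
    while packet["segments"]:
        hops.append(packet["segments"].pop(0))
    return hops

pkt = build_srv6_packet("data", ["leaf-1", "spine-3", "leaf-7"])
print(forward(pkt))  # ['leaf-1', 'spine-3', 'leaf-7']
```

Shifting the path decision to the sender is what takes the routing work, and the energy it costs, off the switches in the middle of the network.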

Greg Steinbrecher, OpenAI’s workload lead, described the goal this way: “We want to use as much compute as we can get, but also we want to make sure that we're using it efficiently and effectively, and this is a critical component of that.”

OpenAI Positions MRC as an Industry Standard, Not a Competitive Edge

OpenAI is not framing MRC as a technology it wants to keep as a private advantage. Instead, the company is pushing it as a way to move the broader industry beyond what it sees as a legacy bottleneck.

Steinbrecher said that several companies have created their own in-house networking protocols, and that this kind of fragmentation creates problems for the networking industry.

“Several players in the industry have their own in-house implementations of protocols … that type of market fragmentation is bad for the networking industry,” he said. “You want everyone's energy going in one direction and pushing together, and then everyone moves faster as a result.”

Why an Open MRC Specification Matters for AI Data Centers

The release comes as the AI industry faces growing pressure on compute resources. Large training runs need massive amounts of GPU capacity, and the networks connecting those GPUs have to keep up.

By making MRC an open standard, OpenAI and its collaborators aim to speed up adoption of Ethernet-based AI fabrics. The move also reduces reliance on proprietary networking stacks.

That shift could influence how data centers are built for the next generation of AI training. Instead of each company pushing separate networking approaches, MRC is intended to give the industry a shared direction.

Key Benefits of the MRC Networking Protocol

MRC is designed around a few clear advantages for large-scale AI infrastructure:

  • Less congestion: Packet spraying distributes traffic across hundreds of paths.
  • Higher resilience: Failed paths can be detected and bypassed in microseconds.
  • Improved efficiency: Flatter networks can consume less power and compute.
  • Lower switch workload: SRv6 prescribes exact paths through the network.
  • Reduced fragmentation: An open standard gives the industry a shared protocol direction.
  • Support for Ethernet-based AI fabrics: The specification aims to accelerate adoption beyond proprietary networking stacks.

MRC in OpenAI and Microsoft Training Facilities

MRC is not only a proposed standard. It is already deployed in major training environments used by OpenAI and Microsoft.

Those deployments include the Oracle site in Abilene, Texas, and Microsoft’s Fairwater supercomputers. In those facilities, the protocol has supported training for GPT-5.5 and other models.

That practical use gives MRC a direct connection to real large-scale AI training operations, not just future infrastructure planning.