
NVIDIA Opens MRC Ethernet Protocol Through OCP as Production AI Fabric Spec

The Multipath Reliable Connection protocol that OpenAI and Microsoft ran on Blackwell at scale is now an open OCP specification, turning NVIDIA's proprietary fabric innovation into an industry interop baseline.

#ai-hardware #tools #software

NVIDIA has released the Multipath Reliable Connection protocol through the Open Compute Project as an open specification. MRC is the RDMA transport that OpenAI, Microsoft, and Oracle deployed on Spectrum-X Ethernet for frontier training runs. It distributes a single RDMA connection across multiple network paths, routes around congestion in real time, and retransmits lost packets without stalling the job. The move from "NVIDIA's fabric protocol" to "the industry's AI training fabric protocol" is not a generous act. It is a standards play, and the outcome is already visible in who is on the spec.

The problem MRC solves is concrete: RDMA in large training clusters carries a single-path assumption inherited from InfiniBand-era designs. Any path failure or hot link slows the entire training run because collective communication serializes through that bottleneck. In a 10,000-GPU cluster, a single congested switch port can drop effective GPU utilization by 10-15%. MRC spreads traffic across all available paths simultaneously, with the fabric choosing routes dynamically and retransmitting without job-level stalls. OpenAI reported it "avoided much of the typical network-related slowdowns" on frontier Blackwell training runs.
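The mechanism described above can be sketched as a toy model. This is purely illustrative, not the MRC wire protocol (the OCP spec defines the actual packet formats and state machines); every class and field name here is an assumption made for the sketch:

```python
class MultipathConnection:
    """Toy model of multipath RDMA transport: one logical connection
    sprayed across several paths, steering new packets toward the
    least-congested path and retransmitting lost packets on a
    different path instead of stalling the whole flow."""

    def __init__(self, num_paths):
        # Per-path congestion estimate (0.0 = idle, 1.0 = saturated).
        self.congestion = [0.0] * num_paths
        # seq -> path in use, enabling selective per-packet retransmit.
        self.inflight = {}

    def pick_path(self):
        # Route each packet to the currently least-congested path.
        return min(range(len(self.congestion)),
                   key=lambda p: self.congestion[p])

    def send(self, seq):
        path = self.pick_path()
        self.inflight[seq] = path
        return path

    def on_ack(self, seq, path_congestion):
        # Ack carries congestion feedback; update path state.
        path = self.inflight.pop(seq)
        self.congestion[path] = path_congestion

    def on_loss(self, seq):
        # Penalize the lossy path and resend the one packet elsewhere;
        # other in-flight packets keep flowing (no head-of-line stall).
        lost_path = self.inflight.pop(seq)
        self.congestion[lost_path] = min(1.0, self.congestion[lost_path] + 0.5)
        return self.send(seq)


conn = MultipathConnection(num_paths=4)
first = conn.send(seq=1)   # initially all paths look equal
retry = conn.on_loss(seq=1)
assert retry != first      # retransmit avoided the congested path
```

The contrast with single-path RDMA is the `on_loss` handler: a classic reliable connection would retransmit on the same (congested) path and serialize everything behind the loss, which is the bottleneck the article attributes to InfiniBand-era designs.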

Opening MRC through OCP means Arista, Cisco, and any Broadcom merchant-silicon switch vendor can implement the spec. For hardware teams designing AI compute infrastructure, the sourcing calculus changes: a training cluster can now be built on any OCP-compliant MRC switch, not only on Spectrum-X. The losers are InfiniBand-first designs that still rely on single-path RDMA and proprietary fabric management stacks. Once MRC is the OCP baseline, "AI-native Ethernet" stops being a marketing differentiator and becomes a checkable interop requirement. NVIDIA stays the reference implementation; the rest of the market catches up on NVIDIA's terms.