AWS Graviton5 Breaks the Reticle Limit With Four 3nm Chiplets and Ships in M9g

AWS shipped Graviton5 in M9g and M9gd EC2 instances, and the architecture is worth reading carefully. The design is not a bigger Graviton4. It is four separate 48-core chiplets, each carrying the Arm Neoverse V3 (Poseidon) compute subsystem, stitched together with die-to-die (D2D) interconnects running at 420 GB/s per link. Total core count: 192. Total memory controllers: 12 DDR5. Total PCIe: 8 controllers with 96 lanes and CXL 3.0 support. Process node: TSMC 3nm, a shrink from 4nm on Graviton4.
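As a quick sanity check on those figures, the per-socket totals and a couple of derived ratios can be written down in a few lines of Python. This is only a sketch: the `SocketSpec` class and its field names are hypothetical, the inputs are the numbers quoted above, and the even split of lanes across PCIe controllers is an assumption rather than something AWS has stated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SocketSpec:
    """Hypothetical container for the per-socket figures quoted in the article."""
    chiplets: int
    cores_per_chiplet: int
    ddr5_controllers: int
    pcie_controllers: int
    pcie_lanes: int

    @property
    def total_cores(self) -> int:
        return self.chiplets * self.cores_per_chiplet

    @property
    def cores_per_memory_controller(self) -> float:
        # How many cores contend for each DDR5 controller on average.
        return self.total_cores / self.ddr5_controllers

    @property
    def lanes_per_pcie_controller(self) -> float:
        # Assumes lanes are divided evenly, which the source does not confirm.
        return self.pcie_lanes / self.pcie_controllers


graviton5 = SocketSpec(
    chiplets=4,
    cores_per_chiplet=48,
    ddr5_controllers=12,
    pcie_controllers=8,
    pcie_lanes=96,
)

print(graviton5.total_cores)                  # 192
print(graviton5.cores_per_memory_controller)  # 16.0
print(graviton5.lanes_per_pcie_controller)    # 12.0
```

The cores-per-controller ratio is the number memory-bound teams will likely care about most, since it bounds how much DDR5 bandwidth each core can expect under full load.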

The constraint being removed is the reticle limit. A 192-core monolithic die on 3nm at Graviton-class density would exceed the area TSMC can expose in a single EUV shot, and even a die that only approached that limit would yield poorly enough to break the economics. By building four 48-core chiplets, AWS keeps each die small, keeps per-chiplet yield high, and uses the D2D fabric to present them as a single logical processor. This is the standard chiplet trade: pay for die-to-die interconnect overhead, save on yield-adjusted cost per core. The monolithic block diagram Annapurna Labs showed at re:Invent in December turned out to be a stand-in; the shipping part is chiplets all the way down.
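The yield side of that trade can be illustrated with the textbook Poisson yield model, Y = exp(-A * D0), where A is die area and D0 is defect density. The sketch below is not AWS's or TSMC's cost model. The die areas, the 10% D2D area overhead, and the defect density are all hypothetical placeholders chosen for illustration; only the 26 mm x 33 mm exposure field is a standard figure. It also assumes chiplets are tested before assembly, so a bad die is discarded on its own rather than scrapping a whole package.

```python
import math

# Standard maximum single-exposure field: 26 mm x 33 mm = 858 mm^2.
RETICLE_LIMIT_MM2 = 26.0 * 33.0

# Hypothetical inputs, NOT published Graviton5 or TSMC figures.
MONO_AREA_MM2 = 800.0            # optimistic monolithic die that still fits the field
D2D_OVERHEAD = 0.10              # assumed extra area per chiplet for D2D interfaces
CHIPLET_AREA_MM2 = MONO_AREA_MM2 / 4 * (1 + D2D_OVERHEAD)
DEFECTS_PER_CM2 = 0.1            # assumed killer-defect density


def poisson_yield(area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies with zero killer defects under a Poisson model."""
    area_cm2 = area_mm2 / 100.0
    return math.exp(-area_cm2 * defects_per_cm2)


def silicon_per_good_package(die_area_mm2: float, dies_per_package: int,
                             defects_per_cm2: float) -> float:
    """Wafer area (mm^2) consumed per working package, assuming known-good-die test."""
    good_fraction = poisson_yield(die_area_mm2, defects_per_cm2)
    return dies_per_package * die_area_mm2 / good_fraction


mono_silicon = silicon_per_good_package(MONO_AREA_MM2, 1, DEFECTS_PER_CM2)
chiplet_silicon = silicon_per_good_package(CHIPLET_AREA_MM2, 4, DEFECTS_PER_CM2)

print(f"monolithic yield: {poisson_yield(MONO_AREA_MM2, DEFECTS_PER_CM2):.3f}")
print(f"chiplet yield:    {poisson_yield(CHIPLET_AREA_MM2, DEFECTS_PER_CM2):.3f}")
print(f"silicon per good package, monolithic: {mono_silicon:.0f} mm^2")
print(f"silicon per good package, 4 chiplets: {chiplet_silicon:.0f} mm^2")
```

Under these placeholder numbers the four-chiplet package uses more total silicon than the monolithic die but consumes less wafer area per working part, because yield falls off exponentially with die area. Packaging and D2D power costs are left out, so this should be read as a lower bound on the chiplet overhead, not a full cost comparison.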

CXL 3.0 support across 96 PCIe 6.0 lanes is the second signal. For hardware teams running memory-intensive workloads (DRAM simulation, large post-layout sign-off runs, ML training jobs that saturate local DRAM), CXL 3.0 means memory capacity can be added over the fabric without adding a second NUMA socket, although fabric-attached memory still sits further from the cores than local DDR5. AWS can tier memory across the CXL fabric at a cost that scales roughly linearly with capacity, whereas the price per gigabyte of the highest-density DIMMs climbs much faster than linearly.
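One rough way to reason about a tiered setup is a weighted-average access latency across the local and fabric-attached tiers. The sketch below is a simplification that ignores bandwidth, queuing, and caching effects. The function name and both latency constants are hypothetical placeholders, not measurements of M9g or any CXL device.

```python
# Placeholder latencies for illustration only, not measured values.
LOCAL_DDR5_NS = 100.0
CXL_TIER_NS = 250.0


def effective_latency_ns(local_fraction: float,
                         local_ns: float = LOCAL_DDR5_NS,
                         cxl_ns: float = CXL_TIER_NS) -> float:
    """Average memory latency when `local_fraction` of accesses hit local DRAM."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * cxl_ns


# Sweep the share of accesses served from local DRAM.
for fraction in (1.0, 0.9, 0.75, 0.5):
    print(f"{fraction:.2f} local -> {effective_latency_ns(fraction):.1f} ns average")
```

The practical takeaway is that tiering pays off when the hot working set fits in local DRAM and only colder pages spill to the fabric, so profiling the access distribution of a given workload matters more than the raw capacity number.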

The argument for monolithic server silicon has always been determinism: one die, one scheduler, no D2D variance. Graviton5 at scale makes that argument harder to sustain. If a hyperscaler can ship 192 cores per socket in chiplet form, qualify it at datacenter volume, and undercut merchant silicon on cost per inference, the determinism premium for monolithic dies shrinks. Teams evaluating their 2026 cloud compute platform for EDA, simulation, or AI workloads should benchmark M9g against Intel Xeon 6+ and AMD EPYC Genoa today; the gap is real and it is widening.