AWS shipped Graviton5 in the M9g and M9gd EC2 instances, and the architecture is worth reading carefully. The design is not a bigger Graviton4. It is four separate 48-core chiplets, each carrying the Arm Neoverse V3 (Poseidon) compute subsystem, stitched together with die-to-die (D2D) interconnects running at 420 GB/s per link. Total core count: 192. Total memory controllers: 12, all DDR5. Total PCIe: 8 controllers driving 96 lanes, with CXL 3.0 support. Process node: TSMC 3nm, a shrink from 4nm on Graviton4.
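The ratios implied by those headline numbers are easy to lose in a list, so here is a minimal sketch that derives them. The `SocketTopology` class and its field names are hypothetical, and the per-chiplet figure assumes the memory controllers are spread evenly across the four chiplets, which the published numbers do not confirm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SocketTopology:
    """Headline socket numbers; the class itself is an illustrative helper."""
    chiplets: int
    cores_per_chiplet: int
    ddr5_controllers: int
    pcie_controllers: int
    pcie_lanes: int

    @property
    def total_cores(self) -> int:
        return self.chiplets * self.cores_per_chiplet

    @property
    def cores_per_memory_controller(self) -> float:
        # How many cores contend for each DDR5 controller on average.
        return self.total_cores / self.ddr5_controllers

    @property
    def lanes_per_pcie_controller(self) -> float:
        return self.pcie_lanes / self.pcie_controllers

    @property
    def memory_controllers_per_chiplet(self) -> float:
        # Assumes an even split across chiplets (not confirmed by the source).
        return self.ddr5_controllers / self.chiplets


# Numbers taken from the paragraph above.
GRAVITON5 = SocketTopology(
    chiplets=4,
    cores_per_chiplet=48,
    ddr5_controllers=12,
    pcie_controllers=8,
    pcie_lanes=96,
)

if __name__ == "__main__":
    print("total cores:", GRAVITON5.total_cores)
    print("cores per DDR5 controller:", GRAVITON5.cores_per_memory_controller)
    print("lanes per PCIe controller:", GRAVITON5.lanes_per_pcie_controller)
```

Sixteen cores per memory controller is the figure to keep in mind when reasoning about bandwidth-bound workloads.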
The constraint being removed is the reticle limit. A 192-core monolithic die on 3nm at Graviton density would push past what a single EUV exposure can pattern (a field of roughly 26 mm × 33 mm, about 858 mm²), and even a die just under that limit would yield poorly enough to wreck the economics. By building four 48-core chiplets, AWS keeps each die small, keeps per-chiplet yield high, and uses the D2D fabric to present them as a single logical processor. This is the standard chiplet argument: pay the die-to-die interconnect overhead, save on yield-adjusted cost per core. The Annapurna Labs block diagram shown at re:Invent in December was wrong: the monolithic preview was a stand-in, and the shipping part is chiplets all the way down.
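The yield side of that trade can be sketched with the textbook Poisson yield model, Y = exp(-A·D0). Every input below (defect density, monolithic die area, D2D area overhead) is a hypothetical placeholder, not an AWS or TSMC figure. The model also ignores packaging and assembly cost and assumes chiplets are tested before assembly, so bad dies are discarded individually.

```python
import math

# Standard EUV exposure field: 26 mm x 33 mm, about 858 mm^2.
RETICLE_LIMIT_MM2 = 26.0 * 33.0


def fits_reticle(die_area_mm2: float) -> bool:
    """True if the die can be patterned in a single exposure."""
    return die_area_mm2 <= RETICLE_LIMIT_MM2


def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies with zero killer defects under a Poisson model."""
    return math.exp(-(die_area_mm2 / 100.0) * defects_per_cm2)


def wafer_area_per_good_socket(
    die_area_mm2: float, dies_per_socket: int, defects_per_cm2: float
) -> float:
    """Silicon (mm^2) consumed per working socket, assuming known-good-die testing."""
    good_fraction = poisson_yield(die_area_mm2, defects_per_cm2)
    return dies_per_socket * die_area_mm2 / good_fraction


# Hypothetical inputs, chosen only to illustrate the shape of the trade-off.
DEFECT_DENSITY = 0.10       # defects per cm^2 (assumed)
MONOLITHIC_AREA = 900.0     # mm^2 for an imagined 192-core die (assumed)
D2D_AREA_OVERHEAD = 0.10    # extra area per chiplet for D2D PHYs (assumed)

chiplet_area = MONOLITHIC_AREA / 4 * (1 + D2D_AREA_OVERHEAD)
monolithic_cost = wafer_area_per_good_socket(MONOLITHIC_AREA, 1, DEFECT_DENSITY)
chiplet_cost = wafer_area_per_good_socket(chiplet_area, 4, DEFECT_DENSITY)

if __name__ == "__main__":
    print("monolithic fits reticle:", fits_reticle(MONOLITHIC_AREA))
    print("chiplet fits reticle:", fits_reticle(chiplet_area))
    print(f"silicon per good socket, monolithic: {monolithic_cost:.0f} mm^2")
    print(f"silicon per good socket, chiplets:   {chiplet_cost:.0f} mm^2")
```

Under these assumed inputs the chiplet design spends more raw silicon per socket (the D2D overhead) but less silicon per working socket, which is the whole argument in one comparison.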
CXL 3.0 support across 96 PCIe 6.0 lanes is the second signal. For hardware teams running memory-intensive workloads (DRAM simulation, large post-layout sign-off runs, ML training jobs that saturate local DRAM), CXL 3.0 means disaggregated memory capacity without adding a second socket. Expect access latency closer to a remote NUMA hop than to local DRAM, so the benefit is capacity, not speed. AWS can tier memory across the CXL fabric at a cost that scales roughly linearly with capacity, whereas the price per gigabyte of the highest-density DIMMs climbs steeply.
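Whether tiering helps a given workload depends on how often it has to reach into the far tier. A minimal sketch of that reasoning is a weighted-average latency model. The function names are hypothetical and the latencies in the usage lines are placeholders, not measured Graviton5 or CXL figures.

```python
def effective_latency_ns(
    local_hit_fraction: float, local_ns: float, far_ns: float
) -> float:
    """Average access latency when a fraction of accesses hit local DRAM."""
    if not 0.0 <= local_hit_fraction <= 1.0:
        raise ValueError("local_hit_fraction must be between 0 and 1")
    return local_hit_fraction * local_ns + (1.0 - local_hit_fraction) * far_ns


def required_local_fraction(
    budget_ns: float, local_ns: float, far_ns: float
) -> float:
    """Smallest local hit fraction that keeps average latency within budget."""
    if far_ns <= local_ns:
        raise ValueError("far tier is expected to be slower than local DRAM")
    fraction = (far_ns - budget_ns) / (far_ns - local_ns)
    # Clamp: a budget below local latency is unreachable, above far is free.
    return min(1.0, max(0.0, fraction))


if __name__ == "__main__":
    # Placeholder latencies (assumed): 100 ns local DRAM, 300 ns CXL-attached.
    print(effective_latency_ns(0.9, 100.0, 300.0))
    print(required_local_fraction(150.0, 100.0, 300.0))
```

The practical use is to turn a latency budget into a working-set target: if the hot set fits in local DRAM often enough to meet `required_local_fraction`, the cold remainder can live on the fabric.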
The argument for monolithic server silicon has always been determinism: one die, one scheduler, no D2D variance. It is an argument merchant vendors have already been retreating from, with AMD EPYC built on chiplets since 2019 and recent Xeons built from multiple tiles, and Graviton5 at scale makes it harder still to sustain. If a hyperscaler can ship 192 cores per socket in chiplet form, qualify it at datacenter volume, and undercut merchant silicon on cost-per-inference, the determinism premium for monolithic dies shrinks. Teams evaluating their 2026 cloud compute platform for EDA, simulation, or AI workloads should benchmark M9g against Intel Xeon 6+ and AMD EPYC (Genoa or newer) today, on their own workloads, and track run-to-run variance alongside throughput; the gap is real and it is widening.
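Since the determinism claim is about variance rather than peak speed, a comparison should record spread as well as the median. The harness below is a minimal sketch using only the Python standard library. The `measure` and `summarize` helpers and the stand-in workload are hypothetical; a real evaluation would substitute an actual EDA, simulation, or inference job and run the same script on each instance type.

```python
import statistics
import time
from typing import Callable, Dict, List


def summarize(samples: List[float]) -> Dict[str, float]:
    """Reduce wall-clock samples to a median plus two spread metrics."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate spread")
    mean = statistics.fmean(samples)
    median = statistics.median(samples)
    return {
        "median_s": median,
        "mean_s": mean,
        # Coefficient of variation: run-to-run noise relative to the mean.
        "cv": statistics.stdev(samples) / mean,
        # Tail indicator: how much slower the worst run was than a typical one.
        "worst_over_median": max(samples) / median,
    }


def measure(
    workload: Callable[[], object], runs: int = 20, warmup: int = 3
) -> Dict[str, float]:
    """Time a workload repeatedly after a few untimed warm-up runs."""
    for _ in range(warmup):
        workload()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    return summarize(samples)


def stand_in_workload() -> int:
    # Placeholder compute loop; replace with a representative job.
    return sum(i * i for i in range(200_000))


if __name__ == "__main__":
    for name, value in measure(stand_in_workload).items():
        print(f"{name}: {value:.6f}")
```

Comparing `cv` and `worst_over_median` across platforms speaks to the determinism question directly, while `median_s` combined with instance pricing gives the cost side.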