
AMD Instinct MI350P Brings CDNA 4 to Air-Cooled Enterprise Servers

AMD's first PCIe Instinct card in four years removes the liquid-cooling prerequisite for CDNA 4 inference, putting 144 GB HBM3E and 4.6 PFLOPs into existing enterprise racks without a facilities project.

#ai-hardware #semiconductor

AMD's MI350P puts CDNA 4 compute into a dual-slot PCIe card designed explicitly for air-cooled servers. The constraint it removes is not compute density. It is the liquid-cooling prerequisite that has kept enterprise inference deployments waiting on facilities budgets.

The MI350P packs 144 GB of HBM3E and 4.6 PFLOPs of FP4 throughput into the 600W PCIe CEM envelope, fitting standard 2U and 4U air-cooled rack systems. The architecture is a deliberate half-die design: rather than binning down a full die, AMD built a smaller package carrying half the MI350X's silicon, because the full chip cannot be passively cooled at standard rack airflow. Capped at 450W, the card loses roughly 10-15% of its throughput but stays within what most enterprise data centers can handle without retrofitting cooling infrastructure. At the full 600W it delivers 43% better FP16 and 39% better FP8 theoretical compute than the H200 NVL, in a slot any certified PCIe server already has.
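The 450W figure rewards a quick sanity check: taken at face value, the capped mode is the more efficient operating point per watt, not a compromise. A back-of-envelope sketch using the article's own numbers, assuming the midpoint of the quoted 10-15% loss range (the actual throttle curve is not public):

```python
# Back-of-envelope check of the power/throughput trade-off described above.
# Assumption (not from a spec sheet): the 10-15% throttle loss is taken at
# its 12.5% midpoint and applies uniformly to FP4 throughput.

FP4_PFLOPS_AT_600W = 4.6   # article's figure at the full 600W CEM limit
THROTTLE_LOSS = 0.125      # assumed midpoint of the quoted 10-15% range

def tflops_per_watt(pflops: float, watts: float) -> float:
    """Throughput efficiency: convert PFLOPs to TFLOPs, divide by board power."""
    return pflops * 1000.0 / watts

full = tflops_per_watt(FP4_PFLOPS_AT_600W, 600)
capped = tflops_per_watt(FP4_PFLOPS_AT_600W * (1 - THROTTLE_LOSS), 450)

print(f"600W: {full:.2f} TFLOPs/W")                       # ~7.67
print(f"450W: {capped:.2f} TFLOPs/W")                     # ~8.94
print(f"per-watt gain at 450W: {capped / full - 1:.0%}")  # ~17%
```

Under those assumptions the capped card delivers roughly 17% more throughput per watt, which is the right trade for racks where the binding constraint is airflow and power budget, not slot count. Operators would set such a cap with the usual ROCm tooling, e.g. rocm-smi's power-cap controls.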

If the MI350P ships in volume in Q2 and qualifies on the major inference software stacks (vLLM and the other engines with ROCm backends), Nvidia's SXM-first packaging for its top-end parts looks like a constraint it imposed on itself. Enterprise accounts that could not commit to liquid-cooled infrastructure now have a competitive alternative. The loser is every GPU-as-a-service play built on the assumption that PCIe-slot AI compute was a dead end.
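Qualification is ultimately an empirical question: the stack either serves tokens on the card or it does not. A minimal smoke test, assuming a ROCm build of PyTorch and vLLM, with a placeholder model name rather than anything this piece endorses:

```python
# Minimal qualification smoke test: does the card enumerate under ROCm, and
# can a ROCm build of vLLM serve a request end to end? The model name is a
# placeholder; swap in whatever you actually deploy.
import torch
from vllm import LLM, SamplingParams

# ROCm builds of PyTorch expose AMD GPUs through the torch.cuda namespace.
assert torch.cuda.is_available(), "no ROCm-visible accelerator found"
print(torch.cuda.get_device_name(0))  # should report the Instinct part

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # placeholder model
outputs = llm.generate(["Say hello."], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```

That torch.cuda reuse is deliberate: ROCm's PyTorch backend keeps the CUDA-facing Python API, so most existing inference code paths run unmodified, which is what makes qualification on vLLM plausible on this timeline at all.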