Google is running three parallel custom chip programs for its AI accelerators, with each partner optimizing a distinct point on the cost-performance curve. Broadcom handles the high-performance tier: it is designing the TPU v8 training chip, codenamed Sunfish, on TSMC's 2nm node with a late-2027 target. MediaTek is building Zebrafish, the cost-optimized inference variant on the same process node, expected to come in 20 to 30 percent cheaper than the Broadcom part. Marvell is in active negotiations to develop a memory processing unit designed to pair with existing TPU accelerators, plus a new inference-optimized TPU -- with nearly two million memory processing units in scope if the deal closes.
The three-partner structure is not accidental redundancy. It is a deliberate segmentation of the inference market into tiers with different hardware requirements. High-throughput batch inference tolerates Broadcom-class cost if utilization is high enough. Cost-sensitive serving at the edge of Google's product surface -- search, assistant responses, social features -- needs the MediaTek cost point. And the memory processing unit category targets the bandwidth wall that limits inference throughput once model weights exceed on-chip memory: rather than streaming weights from off-chip DRAM on every generated token, a co-processor sitting adjacent to the TPU can preload and stage them. These are not the same chip with different price tags. They are different design targets.
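The bandwidth wall can be made concrete with a back-of-envelope roofline calculation: during decode, every weight must be read once per generated token, so per-sequence throughput is bounded by memory bandwidth divided by model size. The numbers below are illustrative assumptions, not published TPU specifications.

```python
def decode_tokens_per_sec(model_params_billions: float,
                          bytes_per_param: float,
                          mem_bw_gb_per_s: float) -> float:
    """Upper bound on memory-bound decode throughput per sequence.

    Each generated token streams the full weight set from memory once,
    so the ceiling is bandwidth / model_bytes, regardless of FLOPs.
    """
    model_bytes = model_params_billions * 1e9 * bytes_per_param
    return mem_bw_gb_per_s * 1e9 / model_bytes

# Hypothetical example: a 70B-parameter model with 8-bit weights behind
# 1 TB/s of memory bandwidth tops out near 14 tokens/s per sequence.
tps = decode_tokens_per_sec(model_params_billions=70,
                            bytes_per_param=1,
                            mem_bw_gb_per_s=1000)
print(f"{tps:.1f} tokens/s per sequence")  # ~14.3
```

This is why a weight-staging co-processor attacks throughput directly: anything that raises effective bandwidth to the accelerator raises the ceiling proportionally.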
What this means for the industry is that inference silicon is fracturing into a tiered market the way storage did with SSD/HDD/tape. Nvidia sells a general-purpose inference solution, and it is losing share not because the hardware is bad but because purpose-built silicon at each tier is cheaper per token. Broadcom already has Google, Apple, and Marvell in its ASIC ecosystem. MediaTek entering hyperscaler inference silicon is a structural shift -- it has cost engineering at scale that x86 and GPU vendors cannot match. Hardware teams specifying inference infrastructure will need to understand their workload tier before they can pick the right chip, which is a harder conversation than it used to be.
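The "cheaper per token" argument reduces to simple amortization arithmetic, and it shows why the workload-tier conversation matters: a slower, cheaper chip can win on cost per token even while losing on peak throughput. All prices and throughput figures below are invented for illustration, not actual Broadcom, MediaTek, or Nvidia numbers.

```python
def cost_per_million_tokens(chip_cost_usd: float,
                            lifetime_hours: float,
                            peak_tokens_per_sec: float,
                            utilization: float) -> float:
    """Amortized silicon cost per million tokens served over the chip's life."""
    tokens_served = peak_tokens_per_sec * utilization * lifetime_hours * 3600
    return chip_cost_usd / tokens_served * 1e6

LIFETIME = 3 * 8760  # three years of service, in hours

# Hypothetical high-performance tier: expensive, very high throughput.
hi_perf = cost_per_million_tokens(20_000, LIFETIME, 40_000, utilization=0.9)
# Hypothetical cost-optimized tier: 60% cheaper, half the throughput.
cost_opt = cost_per_million_tokens(8_000, LIFETIME, 20_000, utilization=0.9)

print(f"high-perf: ${hi_perf:.4f}/Mtok  cost-optimized: ${cost_opt:.4f}/Mtok")
```

Under these made-up numbers the cost-optimized part serves tokens more cheaply despite half the throughput, and the gap widens further as utilization drops -- which is exactly the tier-matching exercise infrastructure teams now have to run before picking a chip.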