Signal · Semiconductor Engineering

NoC Coherency Is the Hidden Tax on AI SoC Design

As AI SoCs pile on compute, data movement and coherency protocol choices are quietly becoming the dominant design constraint -- and they have to be resolved far earlier in the flow than teams expect.

#ai-hardware #chiplets #eda

Arteris VP Andy Nightingale puts it plainly: compute is scaling faster than Moore's Law, but data movement and energy efficiency determine whether that compute is actually usable. This piece is a useful corrective to the narrative that AI SoC design is primarily a compute problem. It isn't. It's a data routing problem dressed up as a compute problem.

The CPU vs. NPU coherency split is the key insight here. CPUs need cache-coherent NoCs because the programming model assumes coherence -- you can't break that contract without breaking software. NPUs and accelerators almost always get non-coherent NoCs because throughput and power efficiency matter more than strict consistency semantics. That sounds clean in theory, but real AI SoCs have both, plus DMA engines, PCIe endpoints, and now die-to-die chiplet interfaces. Every domain boundary is a coherency negotiation, and each one is a potential performance cliff if it's mis-specified at architecture time.
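To make the "every domain boundary is a negotiation" point concrete, here's a minimal sketch of the kind of audit an architect might run over a block diagram: tag each agent with its coherency domain and enumerate every link that crosses a domain boundary, since each crossing needs an explicit protocol bridge specified up front. The agent names, domains, and links are illustrative assumptions, not taken from the article or any real NoC tool.

```python
# Hypothetical sketch: flag coherency-domain boundary crossings in an SoC.
# All names and topology are illustrative, not from a real design.
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    name: str
    domain: str  # "coherent" (CPU-style) or "non_coherent" (throughput-style)


def boundary_crossings(links):
    """Return every link whose endpoints sit in different coherency domains.

    Each such link needs an explicit bridge (e.g. coherent-to-IO protocol
    translation) decided at architecture time, not patched in at integration.
    """
    return [(a.name, b.name) for a, b in links if a.domain != b.domain]


cpu = Agent("cpu_cluster", "coherent")
npu = Agent("npu", "non_coherent")
dma = Agent("dma_engine", "non_coherent")
pcie = Agent("pcie_ep", "non_coherent")
d2d = Agent("d2d_link", "coherent")  # die-to-die carrying coherent traffic

links = [(cpu, npu), (cpu, d2d), (npu, dma), (dma, pcie), (cpu, pcie)]

print(boundary_crossings(links))
# -> [('cpu_cluster', 'npu'), ('cpu_cluster', 'pcie_ep')]
```

The point of a toy like this isn't the code -- it's that the crossing count falls out of the architecture, so every bridge either gets specified early or becomes one of the performance cliffs the article warns about.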

The multi-die angle is what makes this particularly sharp right now. Chiplets force coherency decisions even earlier in the design cycle because you're not just partitioning logic -- you're negotiating protocols across physical die boundaries with limited bandwidth. The old approach of dropping commercial NoC IP in late in the flow doesn't work when the package topology constrains which interconnect topologies are even viable. Teams that treat this as an implementation detail rather than an architectural decision are going to get burned at integration time.
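The bandwidth constraint in that last paragraph can be sketched as a back-of-envelope feasibility check: does the cross-die traffic a candidate partition implies actually fit in the die-to-die link budget? The lane counts, per-lane rates, and derating factor below are illustrative assumptions; a real answer would come from architecture-level simulation, not arithmetic.

```python
# Hypothetical sketch: sanity-check cross-die traffic against a die-to-die
# link budget before committing to a partition. Numbers are illustrative.

def d2d_feasible(cross_die_gbps, lanes, gbps_per_lane, utilization=0.7):
    """True if the required cross-die traffic fits in the D2D budget.

    `utilization` derates raw link bandwidth for protocol overhead and
    contention -- coherency traffic (snoops, acks) eats into it too.
    """
    budget = lanes * gbps_per_lane * utilization
    return cross_die_gbps <= budget


# e.g. 16 lanes at 32 Gb/s each, 70% usable -> 358.4 Gb/s of budget
print(d2d_feasible(cross_die_gbps=300, lanes=16, gbps_per_lane=32))  # True
print(d2d_feasible(cross_die_gbps=400, lanes=16, gbps_per_lane=32))  # False
```

Even this crude check shows why the decision is architectural: if a partition fails it, no amount of late-stage NoC IP configuration recovers the bandwidth, so the floorplan and the coherency plan have to be co-designed.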