Meta has lined up four successive AI accelerator generations within roughly two years by building every chip from the same RISC-V-based processing element (PE). That is not ordinary iteration. It is a design strategy that removes the clean-sheet redesign cycle as a constraint on each new generation.
The numbers bear this out. From MTIA 300 to MTIA 500, HBM bandwidth grows 4.5x to 27.6 TB/s per accelerator, and compute FLOPS grow 25x when low-precision data types are counted. MTIA 400, now in final lab testing, scales to 72-node interconnect domains (up from 16 on the 300) and is cost-competitive with commercial accelerators on inference tasks. MTIA 450 doubles HBM bandwidth again versus the 400 and is targeting mass deployment in early 2027. MTIA 500 moves to a 2x2 chiplet configuration with up to 512 GB HBM per accelerator, scheduled for 2027. All four generations share the same rack and network infrastructure, so swapping in a newer chip is a drop-in, not a forklift upgrade.
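The ratios quoted above can be checked with a few lines of arithmetic. The sketch below uses only the figures stated in this paragraph. The MTIA 300 baseline it prints is derived from the stated 4.5x growth, not a published specification, and the variable names are illustrative.

```python
# Figures quoted in the text above.
MTIA_500_HBM_TBPS = 27.6       # HBM bandwidth per accelerator on MTIA 500
HBM_GROWTH_300_TO_500 = 4.5    # stated growth factor from MTIA 300 to MTIA 500
HBM_GROWTH_400_TO_450 = 2.0    # "doubles HBM bandwidth again versus the 400"
DOMAIN_SIZE_300 = 16           # interconnect domain size on MTIA 300
DOMAIN_SIZE_400 = 72           # interconnect domain size on MTIA 400

# Implied MTIA 300 baseline (derived here, not a published figure).
implied_mtia_300_tbps = MTIA_500_HBM_TBPS / HBM_GROWTH_300_TO_500

# Growth that the 300 -> 400 and 450 -> 500 steps must supply together,
# once the stated 2x step from 400 to 450 is factored out.
remaining_hbm_growth = HBM_GROWTH_300_TO_500 / HBM_GROWTH_400_TO_450

# Growth in interconnect domain size from MTIA 300 to MTIA 400.
domain_growth = DOMAIN_SIZE_400 / DOMAIN_SIZE_300

print(f"Implied MTIA 300 HBM bandwidth: {implied_mtia_300_tbps:.2f} TB/s")
print(f"HBM growth outside the 400 -> 450 doubling: {remaining_hbm_growth:.2f}x")
print(f"Interconnect domain growth, 300 -> 400: {domain_growth:.1f}x")
```

Under these figures the MTIA 300 baseline works out to roughly 6.1 TB/s, and the 400 to 450 doubling accounts for a little under half of the overall bandwidth growth on a multiplicative basis.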
The mechanism that makes this work is the common processing element. Each PE contains two RISC-V vector cores, a dot-product engine, a special-function unit, a reduction engine, and a DMA engine. Because the PE is fixed, each generation reuses the same software stack (PyTorch, vLLM, Triton), tooling, and qualification work. What changes between generations is packaging configuration, HBM count, and precision support, not the architectural foundation. This keeps the variable surface area small and bounded, which is what compresses the design cycle to a sub-annual cadence: four generations in roughly two years is about one every six months.
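One way to picture the split between what is fixed and what varies is as a small data model. The following is a minimal sketch, not Meta's tooling: the class names, the `changed_fields` helper, and the two generations `gen_a` and `gen_b` are hypothetical, and their packaging, HBM, and precision values are placeholders rather than real MTIA specifications. Only the PE composition comes from the description above.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ProcessingElement:
    """The fixed building block, with unit counts taken from the text."""
    riscv_vector_cores: int = 2
    dot_product_engines: int = 1
    special_function_units: int = 1
    reduction_engines: int = 1
    dma_engines: int = 1


@dataclass(frozen=True)
class Generation:
    """Hypothetical per-generation description built around a shared PE."""
    name: str
    pe: ProcessingElement
    packaging: str          # packaging configuration
    hbm_stacks: int         # HBM count
    precisions: frozenset   # supported data types


def changed_fields(old: Generation, new: Generation) -> set:
    """Return the names of the fields that differ between two generations."""
    return {
        f.name
        for f in fields(old)
        if getattr(old, f.name) != getattr(new, f.name)
    }


SHARED_PE = ProcessingElement()

# Placeholder values for illustration only, not real chip specifications.
gen_a = Generation("gen_a", SHARED_PE, "monolithic", 4,
                   frozenset({"bf16", "fp8"}))
gen_b = Generation("gen_b", SHARED_PE, "2x2 chiplet", 8,
                   frozenset({"bf16", "fp8", "fp4"}))

delta = changed_fields(gen_a, gen_b)
print(sorted(delta))
```

In this model the `pe` field never shows up in the difference between generations, which is the property the paragraph describes: everything that has to be redesigned and requalified sits in the handful of fields around the PE.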
Incumbent AI accelerator vendors price on capability-per-generation and count on two-to-three-year replacement cycles for revenue. Meta's approach puts hundreds of thousands of chips into production on a short cadence, each generation informed by live workload data the previous generation just produced. The casualty here is the assumption that a stable, fully specified target architecture is a prerequisite for tape-out. The team that can iterate fastest around a fixed PE is writing the roadmap for everyone watching.