Fine-grain clock gating is one of the most effective techniques for cutting dynamic power in a production design, and it has been a manual job for senior RTL engineers for thirty years. AUTOGATE, from NVIDIA researchers and collaborators at UT Dallas, the University of Maryland, and others, automates the full loop: a machine learning model ingests raw waveform toggling traces spanning millions of cycles, clusters them into compact structured representations, and hands those representations to an LLM that rewrites the RTL to insert gate enables in the right places. The constraint being removed is not just labor. It is the dependency on a senior engineer who knows which registers toggle rarely enough to gate, and who has time to read a waveform to find out.
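To make "toggles rarely enough to gate" concrete, here is a minimal sketch of the check a senior engineer does by eye when reading a waveform. It is an illustration, not AUTOGATE's method: the register names, the per-cycle sample format, and the 0.1 threshold are all assumptions made for the example.

```python
from typing import Dict, List


def toggle_rate(samples: List[int]) -> float:
    """Fraction of cycle boundaries at which the sampled register value changes."""
    if len(samples) < 2:
        return 0.0
    changes = sum(1 for prev, cur in zip(samples, samples[1:]) if prev != cur)
    return changes / (len(samples) - 1)


def gating_candidates(trace: Dict[str, List[int]], threshold: float = 0.1) -> List[str]:
    """Names of registers whose toggle rate is at or below the (assumed) threshold."""
    return sorted(name for name, samples in trace.items()
                  if toggle_rate(samples) <= threshold)


# Hypothetical trace: one sampled value per clock cycle for each register.
trace = {
    "cfg_reg":    [3, 3, 3, 3, 3, 3, 3, 3, 7, 7, 7, 7],     # changes once
    "pc_reg":     [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],   # changes every cycle
    "status_reg": [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1],     # changes once
}

candidates = gating_candidates(trace)
print(candidates)
```

On twelve cycles this is trivial. On millions of cycles across thousands of registers it is the part nobody has time to do by hand, and a low toggle rate alone still does not say which enable condition to gate on.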
The engineering insight that makes this work is the ML-LLM co-design. LLMs cannot process millions of waveform cycles directly, so the ML clustering step compresses a trace into a form the LLM can reason about without ever looking at raw signal data. That is what makes this "industry-grade" rather than toy-grade: the system handles the input-size problem that has made LLM-based RTL optimization brittle on real designs. The hierarchical multi-agent architecture handles the scale problem on the output side: large designs get decomposed into independently optimizable modules, and agents coordinate across the hierarchy rather than trying to fit everything in a single context.
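The compression idea can be sketched in a few lines. This is a toy stand-in for the clustering step described above, not the actual model: it reduces each register's trace to one activity bit per window and groups registers whose signatures match exactly, where a real system would presumably use a learned or distance-based clustering. The register names, window size, and summary format are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def activity_signature(samples: List[int], window: int) -> Tuple[int, ...]:
    """One bit per window: 1 if the value changed at any cycle in that window."""
    bits = []
    for start in range(0, len(samples), window):
        # Include the cycle just before the window so a change on the
        # window's first cycle is counted in this window.
        chunk = samples[max(start - 1, 0):start + window]
        bits.append(int(any(a != b for a, b in zip(chunk, chunk[1:]))))
    return tuple(bits)


def summarize(trace: Dict[str, List[int]], window: int) -> List[str]:
    """Group registers by identical signature and emit one short line per group."""
    groups = defaultdict(list)
    for name, samples in trace.items():
        groups[activity_signature(samples, window)].append(name)
    lines = []
    for sig, names in sorted(groups.items()):
        pattern = "".join(str(bit) for bit in sig)
        lines.append(f"activity={pattern} registers={','.join(sorted(names))}")
    return lines


# Hypothetical 8-cycle trace, summarized in windows of 4 cycles.
trace = {
    "a": [0, 0, 0, 0, 1, 2, 3, 4],  # idle, then active
    "b": [5, 5, 5, 5, 6, 7, 8, 9],  # same activity pattern as "a"
    "c": [0, 1, 0, 1, 0, 1, 0, 1],  # always active
    "d": [2, 2, 2, 2, 2, 2, 2, 2],  # never changes
}

summary = summarize(trace, window=4)
for line in summary:
    print(line)
```

The output is three short lines instead of thirty-two samples. The same reduction, applied to millions of cycles, is what turns a trace into something that fits in a prompt, and registers that share a signature are natural candidates for sharing a gate enable.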
Power optimization has historically been a late-stage RTL fixup: a project hits its power budget, someone spends two weeks reading toggle reports and inserting gate enables by hand, and the team hopes the ECO does not break something else. Agentic clock gating moves that step into the regular RTL loop, runnable on demand. The vendors whose tools currently own the power analysis and ECO flow should be paying close attention. The value in that workflow was never the tool; it was the senior engineer's judgment about where to look.