hw.dev

Agents Now Build Production-Class Chips. The EDA Bottleneck Has Moved.

Design Conductor built a full AI accelerator from an arXiv paper in 80 hours. The constraint removed is not design speed; it is the human at every abstraction boundary.


Design Conductor 2.0 took an arXiv paper as its only spec and built a complete LLM inference accelerator in 80 hours, fully autonomously: 5,129 FP16/32 processing elements, a 240-cycle pipeline, FPGA-verified at 125 MHz, 5.7 mm² in TSMC 16FF. The constraint being removed is not design speed. It is the assumption that chip design requires a human engineer at every abstraction boundary, and EDA pricing models that encode that assumption are the first casualty.

From CPU to Accelerator in One Iteration Cycle

Two months before the accelerator result, the same Verkor team published Design Conductor 1.0 (arXiv:2603.08716, March 2026): a system that built a five-stage Linux-capable RISC-V CPU from a 219-word requirements document in 12 hours, with no human intervention in the RTL loop, meeting timing at 1.48 GHz on the ASAP7 PDK. The academic reaction was measured: ASAP7 is not a production process, and the microarchitecture class was well-understood. Fair.

The 2.0 result (arXiv:2605.05170) changes the framing. An accelerator built from a research paper is not a well-understood microarchitecture; it required the agent to interpret an algorithmic contribution (TurboQuant quantization), extract a hardware datapath, and close the design without a human translating research intent into RTL intent. The progression from CPU to custom accelerator in one system generation is the system demonstrating generalization, not incremental benchmark improvement.

The academic baseline confirms the same direction from below. The RTLLM 2.0 benchmark (arXiv:2503.15112, ICCAD '24, HKUST) evaluates LLM performance on 50 hand-crafted RTL design tasks with automated test cases and correct reference implementations. LLMs that could not reliably close a basic counter design in 2023 now pass the majority of the suite. The floor for automated RTL quality is rising faster than the EDA industry's published AI roadmaps anticipated.

Why Now, Not Three Years Ago

Three enabling factors converged that were not simultaneously true in 2023.

Frontier model quality crossed a threshold for multi-step reasoning chains. RTL-to-GDSII is an iterative loop (write RTL, simulate, fix functional failures, synthesize, fix timing violations, place-and-route, run DRC, iterate) where each step requires reasoning about the previous step's output. Earlier models degraded mid-chain. The frontier models powering Design Conductor 2.0, released in April 2026 per the paper's own framing, sustain reasoning quality across an 80-hour run reliably enough to close the loop.
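The loop described above can be sketched as a small driver that runs each stage in order and, on the first failure, hands the stage's feedback to a revision step before starting over. This is a minimal illustration under stated assumptions, not Design Conductor's implementation: `close_loop`, `StageResult`, and the toy stages are hypothetical names, and the stages here are stand-ins that inspect a string rather than invoke any real simulator or synthesis tool.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class StageResult:
    ok: bool
    feedback: str = ""


# A stage inspects the current design and reports pass/fail plus feedback.
Stage = Callable[[str], StageResult]


def close_loop(design: str,
               stages: List[Tuple[str, Stage]],
               revise: Callable[[str, str, str], str],
               max_iterations: int = 10) -> Tuple[str, List[str]]:
    """Run stages in order; on the first failure, revise and start over."""
    log: List[str] = []
    for iteration in range(1, max_iterations + 1):
        for name, stage in stages:
            result = stage(design)
            if not result.ok:
                log.append(f"iter {iteration}: {name} failed: {result.feedback}")
                # The revision step sees the previous stage's output,
                # which is the reasoning dependency the text describes.
                design = revise(design, name, result.feedback)
                break
        else:
            log.append(f"iter {iteration}: all stages passed")
            return design, log
    raise RuntimeError(f"no closure after {max_iterations} iterations")


# Toy stand-ins (hypothetical): a real flow would call a simulator,
# a synthesis tool, place-and-route, and DRC here.
def toy_simulate(design: str) -> StageResult:
    return StageResult("fixed" in design, "functional mismatch")


def toy_timing(design: str) -> StageResult:
    return StageResult("pipelined" in design, "negative slack")


def toy_revise(design: str, stage: str, feedback: str) -> str:
    patch = {"simulate": "fixed", "timing": "pipelined"}[stage]
    return f"{design} {patch}"


final, log = close_loop(
    "rtl",
    [("simulate", toy_simulate), ("timing", toy_timing)],
    toy_revise,
)
for line in log:
    print(line)
```

In this sketch every failure restarts the flow from the first stage, on the assumption that a revision may change the RTL and invalidate earlier results; a real agent could reasonably resume from the failed stage instead.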

Open-source EDA became a complete, scriptable toolchain. OpenROAD, OpenLane, and the ASAP7 and SKY130 PDKs give an agent a full RTL-to-GDSII flow with no license negotiation and no GUI wall between tool output and agent context. Commercial EDA also runs in batch via Tcl scripts, but programmatic invocation requires negotiated API access and license tokens that are human-controlled by policy. The open-source stack has no such gate.

LLM inference costs dropped far enough that iterative verification became economically viable without a datacenter contract. Thousands of simulation cycles per design iteration, run in a closed loop, cost compute time. That cost structure is new.

The Constraint Being Removed

Hardware development's idea-to-validation loop has always had a human-mandatory chokepoint: the abstraction crossing. RTL to gate netlist to placed layout to DRC-clean GDSII requires expert judgment at each boundary, not because the steps are mysterious but because the tools were never designed to share a data contract. Each crossing is a file format change, a different tool's bespoke scripting language, and an implicit judgment call ("is this timing closure sufficient to proceed, or do I rerun at higher effort?"). The tooling axis of the design loop is what autonomous agents are compressing.
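The implicit judgment call quoted above ("is this timing closure sufficient to proceed, or do I rerun at higher effort?") can be written down as an explicit gate. The sketch below is only an illustration of what such a decision looks like once it is made programmatic. The function name, the three actions, and the 5% recoverable-shortfall threshold are all assumptions chosen for the example; they are not taken from any of the cited papers or from any tool's defaults. Worst negative slack (WNS) is the standard timing-report quantity assumed as input.

```python
from enum import Enum


class Action(Enum):
    PROCEED = "proceed"
    RERUN_HIGHER_EFFORT = "rerun_higher_effort"
    REVISE_RTL = "revise_rtl"


def timing_gate(wns_ns: float,
                clock_period_ns: float,
                effort: int,
                max_effort: int = 3,
                recoverable_fraction: float = 0.05) -> Action:
    """Decide what to do after a timing report (illustrative thresholds).

    wns_ns: worst negative slack in ns; >= 0 means timing is met.
    effort: current tool effort level, from 1 up to max_effort.
    recoverable_fraction: hypothetical cutoff, as a fraction of the clock
        period, below which a higher-effort rerun is assumed worth trying.
    """
    if wns_ns >= 0.0:
        return Action.PROCEED
    shortfall = -wns_ns / clock_period_ns
    if shortfall <= recoverable_fraction and effort < max_effort:
        return Action.RERUN_HIGHER_EFFORT
    # Large misses, or misses that survive maximum effort, go back to RTL.
    return Action.REVISE_RTL


print(timing_gate(wns_ns=0.02, clock_period_ns=1.0, effort=1))
print(timing_gate(wns_ns=-0.03, clock_period_ns=1.0, effort=1))
print(timing_gate(wns_ns=-0.03, clock_period_ns=1.0, effort=3))
print(timing_gate(wns_ns=-0.20, clock_period_ns=1.0, effort=1))
```

A human engineer makes this call from experience with the design and the tool; an agent has to make it from the report alone, which is why the text treats each crossing as a judgment problem and not just a file conversion.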

Agents running OpenLane now traverse every crossing in a supervised loop, with the model providing the judgment calls. NL2GDS (arXiv:2603.05489, March 2026) demonstrated this on ISCAS benchmark circuits using an entirely open-source pipeline: 36% area reduction, 35% delay reduction, and 70% power savings compared to manual baseline designs. Three separate metrics, each improving. The coordination tax of managing five expert domains in sequence is the thing being removed.

Who Benefits

Academic chip design programs gain immediately. A graduate student who previously needed a licensed EDA stack and a physical design expert to supervise tapeout can now close a research design to GDSII in a weekend. Tiny Tapeout already provides multi-project wafer access at $100-400 per tile; autonomous agents collapse the design expertise barrier on top of the cost barrier. University groups doing custom accelerator research can now iterate on architecture at silicon fidelity, not simulation fidelity.

Fabless startups in the 2-10 person range doing AI inference silicon, a category that raised $8.4B in Q1 2026 per SemiEngineering's funding tracker, gain design-space exploration at RTL-to-GDSII fidelity before committing to a commercial PDK or a design services engagement. Positron AI, which deployed its Asimov inference chip into Oracle Cloud at production scale this month, represents the endpoint of this category: a small team shipping real silicon to a hyperscaler. The autonomous flow can feed that pipeline at the architectural exploration stage before a production commitment.

Who Is Exposed

Design services firms whose billing model is months-long RTL-to-GDSII engagements at senior-engineer day rates are the most directly exposed. When an autonomous agent produces a verified GDSII from a spec in 80 hours on a standard PDK, every design services engagement faces the same question: what is the human judgment adding? Today, the honest answer is commercial PDK-specific expertise, mixed-signal judgment, and customer relationship. The scope of what commands a premium billing rate is shrinking toward those three things, fast.

Synopsys and Cadence are exposed in a specific way. Both raised guidance on AI chip design demand in April 2026 (Cadence lifted its full-year forecast to $6.1-6.2B), and that demand is real. But the pricing model for both companies assumes each tool is operated by a human expert whose productivity the license amplifies. If the operator is an agent running a scriptable open-source stack, the commercial tool becomes the premium tier reserved for production sign-off, not the default for exploration. That migration is already underway, in every academic lab with an OpenLane checkout.

What Builders Should Do

If you are planning a custom silicon tapeout for an AI inference application in the next 18 months, run a two-week parallel track on the open-source autonomous flow before committing to a commercial PDK or an EDA license stack. Use Design Conductor or NL2GDS against your architecture spec. The output cannot tape out on a production node; ASAP7 is an academic process and the results are not foundry-portable. But RTL-level timing closure, area estimates, and datapath validation against your actual design let you make informed decisions before you spend on a commercial toolchain: two engineer-weeks of evaluation time, zero license cost, against a commitment that carries a six-figure annual EDA line item.

What Could Stop This

Two specific limits bound the current results. First, all demonstrated autonomous designs run on academic PDKs and open-source EDA flows. The step to a production node (TSMC N2, GF 22FDX, Samsung SF2) requires proprietary IP, sign-off tools (Calibre, StarRC), and EDA licenses that agents cannot invoke autonomously today. If EDA incumbents respond by tightening programmatic access to commercial toolchains, the autonomous loop breaks exactly where design quality matters for tapeout. The economic incentive to do that will become visible inside 12 months as open-source exploration share rises. Second, both Design Conductor and NL2GDS target standard-cell digital logic. Mixed-signal designs, RF circuits, and designs with hard signal integrity or thermal constraints require analog expertise that is not captured in LLM training data at production fidelity. The claim is bounded to digital logic and fails outside it.

If either TSMC or a major EDA vendor publishes a production PDK accessible to autonomous agent flows by end of 2027, the current academic boundary collapses and the claim extends to production silicon. If neither does, this stays an exploration-stage result for at least one more process generation.