Signal · EE Times

Positron AI's Asimov Chip Goes Live in Oracle Cloud for MoE Inference

Positron AI has deployed tens of millions of dollars' worth of Asimov-based inference systems into Oracle Cloud, targeting mixture-of-experts (MoE) model inference -- one of the first non-Nvidia, non-AMD chips to reach production scale in a major cloud provider's AI infrastructure.

Thesis connection
decision making · tooling

Purpose-built MoE inference silicon in production at Oracle adds a real non-Nvidia option to the inference-stack decision matrix -- changing the per-token economics that drive model-deployment architecture calls at hyperscaler scale.

#ai-hardware #semiconductor #chiplets

Startup Positron AI has deployed Asimov-based inference racks into Oracle Cloud at production scale, CEO Mitesh Agrawal told EE Times. The chips are running mixture-of-experts (MoE) inference workloads, making Positron one of the first non-Nvidia, non-AMD AI silicon vendors to reach real cloud deployment -- not a pilot, not a proof of concept.

Why this matters:

Deloitte projects inference will account for roughly two-thirds of AI compute workloads by 2026, a market worth $50 billion. That's the prize Positron, Cerebras, Groq, and about 10 other startups are racing toward. What's different about the Positron story is that the Oracle deployment is described as production -- "multiple tens of millions of dollars' worth of systems and racks." That's a meaningful scale signal, not a design-win press release.

The technical angle:

The focus on MoE models is significant. MoE architectures like Mixtral and emerging frontier models use sparse activation patterns that are memory-bandwidth-intensive and architecturally different from the dense transformer workloads that GPUs were optimized for. Positron and Majestic Labs are specifically targeting the memory wall that limits inference on models exceeding 500 billion parameters -- a space where Groq's SRAM-heavy approach excels at speed but falls off at scale.
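
To make the sparse-activation point concrete, here is a minimal top-k MoE routing sketch in PyTorch. This is an illustration of the general technique, not Positron's or Mixtral's actual implementation; all names, shapes, and the top-2 choice are assumptions. The key dynamic: each token computes through only k of N experts, but a batch collectively touches most experts, so the full expert weight set still has to stream through memory each step -- which is why MoE inference tends to be bandwidth-bound rather than FLOP-bound.

```python
import torch
import torch.nn.functional as F

def moe_layer(x, gate_w, experts, k=2):
    """Toy MoE layer (illustrative only).
    x: [tokens, d_model]; gate_w: [d_model, n_experts];
    experts: list of (w_in [d_model, d_ff], w_out [d_ff, d_model])."""
    logits = x @ gate_w                          # router scores per token
    topk_val, topk_idx = logits.topk(k, dim=-1)  # each token picks k experts
    weights = F.softmax(topk_val, dim=-1)        # normalize over chosen experts
    out = torch.zeros_like(x)
    for e, (w_in, w_out) in enumerate(experts):
        mask = (topk_idx == e)                   # which tokens routed here
        tok, slot = mask.nonzero(as_tuple=True)
        if tok.numel() == 0:
            continue                             # expert idle this batch
        # Only these tokens pay the FLOPs, but the expert's weights
        # must still be fetched from memory -- the bandwidth pressure.
        h = F.gelu(x[tok] @ w_in) @ w_out
        out.index_add_(0, tok, weights[tok, slot, None] * h)
    return out

# Hypothetical sizes for a quick smoke test.
d, ff, n = 64, 256, 8
experts = [(torch.randn(d, ff) * 0.02, torch.randn(ff, d) * 0.02)
           for _ in range(n)]
y = moe_layer(torch.randn(16, d), torch.randn(d, n) * 0.02, experts)
```

With top-2 routing over 8 experts, each token does a quarter of the dense-FFN compute, yet a 16-token batch will typically activate most of the 8 experts -- so arithmetic intensity drops while memory traffic stays high.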

The broader signal:

Intel and SambaNova announced a multiyear inference partnership in the same period, targeting enterprise and cloud in H2 2026. The inference silicon landscape is fragmenting in a way training never did -- training is concentrated in a small number of hyperscaler clusters, but inference runs everywhere. That creates room for specialized silicon in deployments that GPU clusters can't cost-effectively serve.

What to watch:

Whether Positron can expand beyond Oracle to other cloud providers, and how the per-token economics compare with H100/H200 clusters. If the cost story holds at production loads, this is a real wedge in the market.
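
As a rough frame for that per-token comparison, here is a back-of-envelope cost model. Every number below is a hypothetical placeholder -- none of these figures come from Positron, Oracle, or Nvidia; the structure is the point: amortized capex plus power, divided by sustained token throughput.

```python
# Toy per-token cost model. All inputs are hypothetical placeholders,
# NOT published figures from Positron, Oracle, or Nvidia.

def cost_per_million_tokens(capex_usd, lifetime_yr, power_kw,
                            usd_per_kwh, tokens_per_sec, utilization):
    hours = lifetime_yr * 8760
    amortized = capex_usd / hours                   # $/hr of hardware
    energy = power_kw * usd_per_kwh                 # $/hr of power
    tokens_per_hr = tokens_per_sec * 3600 * utilization
    return (amortized + energy) / tokens_per_hr * 1e6

# Hypothetical rack-level comparison at equal throughput: the wedge,
# if it exists, shows up in capex and watts per token served.
gpu_rack  = cost_per_million_tokens(3_000_000, 4, 40, 0.08, 50_000, 0.6)
asic_rack = cost_per_million_tokens(1_500_000, 4, 25, 0.08, 50_000, 0.6)
print(f"GPU rack:  ${gpu_rack:.3f} per 1M tokens")
print(f"ASIC rack: ${asic_rack:.3f} per 1M tokens")
```

The model deliberately ignores networking, cooling overhead, and software costs; the takeaway is only that any durable per-token advantage has to come from some mix of lower capex, lower power, or higher sustained utilization at the same throughput.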