Nvidia has a new problem. It's sitting inside Amazon's cloud.
What the Deal Actually Does
Amazon and Cerebras announced Friday they'll combine their chips inside AWS data centers to speed up AI inference, the stage where a trained model actually generates responses for users.
The setup uses a "divide and conquer" approach. Amazon's own Trainium3 chips handle the first step, known in the industry as prefill: reading and processing the user's prompt. Then Cerebras chips take over for the second step, decode: generating the answer one token at a time. Cerebras CEO Andrew Feldman told Reuters that splitting the work this way is the key to making inference dramatically faster and cheaper.
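For readers who want the mechanics, here is a minimal Python sketch of that prefill/decode handoff, assuming a generic two-stage pipeline. Every name here is hypothetical; neither AWS nor Cerebras has published an API for the service, and the token loop is a stand-in for a real model.

```python
from dataclasses import dataclass

@dataclass
class KVCache:
    """Hypothetical stand-in for the attention key/value state
    produced during prefill and reused during decode."""
    prompt_tokens: list[str]

def prefill(prompt: str) -> KVCache:
    # Step 1 (prefill): process the entire prompt in one parallel
    # pass. In the deal described above, this phase would run on
    # Trainium3 hardware.
    return KVCache(prompt_tokens=prompt.split())

def decode(cache: KVCache, max_tokens: int) -> list[str]:
    # Step 2 (decode): generate the answer one token at a time,
    # reusing the cached prompt state instead of reprocessing the
    # prompt. This is the phase Cerebras hardware would handle.
    output = []
    for i in range(max_tokens):
        # Placeholder for the model's next-token computation.
        output.append(f"token_{i}")
    return output

# The handoff: prefill on one accelerator, decode on another.
cache = prefill("Why is the sky blue?")
answer = decode(cache, max_tokens=5)
print(answer)
```

The reason the split pays off is that the two phases stress hardware differently: prefill is compute-heavy and parallel, while decode is sequential and memory-bound, so each can run on silicon suited to it.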
AWS plans to roll out the service in the second half of 2026.
What Makes Cerebras Different
Cerebras builds what's known as a wafer-scale chip: essentially a processor the size of an entire silicon wafer, roughly a dinner plate across. Instead of the external high-bandwidth memory (HBM) modules that Nvidia's flagship chips require, it keeps model data in memory spread across the wafer itself. HBM is expensive, and the round trip to it is the main bottleneck when a model generates a response token by token.
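To see why memory bandwidth sets the speed limit, consider a back-of-envelope calculation: generating each token requires streaming roughly all of a model's weights through the processor once, so tokens per second is bounded by bandwidth divided by model size. The numbers below are illustrative assumptions, not vendor figures.

```python
# Rough decode speed limit: bandwidth / bytes streamed per token.
# Illustrative assumptions, not vendor specs:
#   - a 70B-parameter model stored at 2 bytes per weight (fp16)
#   - ~3.35 TB/s of memory bandwidth, in the ballpark of a single
#     high-end HBM-based GPU
model_bytes = 70e9 * 2             # ~140 GB of weights
bandwidth_bytes_per_sec = 3.35e12  # ~3.35 TB/s

tokens_per_sec = bandwidth_bytes_per_sec / model_bytes
print(f"~{tokens_per_sec:.0f} tokens/sec upper bound")  # ~24
```

That ceiling ignores batching, KV-cache traffic, and multi-chip sharding, but it shows the shape of the problem: once the prompt is processed, the chip mostly waits on memory. On-chip memory offers orders of magnitude more bandwidth than external HBM, which is the bet Cerebras is making for the decode phase.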
That design has already attracted serious buyers. Earlier this year, Cerebras signed a $10 billion deal with OpenAI to supply compute for ChatGPT. In February, the company closed a $1 billion funding round at a $23.1 billion valuation, backed by Fidelity, Benchmark, and Tiger Global.
Nvidia Isn't Sitting Still
This isn't a story about Nvidia losing — yet. AWS still runs the vast majority of its AI workloads on Nvidia hardware, and that isn't changing overnight.
But the direction is clear. Every major cloud provider is investing in custom chips to reduce its dependence on Nvidia and cut costs. Nvidia spent $20 billion acquiring chip designer Groq in December and is expected to unveil its own disaggregated inference architecture, the same kind of prefill/decode split across specialized hardware that AWS and Cerebras are pursuing, at GTC on March 16.
The AI hardware race just got a second front.
