The competitive pressure around Nvidia is widening fast
Nvidia is getting squeezed from more than one side. Even as it rolled out new hardware at the GTC conference in San Jose last week, the market around it kept shifting. Cloud providers are pushing harder on in-house chips, Broadcom is deepening its role as the go-to builder behind many of those designs, and established competitors are capitalizing on a market that's increasingly shaped by inference economics—not just brute-force training capability.
The custom AI chip surge from hyperscalers
Hyperscalers are accelerating in-house silicon to reduce GPU dependence
The most direct challenge comes from the hyperscalers themselves. Google, Amazon, Microsoft, and Meta are all moving faster on internal chip programs aimed at lowering reliance on third-party GPUs.
- Google introduced its seventh-generation TPU, Ironwood, rated at 4.6 petaflops of FP8 compute per chip, with the ability to scale into pods of 9,216 chips.
- Amazon is ramping Trainium 3, built on TSMC’s 3nm process. Production began in early 2026, and AWS says it delivers a 50% cost reduction versus comparable Nvidia-based instances.
- Meta said this month it plans to push forward with four new silicon generations over the next two years.
- Microsoft recently announced its Maia 200 inference chip.
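As a rough scale check on the Ironwood figures above (taking the stated per-chip rating and pod size at face value, and ignoring interconnect and utilization overhead), a full pod's peak compute works out to roughly 42 FP8 exaflops:

```python
# Rough scale check for a full Ironwood pod, using the figures quoted above.
# Real sustained throughput depends on interconnect, memory, and workload.
per_chip_pflops = 4.6   # FP8 petaflops per chip (as stated)
chips_per_pod = 9216    # chips in a full pod (as stated)

pod_pflops = per_chip_pflops * chips_per_pod
pod_exaflops = pod_pflops / 1000

print(f"Peak pod compute: {pod_exaflops:.1f} FP8 exaflops")  # ~42.4
```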
The direction is clear: major cloud platforms are building more of the stack themselves, with a specific focus on improving cost structure and reducing exposure to Nvidia’s pricing and supply dynamics.
Broadcom’s expanding role in custom AI chip design
A lot of this custom silicon has the same architect behind the scenes: Broadcom.
Counterpoint Research projects Broadcom will control about 60% of the AI server compute ASIC design partner market by 2027. The same research forecasts that ASIC shipments among the top ten hyperscalers will triple between 2024 and 2027—a meaningful signal that these efforts are not small experiments but scaling programs.
Broadcom’s AI business is also accelerating:
- $8.4 billion in AI revenue in its most recent quarter
- 106% year-over-year growth
- CEO Hock Tan has said AI chip revenue could exceed $100 billion by 2027
For Nvidia, this matters because every successful hyperscaler ASIC program is another path to inference capacity that doesn’t require Nvidia GPUs.
Nvidia’s response: new platforms and a major inference move
The Vera Rubin platform and the push for inference efficiency
Nvidia isn’t treating this as a slow-moving threat. At GTC 2026, CEO Jensen Huang introduced the Vera Rubin platform, which Nvidia says can deliver up to 10x more inference throughput per watt than the current Blackwell generation.
That framing—throughput per watt—is telling. It aligns directly with the market’s shift toward inference, where efficiency and unit economics increasingly dominate purchasing decisions.
Nvidia’s Groq 3 inference processor and claimed performance gains
Nvidia also disclosed its Groq 3 inference processor, tied to a $20 billion deal to license technology and bring in talent from inference chip startup Groq.
Nvidia says the combined system delivers a 35-fold performance increase at the highest-value inference tier.
This is a clear signal that Nvidia is working to defend inference not only with GPUs, but also with inference-specific processor strategy and integration.
The scale of demand Nvidia is still projecting
Huang projected that cumulative orders between 2025 and 2027 could reach $1 trillion.
And on the analyst side, Rosenblatt Securities raised its price target, arguing Nvidia’s full-stack advantage—spanning CUDA software, NVLink networking, and rack-scale systems—keeps it positioned to lead in inference as well as training.
The inference battleground: where the economics bite
Inference now dominates AI compute cycles
The sharpest pressure point is inference. Inference represents about two-thirds of all AI compute cycles, and the cost-per-token math tends to reward specialized hardware over general-purpose GPUs.
That economic reality creates room for hyperscaler chips, ASICs, and alternative platforms to win workloads—especially once deployed at scale inside the largest cloud environments.
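To make the cost-per-token logic concrete, here is a minimal sketch with entirely hypothetical numbers (the hourly costs and throughputs below are illustrative assumptions, not vendor figures). Amortized serving cost is roughly hourly hardware cost divided by tokens served per hour, so a chip with lower throughput can still win inference volume if its cost per token is better:

```python
# Hypothetical comparison of inference cost per million tokens.
# All numbers are illustrative assumptions, not vendor figures.
def cost_per_million_tokens(hourly_cost_usd, tokens_per_second):
    """Amortized serving cost per 1M tokens for a fully utilized accelerator."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000

# A general-purpose GPU: higher throughput, but pricier per hour (hypothetical).
gpu = cost_per_million_tokens(hourly_cost_usd=4.00, tokens_per_second=2500)
# An inference ASIC: lower throughput, but much cheaper per hour (hypothetical).
asic = cost_per_million_tokens(hourly_cost_usd=1.20, tokens_per_second=1500)

print(f"GPU:  ${gpu:.3f} per 1M tokens")   # ~$0.444
print(f"ASIC: ${asic:.3f} per 1M tokens")  # ~$0.222
```

Under these assumed numbers, the ASIC serves tokens at about half the cost despite being 40% slower—the kind of arithmetic that lets hyperscalers shift volume away from general-purpose GPUs.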
Analysts expect share pressure as in-house ASIC programs scale
Reuters reported that some analysts expect Nvidia to start losing share in 2027, once hyperscaler in-house ASIC programs reach meaningful scale in the inference market.
This doesn’t require Nvidia to “lose” on performance alone. If hyperscalers can meet their inference needs at a better cost profile with their own silicon, they can shift volumes even while Nvidia remains strong on the high end.
Software ecosystems: CUDA’s strength and the push to reduce switching friction
CUDA remains Nvidia’s core moat
Nvidia’s strongest defense is still CUDA, supported by over five million active developers. That developer base matters because it lowers deployment friction, speeds iteration, and keeps workloads tightly aligned with Nvidia’s stack.
Competitive stacks are closing gaps and lowering barriers to exit
At the same time, pressure is building from multiple software directions:
- AMD is advancing with its open-source ROCm platform
- Hyperscalers are building their own stacks that reduce the pain of moving away from Nvidia hardware:
  - Google’s XLA
  - Amazon’s Neuron SDK
  - Microsoft’s custom toolchain
The effect isn’t a single clean rivalry. As Business Insider put it, the competitive environment is becoming “a rapidly widening and increasingly tangled field, even as Nvidia remains miles ahead.”