AWS Expands AI Cloud Infrastructure With More Than 1 Million NVIDIA GPUs

Amazon Web Services is making a massive bet on AI infrastructure, committing to deploy more than 1 million NVIDIA GPUs across its global cloud regions beginning this year. The expansion marks a major step in the long-running partnership between AWS and NVIDIA and signals how aggressively hyperscale cloud providers are scaling compute capacity for AI training, inference, enterprise workloads, and data processing.

The deployment will span the full NVIDIA AI computing stack. That includes the Blackwell and Vera Rubin GPU architectures, RTX PRO Blackwell Server Edition GPUs for enterprise use cases, and dedicated accelerators built for ultralow-latency inference. AWS and NVIDIA are also working together on NVIDIA Spectrum-X networking infrastructure, which adds another critical layer to large-scale AI system performance. That matters because raw GPU volume alone isn't enough: networking, orchestration, and workload optimization are what turn hardware into usable AI capacity.

NVIDIA Blackwell and Vera Rubin Architectures Power AWS AI Growth

The infrastructure expansion covers multiple generations and categories of NVIDIA hardware, giving AWS a broader foundation for different AI workloads. Blackwell and Vera Rubin architectures sit at the center of this move, positioning AWS to support both current and next-wave AI compute demands across training and inference.

Blackwell GPUs for Scalable AI Compute

NVIDIA Blackwell GPUs are part of the core stack AWS plans to deploy globally. These chips are designed for modern AI workloads that need high performance across training, inference, and enterprise applications. By adding Blackwell-based infrastructure at this scale, AWS is strengthening its ability to serve customers building large language models, vision systems, and data-intensive AI services.

Vera Rubin Expands Future-Ready AI Infrastructure

AWS also plans to include NVIDIA Vera Rubin GPU architecture in the rollout. That detail matters because it shows this isn't just a short-term capacity increase. It's a broader infrastructure roadmap. Including both Blackwell and Vera Rubin suggests AWS is preparing for sustained AI demand and wants its global cloud footprint aligned with NVIDIA's evolving platform strategy.

RTX PRO Blackwell Server Edition GPUs for Enterprise AI Workloads

Enterprise AI doesn't always need the biggest, most power-hungry accelerator in the rack. That's where RTX PRO Blackwell Server Edition GPUs come in. AWS is integrating these GPUs for enterprise workloads, helping support use cases like data processing, vision AI, and smaller model inference where efficiency and deployment flexibility matter just as much as peak performance.

AWS Becomes First Major Cloud Provider to Offer NVIDIA RTX PRO 4500 Blackwell Server Edition

AWS will be the first major cloud provider to offer instances powered by the NVIDIA RTX PRO 4500 Blackwell Server Edition GPU. This is a compact 165-watt chip aimed at practical enterprise and data-centric workloads, not just giant foundation models.

That makes the move especially interesting. A lot of AI headlines focus on massive training clusters, but real adoption often happens in quieter places: processing business data, running computer vision pipelines, and serving smaller language models efficiently. This new GPU fits that lane.

Compact GPU Design for Data Processing and Vision AI

The NVIDIA RTX PRO 4500 Blackwell Server Edition GPU is built for data processing, vision AI, and small language model inference. Its 165-watt design makes it a more compact option for organizations that need performance without the footprint associated with larger accelerators. For enterprise teams working on applied AI rather than frontier model training, this kind of hardware can be a better match.

Amazon EC2 Instances Built on AWS Nitro System

The new Amazon EC2 instances using the RTX PRO 4500 Blackwell Server Edition will be built on the AWS Nitro System, which offloads virtualization, networking, and security functions to dedicated hardware. That gives the offering a tighter fit inside AWS's infrastructure model, where isolation, performance, and cloud-native deployment all matter. The pairing suggests AWS is aiming to make these instances not just available, but operationally practical for companies that want to move AI workloads into production.

Amazon EMR Support for Data Processing Workloads

These new EC2 instances are also suited for use with Amazon EMR for data processing workloads. That's a key detail because it ties the GPU launch directly to real enterprise analytics and big data pipelines. Instead of positioning the hardware as a generic AI compute option, AWS is anchoring it to established data platforms where businesses already do large-scale processing.

Amazon EMR and EC2 G7e Deliver 3x Faster Apache Spark Performance

AWS and NVIDIA highlighted joint engineering work that delivers 3x faster Apache Spark performance using Amazon EMR on Amazon Elastic Kubernetes Service with EC2 G7e instances. Those instances are powered by NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs.

This is the kind of improvement that changes operational decisions. Spark sits at the heart of many large data processing environments, so a 3x performance gain isn't a nice-to-have stat. It can reshape processing windows, reduce infrastructure pressure, and improve how quickly teams can move from raw data to usable outputs.

Joint Engineering Between AWS and NVIDIA

The performance gain comes from direct engineering collaboration between the two companies. That's worth paying attention to because it shows the partnership goes beyond supply agreements and hardware announcements. AWS and NVIDIA are tuning infrastructure and software together, which is usually where the real performance wins show up.

Apache Spark Performance Improvements on Amazon EMR on EKS

Running Amazon EMR on Amazon Elastic Kubernetes Service with EC2 G7e instances produced the reported 3x faster Apache Spark performance. This points to a combined optimization story across compute, orchestration, and analytics tooling. For customers already invested in Kubernetes-based data environments, that kind of acceleration could make GPU-backed Spark workflows a lot more appealing.
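AWS and NVIDIA haven't published the exact tuning behind the 3x figure, but GPU-accelerated Spark typically runs through NVIDIA's RAPIDS Accelerator plugin. Here's a minimal sketch, assuming a cluster where the rapids-4-spark jar is already staged and GPU scheduling is configured; the S3 path and resource amounts are illustrative, not values from the announcement:

```python
from pyspark.sql import SparkSession

# Minimal sketch of enabling NVIDIA's RAPIDS Accelerator for Apache Spark.
# Assumes the rapids-4-spark jar is on the cluster classpath and GPUs are
# discoverable by Spark's resource scheduler.
spark = (
    SparkSession.builder
    .appName("gpu-spark-sketch")
    .config("spark.plugins", "com.nvidia.spark.SQLPlugin")  # route SQL through the GPU plugin
    .config("spark.rapids.sql.enabled", "true")             # turn on GPU SQL execution
    .config("spark.executor.resource.gpu.amount", "1")      # one GPU per executor
    .config("spark.task.resource.gpu.amount", "0.25")       # four concurrent tasks share a GPU
    .getOrCreate()
)

# Illustrative workload: a scan-heavy aggregation, the kind of Spark job that
# benefits most from GPU acceleration. The S3 path is a placeholder.
events = spark.read.parquet("s3://example-bucket/events/")
events.groupBy("event_type").count().show()
```

The key design point is that the plugin swaps in GPU implementations of Spark's SQL operators transparently, so existing DataFrame code can pick up the acceleration without being rewritten.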

EC2 G7e Instances With NVIDIA RTX PRO 6000 Blackwell GPUs

The EC2 G7e instances involved in this performance gain are powered by NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs. These GPUs bring more capability to advanced enterprise and data-heavy workloads, helping AWS support customers that need stronger performance for analytics, machine learning preprocessing, and GPU-accelerated data frameworks.
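The announcement names the G7e family but not specific sizes, so for readers who want to experiment, launching one looks like any other EC2 request. A hedged boto3 sketch; the AMI ID is a placeholder and "g7e.xlarge" is an assumed size, not one confirmed by the source:

```python
import boto3

# Hypothetical sketch: launching a G7e instance with boto3. The AMI ID is a
# placeholder and "g7e.xlarge" is an assumed size; check your region for the
# G7e sizes actually on offer.
ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder: use a GPU-ready AMI
    InstanceType="g7e.xlarge",        # assumed size within the G7e family
    MinCount=1,
    MaxCount=1,
)
print(response["Instances"][0]["InstanceId"])
```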

AWS Uses NVIDIA NIXL and Elastic Fabric Adapter for Lower-Latency LLM Inference

AWS also published details on using NVIDIA's NIXL communication library with its Elastic Fabric Adapter to enable disaggregated large language model inference. The approach splits prefill and decode phases across distributed GPU resources to reduce latency.

Here's what that really means: instead of forcing every stage of model inference onto the same tightly bundled resource block, AWS is separating the work and distributing it more efficiently. And for LLM services, latency is everything. If responses drag, the user experience falls apart fast.

Disaggregated Large Language Model Inference Architecture

Disaggregated inference allows AWS to allocate GPU resources more selectively across different inference stages. The two stages have very different profiles: prefill, which ingests the full prompt, is compute-bound, while decode, which emits one token at a time, is bound by memory bandwidth, so running both on the same GPUs forces one profile to wait on the other. By splitting the phases, the system can match each stage to the hardware that serves it best and improve responsiveness. That's a meaningful architectural move for serving large language models at scale, especially when demand is uneven or latency-sensitive.
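The announcement doesn't include code, so what follows is a conceptual sketch only: a toy scheduler that routes prefill and decode to separate worker pools and passes a placeholder "KV cache" between them. In the real system that hand-off is the hard part, and it's exactly where NIXL and Elastic Fabric Adapter come in; nothing below uses their actual APIs.

```python
import queue
import threading

# Toy illustration of disaggregated LLM serving: prefill and decode run on
# separate worker pools. Real deployments move KV-cache tensors between GPUs
# over fast interconnects (the role NIXL and EFA play); this sketch just
# passes a placeholder dict between threads.

prefill_q: queue.Queue = queue.Queue()
decode_q: queue.Queue = queue.Queue()

def prefill_worker() -> None:
    """Compute-bound stage: ingest the whole prompt, produce a KV cache."""
    while True:
        req = prefill_q.get()
        req["kv_cache"] = f"kv({req['prompt']})"  # stand-in for real tensors
        decode_q.put(req)                         # hand off to the decode pool

def decode_worker() -> None:
    """Bandwidth-bound stage: generate tokens one at a time from the cache."""
    while True:
        req = decode_q.get()
        req["output"] = " ".join(f"tok{i}" for i in range(req["max_tokens"]))
        req["done"].set()

threading.Thread(target=prefill_worker, daemon=True).start()
threading.Thread(target=decode_worker, daemon=True).start()

request = {"prompt": "hello world", "max_tokens": 4, "done": threading.Event()}
prefill_q.put(request)
request["done"].wait(timeout=5)
print(request["output"])
```

In production the pools would be separate GPU fleets sized independently, which is what lets operators scale prefill capacity for bursts of new prompts without over-provisioning decode.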

NVIDIA NIXL Communication Library Integration

The use of NVIDIA's NIXL communication library is a central part of this setup. It enables communication efficiency across distributed GPU resources, which is essential when inference workloads are being split into separate stages. In a large cloud environment, communication overhead can easily become the bottleneck, so this integration directly supports the lower-latency goal.

Elastic Fabric Adapter Supports Distributed GPU Resources

AWS combines NIXL with Elastic Fabric Adapter to support distributed GPU inference. Elastic Fabric Adapter is the AWS network interface that gives EC2 instances low-latency, OS-bypass communication paths, which is what lets distributed GPU resources talk to each other efficiently across the infrastructure. When you're trying to serve large language models with lower latency, the network path is just as important as the GPU itself. This collaboration reflects that reality.

AWS and NVIDIA Expand Into In-Vehicle AI Assistants

The partnership goes well beyond cloud infrastructure. AWS and NVIDIA are also collaborating on in-vehicle AI assistants that combine Amazon's Alexa Custom Assistant with the NVIDIA DRIVE AGX automotive computing platform.

This opens up a very different AI use case, one that lives partly at the edge and partly in the cloud. And honestly, that's where a lot of the next wave is heading. Not just giant centralized models, but systems that need to work in real time, close to the user, while still tapping cloud capabilities when needed.

Alexa Custom Assistant and NVIDIA DRIVE AGX Integration

The planned in-vehicle system combines Amazon's Alexa Custom Assistant with NVIDIA DRIVE AGX. This pairing is designed to give automakers an AI assistant platform that can operate inside the vehicle while drawing on a broader cloud-connected ecosystem. It blends voice interaction, automotive computing, and connected services into a single experience layer.

Edge Computing in Vehicles for Real-Time Requests

The technology is intended to let automakers process requests directly in the vehicle using edge computing. That local processing approach matters for responsiveness and in-car usability, especially when drivers or passengers expect immediate interaction. Handling requests on the vehicle itself can reduce dependence on constant cloud round trips.

Cloud Connectivity for Music Streaming and Smart Home Control

Alongside edge processing, the system will connect to cloud-based capabilities for tasks such as streaming music and smart home control. That hybrid model gives the assistant broader functionality without placing every task entirely on in-vehicle hardware. It's a practical design for connected vehicles, where some functions need instant local response and others benefit from cloud integration.
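Neither company has published the assistant's routing logic, so here's a hypothetical sketch of the hybrid pattern the article describes: latency-critical intents stay on the vehicle, while cloud-backed features go over the network. The intent names and the CLOUD_INTENTS set are illustrative assumptions, not the Alexa Custom Assistant API.

```python
# Hypothetical sketch of hybrid edge/cloud request routing for an in-vehicle
# assistant. Intent names and helpers are illustrative, not a real API.

CLOUD_INTENTS = {"stream_music", "smart_home_control"}  # features that need cloud services

def run_on_vehicle(intent: str, payload: dict) -> str:
    # Edge path: handled locally for immediate response, no network round trip.
    return f"edge handled {intent}: {payload}"

def call_cloud(intent: str, payload: dict) -> str:
    # Cloud path: functionality that lives outside the vehicle.
    return f"cloud handled {intent}: {payload}"

def handle_request(intent: str, payload: dict) -> str:
    if intent in CLOUD_INTENTS:
        return call_cloud(intent, payload)
    return run_on_vehicle(intent, payload)

print(handle_request("set_cabin_temp", {"celsius": 21}))      # stays on the vehicle
print(handle_request("stream_music", {"artist": "example"}))  # goes to the cloud
```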

Automaker Evaluation Timeline Set for Early 2027

The in-vehicle AI assistant system is planned to be available for automaker evaluation in early 2027. That gives manufacturers a future window to assess how the combined AWS and NVIDIA platform might fit into next-generation vehicle experiences.

15 Years of AWS and NVIDIA Collaboration Shape the Current Expansion

The latest announcement builds on more than 15 years of collaboration between AWS and NVIDIA. That history gives the current expansion more weight. This isn't a brand-new alliance chasing headlines. It's a long-running technical relationship that has steadily widened from cloud infrastructure into custom silicon integration, data platforms, and automotive AI.

At AWS re:Invent in December, the companies announced that AWS would integrate NVIDIA NVLink Fusion into its custom silicon, including Trainium4 chips. That detail shows the partnership is extending into deeper hardware-level interoperability. When a cloud provider's custom silicon strategy starts connecting with a chipmaker's interconnect technology, the relationship has moved well beyond standard vendor supply.

Trainium4 and Broader AI Compute Strategy

Including Trainium4 in that integration points to a broader AWS AI compute strategy that doesn't rely on a single hardware path. AWS is scaling NVIDIA infrastructure while also aligning parts of its own custom silicon roadmap with NVIDIA technology. That kind of dual-track approach gives AWS more flexibility across training, inference, and specialized compute environments.

NVIDIA Sees Extraordinary AI Demand Through 2027

NVIDIA CEO Jensen Huang described the current moment as one of extraordinary demand and said he sees at least $1 trillion in NVIDIA revenue from 2025 through 2027. Within the context of AWS deploying over 1 million NVIDIA GPUs, that statement reinforces just how large both companies believe the AI infrastructure market is becoming.