How Google’s TPUs are reshaping the economics of large-scale AI

For more than a decade, Nvidia’s GPUs have underpinned nearly every major advance in modern AI. That position is now being challenged.

Frontier models like Google’s Gemini 3 and Anthropic’s Claude Opus 4.5 were trained not on Nvidia hardware, but on Google’s latest tensor processing unit, the Ironwood-based TPUv7. This signals that a viable alternative to GPU-centric AI stacks has already arrived, with real implications for the economics and architecture of frontier-scale training.

Nvidia’s CUDA (Compute Unified Device Architecture), the platform that provides access to the massively parallel architecture of GPUs, and the tools around it have created what many have dubbed the “CUDA moat.” Once a team has built a pipeline on CUDA, it is extremely expensive to switch to another platform due to the dependency on Nvidia’s software stack. That, combined with Nvidia’s first-mover advantage, helped the company achieve an astonishing 75% gross margin.

Unlike GPUs, TPUs were designed from day one as purpose-built silicon for machine learning. With each generation, Google has pushed AI acceleration forward, but now, as the hardware behind two of the most capable AI models ever trained, TPUv7 signals a broader strategy to challenge Nvidia’s dominance.

GPUs and TPUs both accelerate machine learning, but they reflect different design philosophies: GPUs are general-purpose parallel processors, while TPUs are purpose-built systems optimized almost exclusively for large-scale matrix multiplication. With TPUv7, Google has taken that expertise further by integrating the high-speed interconnect directly into the chip, allowing TPU Pods to scale like a single supercomputer and reducing the cost and latency penalties that typically come with GPU-based clusters.
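To make that design focus concrete, here is a minimal sketch in JAX (Google’s numerical computing library, discussed below) of the kind of dense matrix multiplication TPUs are optimized for. XLA compiles the jitted function for whatever accelerator is attached, so the same code runs on CPU, GPU, or TPU:

```python
# Minimal sketch of the dense matmul workload TPUs are built around.
# Assumes a TPU runtime is available; JAX falls back to CPU/GPU otherwise.
import jax
import jax.numpy as jnp

@jax.jit  # XLA compiles this for the attached accelerator (TPU matrix units)
def forward(x, w):
    return jnp.dot(x, w)  # large dense matmul: the TPU's core workload

x = jnp.ones((8192, 8192), dtype=jnp.bfloat16)  # bf16 is TPU-native
w = jnp.ones((8192, 8192), dtype=jnp.bfloat16)
y = forward(x, w)
print(y.shape, y.dtype)
```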

TPUs are “designed as a complete ‘system’ rather than just a single chip,” Val Bercovici, chief AI officer at WEKA, told VentureBeat.

Google’s business pivot: from internal-only to industry-wide

Historically, Google restricted access to TPUs to cloud rentals on the Google Cloud Platform. In recent months, Google has begun offering the hardware directly to external customers, effectively decoupling the chip from its cloud service. Customers can now choose between treating compute as an operating expense (renting through the cloud) or a capital expense (buying the hardware outright). That removes a major friction point for larger AI labs that prefer to own their hardware and sidestep the “cloud” premium on bare hardware.
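A back-of-the-envelope way to see the opex/capex tradeoff (every figure below is an illustrative assumption, not Google pricing):

```python
# Illustrative opex-vs-capex break-even; all numbers are assumptions.
purchase_price = 2_000_000.0   # capex: hypothetical outright purchase price ($)
hourly_rent = 150.0            # opex: hypothetical cloud rental rate ($/hour)

break_even_hours = purchase_price / hourly_rent
years = break_even_hours / (24 * 365)
print(f"Owning beats renting after ~{break_even_hours:,.0f} hours "
      f"(~{years:.1f} years of continuous use)")
```

For a lab running hardware near capacity around the clock, ownership pays off quickly, which is exactly the buyer profile Google is now courting.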

The centerpiece of Google’s shift in strategy is a landmark deal with Anthropic, under which the Claude maker will get access to 1 million TPUv7 chips, representing more than a gigawatt of compute capacity. About 400,000 chips are being sold directly to Anthropic through Google’s longtime physical design partner, Broadcom. The remaining 600,000 chips are leased through traditional Google Cloud contracts. Anthropic’s commitment adds billions of dollars to Google’s revenue and locks one of OpenAI’s major competitors into Google’s ecosystem.

The “CUDA moat” is being eroded

For years, Nvidia’s GPUs have been the clear market leader in AI infrastructure. In addition to its powerful hardware, Nvidia’s CUDA ecosystem features a vast library of optimized kernels and frameworks. With broad developer familiarity and a huge installed base, enterprises gradually became locked into the “CUDA moat,” a structural constraint that made abandoning GPU-based infrastructure impractically expensive.

One of the major barriers preventing widespread TPU adoption has been ecosystem friction. In the past, TPUs worked best with JAX, Google’s own numerical computing library designed for AI/ML research. However, mainstream AI development primarily relies on PyTorch, an open-source ML framework that is heavily optimized for CUDA.

Google is now addressing this gap directly. TPUv7 supports native PyTorch integration, including eager execution, full support for distributed APIs, torch.compile, and custom TPU kernel support under PyTorch’s toolchain. The goal is for PyTorch to run as smoothly on TPUs as it does on Nvidia GPUs.
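For a sense of what that integration looks like in practice, here is a hedged sketch using the open-source torch_xla bridge. The xla_device() call and the “openxla” torch.compile backend are torch_xla’s public API; whether TPUv7’s native integration keeps exactly this shape is an assumption:

```python
# Hedged sketch: running a PyTorch model on a TPU via the torch_xla bridge.
# Exact details of TPUv7's native PyTorch integration may differ.
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()                      # resolves to the attached TPU
model = torch.nn.Linear(4096, 4096).to(device)
x = torch.randn(32, 4096, device=device)

compiled = torch.compile(model, backend="openxla")  # XLA-backed compile path
y = compiled(x)
xm.mark_step()                                # flush pending work to the TPU
print(y.shape)
```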

Google is also contributing heavily to vLLM and SGLang, two popular open-source inference frameworks. By optimizing these widely used tools for TPU, Google ensures that developers are able to switch hardware without rewriting their entire codebase.
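This is easier to see in code: an application written against vLLM never references the accelerator directly, so hardware selection happens in the installed vLLM build rather than in the code itself. A minimal sketch (the model name is just an example):

```python
# Minimal vLLM usage sketch; note that nothing here is hardware-specific.
# Whether this runs on GPUs or TPUs depends on the installed vLLM build.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # example model only
params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Explain what a TPU pod is."], params)
print(outputs[0].outputs[0].text)
```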

Advantages and disadvantages of TPUs vs. GPUs

For enterprises comparing TPUs and GPUs for large-scale ML workloads, the tradeoffs center on cost, performance, and scalability. SemiAnalysis recently published an in-depth analysis measuring the cost efficiency and technical performance of the two technologies.

Thanks to its unique architecture and greater energy efficiency, TPUv7 provides significantly better throughput per dollar for large-scale training and high-volume inference. This allows enterprises to reduce operating costs related to power, cooling, and data center resources. SemiAnalysis estimates that, for Google’s internal systems, the total cost of ownership (TCO) for an Ironwood-based server is about 44% lower than the TCO of an equivalent Nvidia GB200 Blackwell server. Even after including the profit margins of both Google and Broadcom, external customers like Anthropic are seeing a roughly 30% reduction in costs compared to Nvidia. “When cost is paramount, TPUs are useful for large-scale AI projects. With TPUs, hyperscalers and AI labs can achieve 30-50% TCO reductions, saving billions,” Bercovici said.
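In rough numbers, using only the percentage estimates cited above (the baseline is normalized, not a real price):

```python
# Normalized TCO comparison; only the ratios come from SemiAnalysis' estimates.
gb200_tco = 100.0                            # Nvidia GB200 baseline (normalized)

ironwood_internal = gb200_tco * (1 - 0.44)   # ~44% lower for Google internally
ironwood_external = gb200_tco * (1 - 0.30)   # ~30% lower for external customers

print(f"Ironwood TCO, internal: {ironwood_internal:.0f} vs GB200 {gb200_tco:.0f}")
print(f"Ironwood TCO, external: {ironwood_external:.0f} vs GB200 {gb200_tco:.0f}")
```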

This economic leverage is already reshaping the market. The mere existence of a viable alternative allowed OpenAI to negotiate a roughly 30% discount on its own Nvidia hardware. OpenAI is one of the largest buyers of Nvidia GPUs, yet earlier this year the company added Google TPUs through Google Cloud to support its growing compute needs. Meta is also reportedly in advanced discussions to acquire Google TPUs for its data centers.

At this stage, Ironwood may seem like the ideal solution for enterprise architecture, but there are a number of compromises. While TPUs excel at typical deep learning workloads, they are much less flexible than GPUs, which can run a wide variety of algorithms, including non-AI tasks. If a new AI technique were invented tomorrow, a GPU could run it immediately. This makes GPUs more suitable for organizations that run a wide range of computational workloads beyond standard deep learning.

Migration from a GPU-centric environment can also be costly and time-consuming, especially for teams with existing CUDA-based pipelines, custom GPU kernels, or those that leverage frameworks not yet optimized for TPUs.

Bercovici recommends that companies opt for GPUs “when they need to move fast and time to market matters. GPUs leverage standard infrastructure and the largest developer ecosystem, handle dynamic and complex workloads for which TPUs are not optimized, and deploy into existing on-premises, standards-based data centers without the need for custom power and networking reconfiguration.”

Additionally, the ubiquity of GPUs means that more engineering talent is available. TPUs demand a rare skill set. "Leveraging the power of TPUs requires engineering depth in an organization, which means being able to recruit and retain rare engineering talent who can write custom kernels and optimize compilers," Bercovici said.

In practice, the advantages of Ironwood accrue mostly to enterprises with large, tensor-heavy workloads. Organizations requiring broader hardware flexibility, hybrid-cloud strategies, or HPC-style versatility may find GPUs a better fit. In many cases, a hybrid approach combining the two can provide the best balance of cost efficiency and flexibility.

The future of AI architecture

The competition for AI hardware dominance is heating up, but it’s too early to predict a winner – or whether there will be a winner at all. With Nvidia and Google innovating at such a fast pace, and companies like Amazon joining the fray, the highest-performing AI systems of the future may be hybrids, integrating both TPUs and GPUs.

"Google Cloud is experiencing an uptick in demand for both our custom TPUs and Nvidia GPUs,” a Google spokesperson told VentureBeat. “As a result, we are expanding our Nvidia GPU offering to meet substantial customer demand. The reality is that most of our Google Cloud customers use both GPUs and TPUs. With our wide selection of the latest Nvidia GPUs and seven generations of custom TPUs, we offer customers the flexibility of options to optimize for their specific needs."


