AI's GPU problem is actually a data delivery problem

Presented by F5


As enterprises invest billions of dollars in GPU infrastructure for AI workloads, many are finding that their expensive compute resources sit idle far more than expected. The culprit is not the hardware. It is the often invisible data delivery layer between storage and compute, which starves GPUs of the information they need.

"While people are focusing their attention, rightfully, on GPUs, as they are such a significant investment, they are rarely the limiting factor," says Mark Mentzer, F5’s solutions architect. "They are able to do more work. They are waiting for the data."

AI performance increasingly depends on an independent, programmable control point between the AI framework and object storage, a layer most enterprises have never intentionally designed. As AI workloads grow, tight coupling between AI frameworks and specific storage endpoints produces bottlenecks and instability during scaling events, failures, and cloud transitions.

"Traditional storage access patterns were not designed for highly parallel, explosive, multi-consumer AI workloads," says Maggie Stringfellow, VP, Product Management – ​​BIG-IP. "Efficient AI data movement requires a separate data delivery layer designed to abstract, optimize, and secure data flows independently of storage systems, as GPU economics make inefficiencies immediately visible and costly."

Why do AI workloads strain object storage?

AI workloads generate bidirectional data patterns: continuous data capture, simulation output, and massive ingestion bursts from model checkpoints. Combined with read-intensive training and inference, these patterns stress infrastructure in which AI frameworks are tightly coupled to the storage systems they depend on.

Storage vendors have done significant work scaling data throughput in and out of their systems, but focusing only on throughput overlooks the switching, traffic management, and security layers that sit in front of storage.

The stress that AI workloads put on S3-compatible systems is multifaceted and differs significantly from traditional application patterns. It is less about raw throughput and more about concurrency, metadata pressure, and fan-out. Training and fine-tuning produce particularly challenging patterns, such as massively parallel reads of small to medium-sized objects, repeated passes over the training data across epochs, and periodic bursts of checkpoint writes.
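To make the read side concrete, here is a minimal sketch, with a hypothetical bucket name, key layout, and endpoint, of the kind of massively parallel small-object reads a single training epoch can issue against an S3-compatible store; every worker's GET lands on the storage front end at roughly the same time, and the whole burst repeats each epoch.

```python
# Illustrative sketch only: the bucket, key layout, and endpoint are hypothetical.
# It shows the fan-out of many concurrent small reads a training epoch generates.
from concurrent.futures import ThreadPoolExecutor

import boto3

# Any S3-compatible endpoint; the URL here is an assumption.
s3 = boto3.client("s3", endpoint_url="https://objects.example.internal")

BUCKET = "training-data"  # hypothetical bucket
KEYS = [f"shards/shard-{i:05d}.tfrecord" for i in range(10_000)]  # small/medium objects

def fetch(key: str) -> bytes:
    """One GET per shard; thousands of these run in parallel per epoch."""
    resp = s3.get_object(Bucket=BUCKET, Key=key)
    return resp["Body"].read()

# Hundreds of data-loader workers per GPU node issue these reads concurrently,
# and the same key list is re-read every epoch, so the storage system sees the
# identical burst again and again.
with ThreadPoolExecutor(max_workers=256) as pool:
    for _ in pool.map(fetch, KEYS):
        pass  # in practice, bytes are decoded and fed to the GPU input pipeline
```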

RAG workloads introduce their own complexity through request amplification. A single user request can fan out into dozens or hundreds of reads for retrieved chunks, related segments, and the source documents behind them. The stress concentrates less on capacity or raw storage speed and more on request management and traffic shaping.
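A rough sketch of that amplification follows; the index, key names, and counts are hypothetical stand-ins, not any particular product's API.

```python
# Illustrative sketch only: one RAG question fans out into many object-storage reads.
from dataclasses import dataclass, field

@dataclass
class Hit:
    chunk_key: str
    source_doc_key: str
    neighbor_keys: list = field(default_factory=list)

def search_index(query: str, top_k: int) -> list[Hit]:
    # Stand-in for a vector search; returns top_k candidate chunks.
    return [
        Hit(
            chunk_key=f"chunks/{i}.json",
            source_doc_key=f"docs/{i // 10}.pdf",
            neighbor_keys=[f"chunks/{i - 1}.json", f"chunks/{i + 1}.json"],
        )
        for i in range(top_k)
    ]

def reads_for(query: str) -> list[str]:
    keys = []
    for hit in search_index(query, top_k=50):
        keys.append(hit.chunk_key)        # the retrieved chunk
        keys.append(hit.source_doc_key)   # its parent document
        keys.extend(hit.neighbor_keys)    # adjacent/related segments
    return keys

# One question -> roughly 200 GET requests that must all complete before the
# model can start answering; 100 concurrent users means ~20,000 in-flight reads.
print(len(reads_for("What changed in the Q3 contract terms?")))
```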

Risks of tightly linking AI frameworks to storage

When AI frameworks connect directly to storage endpoints without an intermediate delivery layer, operational risk multiplies during scaling events, failures, and cloud transitions, and the consequences can be severe.

"Any instability in the storage service now has an uncontrolled blast radius," Menger says. "Anything here becomes a system failure, not a storage failure. Or clearly, abnormal behavior in one application can have an adverse impact on all consumers of that storage service."

Menger describes a pattern he has observed with three different customers, in which tight coupling resulted in complete system failures.

"We see that large-scale training or fine-tuning workloads overwhelm the storage infrastructure, and the storage infrastructure goes down," He explains. "On that scale, recovery is never measured in seconds. Minutes if you’re lucky. Usually hours. The GPU is no longer being fed. They are hungry for data. These high-value resources, for the entire time the system is down, have a negative ROI."

How an independent data delivery layer improves GPU utilization and stability

The financial impact of introducing an independent data delivery layer extends far beyond preventing catastrophic failures.

Stringfellow says decoupling allows data access to be optimized independently of storage hardware, improving GPU utilization by reducing idle time and contention, while improving cost predictability and system performance as scale increases.

"It enables intelligent caching, traffic shaping and protocol optimization closer to the compute, which reduces cloud egress and storage amplification costs." she explains. "Operationally, this isolation protects storage systems from asynchronous AI access patterns, resulting in more predictable cost behavior and stable performance under growth and variability."

Using a programmable control point between compute and storage

F5’s answer is to position its application delivery and security platform, powered by BIG-IP, as a "storage front door" that provides health-aware routing, hotspot avoidance, policy enforcement, and security controls without requiring applications to be rewritten.

"Introducing a distribution layer between compute and storage helps define the boundaries of accountability," Menger says. "Computation is about execution. Storage is about durability. Delivery is about reliability."

Programmable control points, which use event-based conditional logic rather than generative AI, enable intelligent traffic management that goes beyond simple load balancing. Routing decisions are based on actual backend health, with the system watching key indicators for early signs of trouble. When problems emerge, it can isolate misbehaving components without shutting down the entire service.
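In spirit, that conditional logic looks something like the sketch below: plain Python, not F5 BIG-IP configuration or iRules, with hypothetical backend names, health fields, and thresholds. Route each request to the healthiest backend, and skip a backend that starts misbehaving instead of failing the whole service.

```python
# Generic sketch of health-aware routing; not F5's implementation.
from dataclasses import dataclass

@dataclass
class Backend:
    name: str
    p99_latency_ms: float  # observed tail latency
    error_rate: float      # fraction of errors/timeouts in the last window
    in_service: bool = True

def healthy(b: Backend) -> bool:
    # Event-based, conditional checks on early signs of trouble.
    return b.in_service and b.p99_latency_ms < 250 and b.error_rate < 0.02

def route(backends: list[Backend]) -> Backend:
    candidates = [b for b in backends if healthy(b)]
    if not candidates:
        raise RuntimeError("no healthy storage backends")
    # Prefer the least-loaded healthy node; a degraded node is simply bypassed
    # rather than taking the whole service down.
    return min(candidates, key=lambda b: b.p99_latency_ms)

pool = [
    Backend("s3-node-a", p99_latency_ms=40, error_rate=0.001),
    Backend("s3-node-b", p99_latency_ms=900, error_rate=0.08),  # misbehaving: isolated
    Backend("s3-node-c", p99_latency_ms=55, error_rate=0.002),
]
print(route(pool).name)  # -> s3-node-a
```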

"An independent, programmable data delivery layer becomes essential because it allows policy, optimization, security, and traffic control to be applied equally to both ingestion and consumption paths without modifying storage systems or AI frameworks." Stringfellow says. "By separating data access from storage implementation, organizations can safely absorb burst writes, optimize reads, and protect backend systems from atypical AI access patterns."

Handling security issues in AI data delivery

Stringfellow says AI is not only stressing storage teams on throughput; it is forcing them to treat data movement as both a performance and a security issue. Security can no longer be assumed simply because the data resides deep in the data center. AI introduces automated, high-volume access patterns that must be authenticated, encrypted, and controlled at speed. This is where F5 BIG-IP comes in.

"The F5 BIG-IP sits directly in the AI ​​data path to provide high-throughput access to object storage while enforcing policy, inspecting traffic, and making payload-informed traffic management decisions." Stringfellow says. "Feeding the GPU quickly is necessary, but not sufficient; Storage teams now need confidence that AI data flows are optimized, controlled, and secure."

Why will data delivery define AI scalability?

Looking ahead, Stringfellow says the requirements for data delivery will only intensify.

"AI data delivery will shift from bulk customization to real-time, policy-driven data orchestration in distributed systems," She says. "Agentic and RAG-based architectures will require fine-grained runtime control over latency, access scope, and delegated trust boundaries. Enterprises must begin to treat data delivery as a programmable infrastructure, not a byproduct of storage or networking. Organizations that do this early will move forward faster and with less risk."


Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they are always clearly marked. For more information, contact sales@venturebeat.com.


