A proof of concept forgives a fragile data path. Operational AI does not.

Presented by F5


As enterprises move AI workloads from pilot to production, data delivery often becomes the factor that determines whether those systems can scale reliably. Point-to-point architectures that connect storage directly to compute hold up under pilot conditions, but they often break down under sustained, concurrent production traffic. The result is clogged inference pipelines, delayed RAG systems, underutilized GPUs, and SLA violations, all of which have direct business consequences.

"Organizations operate AI successfully when their infrastructure is built to handle real-world failures, not just controlled situations." says Hunter Smit, senior manager of product marketing at F5.

Production traffic exposes architectural vulnerabilities

In a pilot, a stalled transfer is an inconvenience. In production, that same stall is an outage that someone now owns. The underlying architecture is often the same in both cases: when a client connects directly to storage, there is no alternative path if a node fails or traffic spikes, so the system grows more fragile under sustained, concurrent production traffic. From there, retries and timeouts cascade, and the entire pipeline backs up exactly when the business is relying on its output.

"Point-to-point architecture, where the S3 client connects directly to S3 storage, is not as flexible," says Paul Pindell, principal solutions architect for technology alliances at F5. "If even a single storage node fails, all traffic in that cluster is degraded, and in some cases the cluster may fail completely."

The problem is that AI workflows, including RAG-based inference and agentic AI, increasingly treat S3 storage as a first-class citizen in an AI cluster. However, the network connectivity between that storage and the cluster was never designed for the high-throughput, seamless data movement that is required to keep GPUs running optimally.

The real cost of clogged pipelines and underutilized GPUs

"Enterprise leaders design AI infrastructure around GPU usage, but what makes AI different from traditional deterministic workloads is that the infrastructure continuously influences those outcomes in every interaction." says Tanu Mutreja, senior director of product management at F5. "In an AI environment, infrastructure is no longer just a back-end concern. It shapes customer experience, quality, flexibility and cost with every transaction."

This can have significant business consequences. For example, when the inference pipeline grinds to a halt, it becomes an SLA and customer experience issue. When RAG systems are delayed, models lose access to timely, relevant context, resulting in inaccurate, outdated or confused responses, all of which create operational, compliance and reputational risks. Additionally, the infrastructure issues that create those problems can also increase costs by leaving expensive GPU resources idle or underutilized.

"When GPUs are underutilized, it indicates infrastructure inefficiencies that increase costs while limiting scalability and responsiveness," Mutreja says. "The leadership question is whether an end-to-end AI infrastructure consistently delivers reliable, secure, high-quality, and governed AI experiences at sustainable unit economics."

Building a production-ready data delivery layer

F5 treats data delivery as a first-class infrastructure layer, rather than assuming that the network path will simply work. Where application delivery optimized the flow of requests between users and applications, data delivery optimizes the flow of data between storage, networks, and compute, including AI compute.

Making data delivery a first-class layer means building in three properties:

Observability provides real-time visibility into latency, throughput, and flow health.

Programmability enables policy-driven control over how data moves through dynamic routing, traffic optimization, rate management, and automatic failover.

Failure-awareness builds resilience to degraded networks, storage throttling, and service disruptions.

In the architecture F5 developed for Dell ObjectScale, F5 BIG-IP sits between ObjectScale and AI compute as a programmable control point at the edge.

"We have seen cases where misconfiguration in the AI ​​compute layer effectively DDoS the S3 storage infrastructure, " Pindell says. "Not in a malicious way, but in a ‘Oh no, what did I do?’ Momentarily, but it still reduced storage for the entire organization."

Placing BIG-IP as the application delivery controller between the storage and compute layers protects the storage with QoS, rate limits, and connection limits, keeping it resilient and operational under that kind of load. Pindell says SecureIQLab-validated testing has confirmed that this security doesn’t come at the expense of throughput, which matters architecturally.
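
To make the idea of rate limits and connection limits concrete, here is a minimal Python sketch of admission control in front of a storage backend. It is an illustration only, not BIG-IP configuration or F5's implementation: the `StorageGuard` class, its parameters, and the token-bucket approach are all assumptions chosen for clarity.

```python
import time


class StorageGuard:
    """Hypothetical admission control placed in front of a storage backend.

    Combines a token-bucket rate limit with a cap on concurrent connections.
    Not thread-safe; a real deployment would need locking.
    """

    def __init__(self, rate_per_sec, burst, max_connections, clock=time.monotonic):
        self.rate = float(rate_per_sec)      # tokens refilled per second
        self.capacity = float(burst)         # maximum tokens held at once
        self.max_connections = max_connections
        self.clock = clock                   # injectable so tests can control time
        self.tokens = float(burst)
        self.active = 0
        self.last = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last check, up to capacity.
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)

    def try_acquire(self):
        """Return True if a request may proceed, False if it should be shed."""
        self._refill()
        if self.active >= self.max_connections or self.tokens < 1.0:
            return False
        self.tokens -= 1.0
        self.active += 1
        return True

    def release(self):
        """Call when a request finishes to free its connection slot."""
        if self.active > 0:
            self.active -= 1
```

A misconfigured client that floods such a guard would see its excess requests rejected quickly, while the storage behind it keeps serving everyone else at a bounded rate.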

"It is necessary to preserve and even improve throughput," He explains. "It’s what gives you high-level functionality, flexibility, and enhanced security without sacrificing performance to get there."

The added complexity of hybrid and multicloud AI

The data delivery challenge grows with the diversity of AI deployments across hybrid and multicloud environments. Data moving across these environments must contend with inconsistent policies, security controls, identity systems, and governance requirements, as well as fragmented visibility and distinct failure boundaries.

Programmable traffic management and observability address this complexity together. Observability provides a unified view of the health of applications, networks, and infrastructure across otherwise disjointed environments. Programmable traffic management uses those insights to intelligently route, balance, and fail over traffic in real time. Together, they create a closed-loop feedback system that enforces consistent policies, improves resiliency across failure domains, and ensures reliable, high-performance AI data delivery, no matter where the applications, data, or users reside.
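
The closed loop described here can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a description of any F5 product: the `EndpointPool` class, the endpoint names, the failure threshold, and the smoothing factor are all hypothetical.

```python
class EndpointPool:
    """Hypothetical closed loop: observed latency and errors steer routing."""

    def __init__(self, endpoints, max_failures=3, alpha=0.2):
        self.max_failures = max_failures   # consecutive errors before failover
        self.alpha = alpha                 # smoothing factor for latency average
        self.stats = {e: {"latency": None, "failures": 0} for e in endpoints}

    def record(self, endpoint, latency_s=None, ok=True):
        """Feed one observation (the observability half of the loop)."""
        s = self.stats[endpoint]
        if not ok:
            s["failures"] += 1
            return
        s["failures"] = 0                  # any success resets the error streak
        if latency_s is None:
            return
        if s["latency"] is None:
            s["latency"] = latency_s
        else:
            # Exponentially weighted moving average of observed latency.
            s["latency"] = self.alpha * latency_s + (1 - self.alpha) * s["latency"]

    def healthy(self):
        return [e for e, s in self.stats.items() if s["failures"] < self.max_failures]

    def pick(self):
        """Route to the healthy endpoint with the lowest smoothed latency."""
        candidates = self.healthy()
        if not candidates:
            raise RuntimeError("no healthy endpoints")
        # Endpoints with no samples yet sort first so they get probed.
        return min(
            candidates,
            key=lambda e: (self.stats[e]["latency"] is not None,
                           self.stats[e]["latency"] or 0.0),
        )
```

For example, after two consecutive errors on a preferred site with `max_failures=2`, `pick()` routes to the next-best healthy site, and a single successful response brings the original site back into rotation.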

What separates production AI from permanent pilots

Smit says organizations that successfully move beyond the pilot stage share a specific engineering discipline.

"They are the ones who approach production design with failure as the normal state, not the exception," he explains. "They will assume that there will be latency, congestion, and partial outages. And they build a data path that is observable and failure-aware enough to absorb them, with explicit mitigations for every worst-case scenario rather than just hoping the network holds on."

Organizations stuck in perpetual pilots are still optimizing for perfect lab results and discovering real-world differences only when workloads go live. The issue is not model quality or GPU compute, but whether the data delivery layer was engineered with the same rigor as the compute layer.

"Teams need to understand that a real-world network behaves very differently from an optimized laboratory network," Pindell says. "They need a mitigation plan for failure scenarios and performance disruptions that may occur in production."


Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they are always clearly marked. For more information, contact sales@venturebeat.com.


