Nvidia BlueField-4 STX adds a context memory layer to storage to close the agentic AI throughput gap

When an AI agent loses context mid-task because traditional storage can't keep up with inference, it's not a model problem – it's a storage problem. At GTC 2026, Nvidia announced BlueField-4 STX, a modular reference architecture that puts a dedicated context memory layer between the GPU and traditional storage, claiming 5x the token throughput, 4x the energy efficiency, and 2x the data ingestion speed of traditional CPU-based storage.

The constraint STX targets is key-value (KV) cache data. The KV cache is a stored record of what a model has already processed – the intermediate computations an LLM saves so that it does not need to recompute attention over the entire context at each inference step. This is what allows an agent to maintain consistent working memory across sessions, tool calls, and reasoning steps. As context windows grow and agents take more steps, the cache grows with them. When that cache has to traverse the traditional storage path to get back to the GPU, inference slows and GPU utilization drops.
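The incremental nature of the KV cache is what makes it so storage-sensitive. A minimal sketch of the idea, with toy projections standing in for the model's learned K/V weights (all names and dimensions here are illustrative, not Nvidia's or any real model's implementation):

```python
# Toy illustration of a KV cache: at each decoding step the model only
# computes keys/values for the NEW token and reuses everything cached.
# `project` is a stand-in for a model's learned K/V projections.

def project(token, seed):
    # Deterministic fake 4-dimensional projection of a token id.
    return [(token * 31 + seed * 7 + i) % 97 / 97 for i in range(4)]

class KVCache:
    def __init__(self):
        self.keys, self.values = [], []

    def step(self, token):
        # Only the new token's K/V are computed; all prior entries are
        # reused, so per-step work stays O(1) instead of O(context).
        self.keys.append(project(token, seed=1))
        self.values.append(project(token, seed=2))
        return len(self.keys)  # context length visible to attention

cache = KVCache()
for t, tok in enumerate([5, 12, 7], start=1):
    assert cache.step(tok) == t
print(len(cache.keys))  # 3 cached entries after 3 decoding steps
```

The cache trades memory for compute: every step adds entries that must stay quickly reachable from the GPU, which is why long-running agents turn cache placement into a storage problem.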

STX is not a product that Nvidia sells directly. It is a reference architecture that the company is distributing across its storage partner ecosystem so that vendors can build AI-native infrastructure around it.

STX puts a context memory layer between the GPU and the disk

The architecture is built around a new storage-optimized BlueField-4 processor that combines Nvidia's Vera CPU with a ConnectX-9 SuperNIC. It runs on Spectrum-X Ethernet networking and can be programmed through Nvidia's DOCA software platform.

The first rack-scale implementation is the Nvidia CMX context memory storage platform. CMX extends GPU memory with a high-performance context layer designed specifically to store and retrieve the KV cache data that large language models generate during inference. Keeping that cache accessible without a round trip through general-purpose storage is exactly what CMX is for.
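Nvidia has not published CMX internals, but the tiering concept it describes can be sketched as a simple lookup chain: check GPU memory first, fall back to a fast context tier, and only recompute on a full miss. Everything below – tier names, LRU eviction, the promote-on-hit policy – is an assumption for illustration, not CMX's actual design:

```python
# Hedged sketch of a context memory tier for per-session KV caches.
# On a GPU-memory miss, the fast context tier is checked before
# falling through to recomputation (the slow path STX aims to avoid).
from collections import OrderedDict

class ContextTier:
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()  # insertion order doubles as LRU order

    def get(self, session_id):
        if session_id in self.store:
            self.store.move_to_end(session_id)  # mark as recently used
            return self.store[session_id]
        return None

    def put(self, session_id, kv_blob):
        self.store[session_id] = kv_blob
        self.store.move_to_end(session_id)
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict least recently used

def fetch_context(session_id, gpu_mem, tier):
    if session_id in gpu_mem:
        return gpu_mem[session_id], "gpu_hit"
    blob = tier.get(session_id)
    if blob is not None:
        gpu_mem[session_id] = blob  # promote back into GPU memory
        return blob, "tier_hit"
    return None, "miss"  # would trigger a full KV recompute

gpu = {"s1": b"kv-s1"}
tier = ContextTier(capacity=2)
tier.put("s2", b"kv-s2")
print(fetch_context("s1", gpu, tier)[1])  # gpu_hit
print(fetch_context("s2", gpu, tier)[1])  # tier_hit
print(fetch_context("s3", gpu, tier)[1])  # miss
```

The point of the middle tier is the latency gap it papers over: a tier hit costs a fast transfer, while a miss costs re-running prefill over the whole context.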

"Traditional data centers provide high-capacity, general-purpose storage, but typically lack the responsiveness needed to interact with AI agents that need to work across multiple stages, devices, and sessions," Ian Buck, Nvidia's vice president of hyperscale and high-performance computing, said in a briefing with press and analysts.

In response to a question from VentureBeat, Buck confirmed that STX also comes with a software reference platform to accompany the hardware architecture. Nvidia is expanding DOCA to include a new component referred to in the briefing as DOCA memos.

"Our storage providers can take advantage of the programmability of the BlueField-4 processor to optimize storage for the Agentic AI Factory," Buck said. "In addition to a reference rack architecture, we are also providing a reference software platform to deliver those innovations and customizations to their customers."

Storage partners building on STX get both a hardware reference design and a software reference platform – a programmable foundation for context-optimized storage.

Nvidia’s partner list includes storage incumbents and AI-native cloud providers

Storage providers that have co-designed STX-based infrastructure include Cloudian, DDN, Dell Technologies, Everpure, Hitachi Vantara, Apache, IBM, MinIO, NetApp, Nutanix, VAST Data, and Weka. Manufacturing partners building STX-based systems include AIC, Supermicro, and Quanta Cloud Technology (QCT).

On the cloud and AI side, CoreWeave, Crusoe, IREN, Lambda, Mistral AI, Nebius, Oracle Cloud Infrastructure, and Vultr have all committed to STX for context memory storage.

That combination of enterprise storage incumbents and AI-native cloud providers is worth watching. Nvidia is not positioning STX as an exclusive product for hyperscalers; it is establishing it as the reference standard for anyone building storage infrastructure that has to serve agentic AI workloads – which, within the next two to three years, is likely to include the majority of enterprise AI deployments running large-scale, multi-step inference.

STX-based platforms will be available from partners in the second half of 2026.

IBM shows what a data layer problem looks like in production

IBM sits on both sides of the STX announcement. It is listed as a storage provider co-designing STX-based infrastructure, and Nvidia separately confirmed that it has chosen the IBM Storage Scale System 6000 – certified and validated on Nvidia DGX platforms – as the high-performance storage foundation for its own GPU-native analytics infrastructure.

IBM also announced a significantly expanded collaboration with Nvidia at GTC, including GPU-accelerated integration between IBM's watsonx.data Presto SQL engine and Nvidia's cuDF library. A production proof of concept with Nestlé shows what that acceleration looks like: data refresh cycles in the company's order-to-cash data mart, covering 186 countries and 44 tables, dropped from 15 minutes to three. IBM reported 83% cost savings and a 30x price-performance improvement.

The Nestlé result is a structured analytics workload; it does not directly reflect agentic inference performance. But it reinforces IBM and Nvidia's shared argument: the data layer is where enterprise AI performance is currently constrained, and GPU acceleration of that layer delivers measurable results in production.

Why the storage layer is becoming a first-class infrastructure decision

STX is a sign that the storage layer is becoming a first-order concern in enterprise AI infrastructure planning, not an afterthought to GPU purchases. General-purpose NAS and object storage were not designed to serve KV cache data at the predictable latencies agentic inference requires. STX-based systems from partners including Dell, Apache, NetApp, and VAST Data are being put forward by Nvidia as a viable alternative, with the DOCA software platform providing a programmability layer to tune storage behavior for specific agentic workloads.

The performance claims – 5x token throughput, 4x energy efficiency, 2x data ingestion – are measured against traditional CPU-based storage architectures, and Nvidia has not specified the exact baseline configuration for those comparisons. Before these numbers drive infrastructure decisions, it is worth pinning down exactly what they were measured against.

The platform is expected to launch with partners in the second half of 2026. Given that most major storage vendors are already co-designing on STX, enterprises evaluating a storage refresh for AI infrastructure in the next 12 months should expect to see STX-based options available from their existing vendor relationships.


