Context architecture is replacing RAG as agentic AI pushes enterprise retrieval to its limits

Redis made its name as a caching layer that kept web applications from collapsing under load. The problem it’s targeting now has a similar structure but is harder to solve: production AI agents are failing not because the models are wrong, but because the data beneath them is scattered, stale, and structured for humans rather than machines. Retrieval pipelines built for single queries cannot absorb the volume that agents generate.

The gap Redis is targeting is structural: agents make orders of magnitude more data requests than human users, but most retrieval layers were built for the human-scale problem. Redis Iris, launched on Monday, is the company’s answer: a context and memory platform that sits between an agent and the data it needs to function. The platform combines real-time data ingestion, a semantic interface that automatically generates MCP tools from business data models, and an agent memory server built on Redis Flex, a rewritten storage engine that keeps 99% of data on flash at one-tenth the cost of in-memory storage.

The launch arrives as enterprise RAG infrastructure is actively shifting. VentureBeat’s Q1 2026 VB Pulse RAG Infrastructure Market Tracker found buyer intent to adopt hybrid retrieval tripling from 10.3% to 33.3% between January and March. Retrieval optimization overtook evaluation as the top enterprise investment priority for the first time. Custom in-house retrieval stacks grew from 24.1% to 35.6% as enterprises moved away from off-the-shelf options. Redis isn’t the only infrastructure vendor reading those signals: several data platform providers have repositioned around agent context layers in recent weeks.

The scale mismatch is the structural logic behind the launch.

"Companies will have orders of magnitude more agents than humans," Redis CEO Rowan Trollope told VentureBeat. "Orders of magnitude more agents than humans means orders of magnitude more load on the back-end systems."

From cache to context

Trollope traces the parallel to the mobile era: when legacy backends built for branch tellers suddenly had to serve a million smartphone users, Redis became the caching layer that absorbed the load without a complete rebuild.

What’s different this time is that agents can’t write their own middleware. In the mobile era, a developer would sit down with a database administrator, identify the queries an application needed and hard-code the caching logic into the middleware layer. Agents can’t do this. They need to find the right data, at runtime, through interfaces already built for them, or they stop.
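The mobile-era pattern described above, where a developer hard-codes caching for a known query, is commonly called cache-aside. The following is a minimal sketch of that idea, assuming a plain Python dict in place of Redis; `CacheAside`, `load_balance` and the 60-second time-to-live are hypothetical names and values chosen for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class CacheAside:
    """Cache-aside lookup with a time-to-live. A dict stands in for Redis."""

    def __init__(self, load: Callable[[str], str], ttl_seconds: float) -> None:
        self._load = load                      # slow call to the system of record
        self._ttl = ttl_seconds
        self._store: Dict[str, Tuple[float, str]] = {}
        self.backend_calls = 0                 # counts how often the backend is hit

    def get(self, key: str) -> str:
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]                    # cache hit: the backend is not touched
        self.backend_calls += 1
        value = self._load(key)                # cache miss: read from the backend
        self._store[key] = (now + self._ttl, value)
        return value


def load_balance(account_id: str) -> str:
    # Hypothetical stand-in for a query against a legacy backend.
    return f"balance-for-{account_id}"


cache = CacheAside(load_balance, ttl_seconds=60.0)
for _ in range(1000):
    cache.get("acct-42")
print(cache.backend_calls)
```

The developer had to know in advance that the balance lookup was the hot query. An agent issuing queries nobody anticipated gets no benefit from logic wired in this way.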

"It’s like the grocery store and the fridge," he said. "If you have to go to the grocery store to get food every time you want to make your sandwich, that’s not very efficient. You keep a fridge in every house, keep some food in it. And that’s where we still are in the infrastructure stack."

What is included in Redis Iris

Iris ships five components that together cover data ingestion, semantic access, memory, and caching.

Redis Data Integration. Now in general availability. RDI uses change data capture pipelines, with connectors for Oracle, Snowflake, Databricks, and Postgres, to continuously sync data from relational databases, warehouses, and document stores into Redis.
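Change data capture, in general terms, means replaying each insert, update and delete from a source system onto a copy. The sketch below shows that idea only; it is not RDI's interface. `ChangeEvent`, `apply_change` and the `table:primary_key` key format are assumptions made for illustration, and a dict stands in for Redis.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ChangeEvent:
    op: str                                   # "insert", "update", or "delete"
    table: str
    primary_key: str
    row: Optional[Dict[str, str]] = None      # new row image; None for deletes


def apply_change(mirror: Dict[str, Dict[str, str]], event: ChangeEvent) -> None:
    """Apply one change event to the mirror, keyed as 'table:primary_key'."""
    key = f"{event.table}:{event.primary_key}"
    if event.op == "delete":
        mirror.pop(key, None)
    elif event.op in ("insert", "update"):
        if event.row is None:
            raise ValueError("insert and update events need a row image")
        mirror[key] = dict(event.row)
    else:
        raise ValueError(f"unknown operation: {event.op}")


mirror: Dict[str, Dict[str, str]] = {}
events = [
    ChangeEvent("insert", "orders", "1001", {"status": "placed"}),
    ChangeEvent("update", "orders", "1001", {"status": "shipped"}),
    ChangeEvent("insert", "orders", "1002", {"status": "placed"}),
    ChangeEvent("delete", "orders", "1002"),
]
for event in events:
    apply_change(mirror, event)
print(mirror)
```

Because events are applied in order, the mirror ends up reflecting the latest state of each row without the agent ever querying the source database.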

Context retrieval. Now in preview. Developers define a semantic model of their business data using Pydantic models, and Redis auto-generates MCP tools that agents use to query it directly, with row-level access controls enforced server-side. Trollope describes the change from classic RAG as a directional inversion. "It’s just an inversion: the agent pulls the data instead of pre-processing it and feeding it into the pipeline," he said.
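To make the idea of a tool generated from a data model concrete, here is a rough sketch in plain Python. It uses a standard-library dataclass as a stand-in for a Pydantic model and does not use the MCP SDK or any Redis API; `Invoice`, `make_query_tool` and the `region` scoping rule are all hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, List


@dataclass(frozen=True)
class Invoice:
    invoice_id: str
    region: str
    amount: float


def make_query_tool(model: type, rows: List[Any], scope_field: str) -> Dict[str, Any]:
    """Derive a tool name, parameter list and handler from the model's fields."""
    names = [f.name for f in fields(model)]
    if scope_field not in names:
        raise ValueError(f"{scope_field} is not a field of {model.__name__}")

    def handler(caller_scope: str, **filters: Any) -> List[Dict[str, Any]]:
        unknown = set(filters) - set(names)
        if unknown:
            raise ValueError(f"unknown fields: {sorted(unknown)}")
        result = []
        for row in rows:
            if getattr(row, scope_field) != caller_scope:
                continue  # row-level access check, applied before any filter
            if all(getattr(row, k) == v for k, v in filters.items()):
                result.append({n: getattr(row, n) for n in names})
        return result

    return {
        "name": f"query_{model.__name__.lower()}",
        "parameters": names,
        "handler": handler,
    }


rows = [
    Invoice("inv-1", "emea", 120.0),
    Invoice("inv-2", "apac", 75.5),
    Invoice("inv-3", "emea", 40.0),
]
tool = make_query_tool(Invoice, rows, scope_field="region")
print(tool["name"], tool["parameters"])
print(tool["handler"]("emea", amount=40.0))
```

The access check lives inside the handler rather than in the agent's prompt, so a caller scoped to one region cannot see another region's rows regardless of the filters it passes.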

Agent memory. Now in preview. Stores short- and long-term state across sessions so agents can reference it without retrieving it again each turn.
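One way to picture the split between short- and long-term state is a bounded window of recent turns per session plus a set of durable facts per user. The sketch below assumes exactly that and nothing about how Iris implements it; `AgentMemory` and its method names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List


class AgentMemory:
    """Short-term turns per session, long-term facts per user, both in memory."""

    def __init__(self, short_term_turns: int) -> None:
        self._max_turns = short_term_turns
        self._sessions: Dict[str, Deque[str]] = {}
        self._facts: Dict[str, Dict[str, str]] = {}

    def add_turn(self, session_id: str, text: str) -> None:
        # The deque drops the oldest turn once the window is full.
        turns = self._sessions.setdefault(session_id, deque(maxlen=self._max_turns))
        turns.append(text)

    def remember(self, user_id: str, key: str, value: str) -> None:
        self._facts.setdefault(user_id, {})[key] = value

    def context(self, session_id: str, user_id: str) -> List[str]:
        facts = [f"{k}: {v}" for k, v in sorted(self._facts.get(user_id, {}).items())]
        return facts + list(self._sessions.get(session_id, []))


memory = AgentMemory(short_term_turns=2)
memory.remember("user-7", "preferred_language", "Spanish")
for text in ["turn 1", "turn 2", "turn 3"]:
    memory.add_turn("session-a", text)
print(memory.context("session-a", "user-7"))
print(memory.context("session-b", "user-7"))
```

The long-term fact is available in a brand-new session, while the short-term window only carries the most recent turns of the session it belongs to.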

Redis Flex. A rewritten storage engine that keeps 99% of data on SSD and 1% in RAM, providing petabyte-scale retrieval at sub-millisecond latency.

Redis Search and LangCache. The retrieval and semantic caching backbone beneath the platform. LangCache reduces unnecessary model calls by caching responses to prompts.
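Semantic caching, as a general technique, returns a stored answer when a new prompt is close enough in meaning to one already answered. The sketch below illustrates the technique with a toy bag-of-words embedding and cosine similarity; it is not LangCache's API. `SemanticCache`, `fake_model` and the 0.8 threshold are assumptions, and a real system would use a learned embedding model.

```python
import math
from collections import Counter
from typing import Callable, List, Tuple


def embed(text: str) -> Counter:
    # Toy embedding: word counts. A stand-in for a real embedding model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, call_model: Callable[[str], str], threshold: float) -> None:
        self._call_model = call_model
        self._threshold = threshold
        self._entries: List[Tuple[Counter, str]] = []
        self.model_calls = 0

    def ask(self, prompt: str) -> str:
        query = embed(prompt)
        best = max(self._entries, key=lambda e: cosine(query, e[0]), default=None)
        if best is not None and cosine(query, best[0]) >= self._threshold:
            return best[1]                     # similar prompt seen: skip the model
        self.model_calls += 1
        answer = self._call_model(prompt)      # no close match: call the model
        self._entries.append((query, answer))
        return answer


def fake_model(prompt: str) -> str:
    # Hypothetical stand-in for an LLM call.
    return f"answer to: {prompt}"


cache = SemanticCache(fake_model, threshold=0.8)
first = cache.ask("what is the refund policy")
second = cache.ask("what is the refund policy please")
third = cache.ask("how do I reset my password")
print(cache.model_calls)
```

The threshold is the main design choice: set it too low and unrelated prompts share answers, set it too high and the cache rarely hits.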

What do analysts say?

The data industry is now broadly moving in the same direction. Every major database vendor is building some form of context layer.

Traditional relational database vendors, including Oracle, are integrating context and memory layers to bring their products into the agentic AI era. Purpose-built vector database vendors, including Pinecone, are doing the same, creating a new knowledge layer for agentic AI context. Standalone context layers like massa are also part of the emerging landscape.

Trollope described Redis’s position as structurally different from that of the competition.

"For us to win, no one else has to lose," he said. Many Redis deployments already run MongoDB or Oracle as the backend system of record. Iris mirrors and caches those systems rather than displacing them. Redis is launching Iris in the Snowflake Marketplace with native connectors.

Stephanie Walter, practice leader for the AI stack at Hyperframe Research, puts the market context plainly. "The market is reaching the same conclusion: agents don’t just need more tokens or better models. They need a governed, current, low-latency context," Walter said.

Her take on Redis’s differentiation centers on where Redis already sits in the stack: close to the runtime, to latency-sensitive operational state, and to real-time data.

"The pitch isn’t ‘better RAG.’ It’s ‘agents need live context, memory, and fast retrieval when actually working,’" she said.

Whether it comes from Redis or another vendor, every context layer technology must confront governance to succeed.

"If each agent becomes a new cost center, a new data access risk, and a new governance exception, agentic AI will not scale in the enterprise," she said. "The winning context layers will be those that make agents faster, cheaper, and safer to run."

For real-time clinical AI, getting the context wrong is not an option

Mangoes.ai is one company that has already had to answer those questions in production, in situations where the cost of getting the context wrong is measured in patient outcomes.

Mangoes.ai Founder and CEO Amit Lamba runs a real-time voice AI platform deployed in large healthcare facilities, where patients and physicians ask live questions about treatments, scheduling and case histories. Mangoes.ai built its stack natively on Redis from the start.

"Retrieval, memory, and session state all run through Redis, so we’re not tying together different tools and expecting them to talk to each other," Lamba said.

The problem Iris’s dynamic memory capability addresses is knowing what happened in a complex session.

"Think about a one-hour group therapy session," Lamba said. "You need to know who said what, when, and be able to give the correct information to the physician at that time. This is not a simple retrieval problem."

The platform runs multiple specialized agents in parallel: one for entity identification, one for relation reasoning and one for integrating case histories.

"The dynamic memory capability maps almost perfectly to the problem we are solving," Lamba said.

What this means for enterprises

For enterprises that have built their AI stack around RAG, the retrieval layer that took them to production is no longer enough to keep them there.

The RAG era is giving way to context architecture. The classic RAG model pushed data to the agent before calling the model. Production deployments are flipping that: agents pull what they need at runtime via tool calls, treating the data layer as a live resource rather than a preloaded payload. Teams optimizing RAG pipelines are still solving last year’s problem.

The semantic layer is now production infrastructure. The model that defines business entities, their relationships, and the access rules between them should be created, versioned, and maintained with the same discipline as a data pipeline. Most organizations do not have the staff or structure for that work. Enterprises that define their context architecture now won’t have to rebuild it as agent workloads grow.

The budget is already moving. VB Pulse Q1 2026 data shows retrieval optimization investment increased from 19% to 28.9% over the quarter, outpacing evaluation spending for the first time. Organizations that spent last year measuring their retrieval quality are now spending to fix it. The context layer is an active purchase decision, not a roadmap item.

"The buyer’s first question should not be ‘Do I need a vector database, long context, memory, or a context engine?’ It should be ‘What does this agent need to know, how fresh does that knowledge need to be, who is allowed to access it, and what is the cost of each retrieval?’" Walter said.


