Enterprise AI agents keep failing because they forget what they learned

RAG architectures are good at one thing: surfacing semantically relevant documents. They also stop there.

A framework called decision context graphs addresses that gap by giving agents structured memory, time-aware reasoning, and explicit decision logic. Rippletide, a startup in the Neo4j ecosystem, has built one. Its core capability: agents that are non-regressive, able to stabilize valid sequences of actions over time and combine them.

“The key point you want is non-regressibility: how do you ensure that, when the agent generates something new, you don’t compromise on previous discoveries?” said Yann Billien, co-founder and chief scientific officer of Rippletide.

Why doesn’t RAG go far enough?

Enterprise context spans ERP tools, logs, databases, vector stores, and policy documents. Generative AI tools can retrieve from all of these – through keyword searches, SQL queries, or full RAG pipelines – but retrieval only goes so far.

In particular, the retrieved data may not be relevant to the decision at hand (which leads to hallucinations), and even when agents pull the right data, they often lack guidance for making decisions backed by sound logic.

That is, RAG retrieves documents, not decision contexts. “Everyone starts with RAG: pull relevant documents, stuff them into the prompt, let the model figure it out,” said Wyatt Mayham of Northwest AI Consulting.

While this works fine for chatbots, it “instantly breaks” for agents that need to make decisions and take action, he pointed out. “The biggest thing builders struggle with is the gap between retrievability and usability.”

Mayham said a retrieved document does not tell the agent whether it still applies, whether it has been superseded, or whether a conflicting rule takes precedence. “Agents need decision context, not just information.”

In production (the real world), this might mean knowing that a pricing exception has expired, that a security policy applies only in certain jurisdictions, or that a standard operating procedure was updated a month ago. “Any of this gets missed, and the agent confidently makes a mistake,” Mayham said.

Without a structured decision context, agents combine inconsistent rules, invent constraints to fill in the gaps, and rely on what Billien calls “probabilistic inference on unlimited data.” Errors are difficult to reproduce because builders cannot determine why the agent made a given choice.

The problem of compounding error is also real, Mayham said: A small miss rate per step becomes “catastrophic” in a multi-step workflow. “This is the main reason why most enterprise agents never leave the pilot stage.”
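
The compounding effect is simple arithmetic. As a minimal sketch, assume every step of a workflow succeeds independently with the same probability (an illustrative simplification; the 95% figure and the function name are assumptions for the example, not numbers from Mayham):

```python
def workflow_success_rate(per_step_success: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    if not 0.0 <= per_step_success <= 1.0:
        raise ValueError("per_step_success must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_success ** steps


if __name__ == "__main__":
    # Illustrative numbers only: a 5% miss rate at each step.
    for steps in (1, 5, 10, 20):
        rate = workflow_success_rate(0.95, steps)
        print(f"{steps:>2} steps at 95% per step -> {rate:.1%} end to end")
```

Under these assumptions, a 20-step workflow that is 95% reliable at each step completes without error only about 36% of the time.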

How do decision context graphs reach relevant answers?

A decision context graph solves this by encoding a structured map of what is applicable, what the rules are, and when they apply.

The framework is organized around one question: “Given this situation, which context currently applies?” Time is treated as a first-class dimension: each rule, decision and exception is scoped, and applies only while it is valid.

“The goal is to explicitly address missing, inconsistent, or contradictory data when creating graphs, to avoid probabilistic [errors] once the agent is running,” Billien said.

This system is built around three principles:

  • Applicability: The logic is explicitly encoded so that the agent knows which rules to remember and apply in a given situation. The context is returned only if it is relevant to the situation.

  • Time-aware memory: Every rule, decision and exception is time-bound. This lets agents reason about “what was true then versus what is true now,” then revisit or explain their decisions.

  • Decision path: The system can explain how it got from A to B and the “why” behind it (for example, why one piece of context was included and not another). Agents are given “decision paths”: examples of how similar cases were handled in the past.
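
To make the three principles concrete, here is a minimal sketch of how applicability, time-bound validity and a decision path might fit together. It is not Rippletide's implementation: the `Rule` fields, the `applicable_context` function and the sample pricing rules are all hypothetical, and a real system would sit on a graph database, not a Python list.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """A rule with an explicit scope and validity window (hypothetical fields)."""
    rule_id: str
    text: str
    jurisdictions: frozenset[str]       # where the rule applies
    valid_from: date
    valid_until: Optional[date] = None  # None means still in force
    priority: int = 0                   # higher wins when rules conflict

    def applies(self, jurisdiction: str, on: date) -> bool:
        in_scope = jurisdiction in self.jurisdictions
        in_window = self.valid_from <= on and (
            self.valid_until is None or on <= self.valid_until
        )
        return in_scope and in_window


def applicable_context(rules: list[Rule], jurisdiction: str, on: date):
    """Return the rules that apply, highest priority first, plus a decision path."""
    selected, path = [], []
    for rule in rules:
        if rule.applies(jurisdiction, on):
            selected.append(rule)
            path.append(f"included {rule.rule_id}: in scope and valid on {on}")
        else:
            path.append(f"excluded {rule.rule_id}: out of scope or not valid on {on}")
    selected.sort(key=lambda r: r.priority, reverse=True)
    return selected, path


if __name__ == "__main__":
    rules = [
        Rule("pricing-exception", "10% partner discount", frozenset({"US"}),
             date(2024, 1, 1), date(2024, 6, 30), priority=2),
        Rule("standard-pricing", "List price applies", frozenset({"US", "EU"}),
             date(2023, 1, 1)),
    ]
    # "What was true then versus what is true now"
    then, _ = applicable_context(rules, "US", date(2024, 3, 1))
    now, path = applicable_context(rules, "US", date(2025, 3, 1))
    print([r.rule_id for r in then])
    print([r.rule_id for r in now])
    print("\n".join(path))
```

In this sketch the expired pricing exception is returned for the 2024 query but not for the 2025 one, and the decision path records why each rule was included or excluded.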

At setup, unstructured data is ingested and structured into an ontology: what entities exist, what rules apply, what counts as an exception. Neuro-symbolic AI handles pattern recognition and encodes formal, machine-readable logic. Over time, as new decisions are made, the system refines its knowledge base.

“Neuro-symbolic brings together two parts: a neural part to give agents greater autonomy and a symbolic part to reduce the amount of data needed and bring control,” Billien said.

The agent is tested in pre-production to validate its behavior or make corrections. This reduces risk as well as the compute needed at inference time, he said.

Agents are learning instead of regressing

When it comes to non-regression, the key is to combine both intelligence (the model) and knowledge (shared between agents), said Billien. It is important that agents can detect when they don’t know how to complete a task; they can then try different possibilities, usually in a controlled environment or simulation (like a support bot trying multiple response patterns).

Then, “once a solution is deemed satisfactory, the graph stabilizes that sequence of actions,” Billien said. Future exploration starts from this “stable base of valid behavior” to prevent newly acquired skills from overwriting previously learned good behavior.

Before an agent takes action or interacts with a customer, it checks against the graph: Is it violating any rules? Hallucinating? Staying within limits? Can the solution generalize to similar cases?

At a broader level, the system measures outcomes: Did the behavior improve long-term performance? Did it generalize to similar contexts? Did it preserve previous capabilities?
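
The “did it preserve previous capabilities?” check can be pictured as a regression gate. The sketch below is an illustration under stated assumptions, not the company's method: `StableBase`, `Policy` and the refund example are hypothetical names, and exact sequence matching is a deliberately strict stand-in for whatever comparison a real system uses.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A policy maps a case description to a sequence of actions (hypothetical type).
Policy = Callable[[str], Sequence[str]]


class StableBase:
    """Stores validated action sequences; a toy stand-in for the graph."""

    def __init__(self) -> None:
        self._stable: Dict[str, Tuple[str, ...]] = {}

    def stabilize(self, case: str, actions: Sequence[str]) -> None:
        """Record a sequence once it has been judged satisfactory."""
        self._stable[case] = tuple(actions)

    def regressions(self, candidate: Policy) -> List[str]:
        """Cases where the candidate no longer reproduces the stored sequence."""
        return [
            case for case, actions in self._stable.items()
            if tuple(candidate(case)) != actions
        ]

    def accept(self, candidate: Policy) -> bool:
        """Accept a new behavior only if every stored case is preserved."""
        return not self.regressions(candidate)


if __name__ == "__main__":
    base = StableBase()
    base.stabilize("refund_request", ["verify_order", "check_policy", "issue_refund"])

    def careful(case: str) -> Sequence[str]:
        known = {"refund_request": ["verify_order", "check_policy", "issue_refund"]}
        return known.get(case, ["escalate_to_human"])

    def forgetful(case: str) -> Sequence[str]:
        # Learned a new shortcut but dropped the policy check.
        return ["verify_order", "issue_refund"]

    print(base.accept(careful), base.regressions(forgetful))
```

A candidate that skips a previously validated step is rejected, while one that only adds behavior for unseen cases passes the gate.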

“This determinism is important for driving reliability at scale for agents,” Billien said. It leads to behavior that is more consistent, predictable and explainable, and allows for stronger controls and audits.

“You want your agents to be able to learn on their own when they encounter something they don’t know about,” he said. “You want them to be able to explore and find new solutions.”

Going beyond “relevant” memory

While the team initially assumed it would deploy reinforcement learning (RL) everywhere, “this actually proved very difficult in an enterprise setting,” Billien said. “Data is sparse for some specific use cases and messy for others.”

Typically, using raw data to make reliable predictions has been a manual and time-consuming challenge, but “now with agents we have entered a new era where building ontologies automatically is possible,” Billien said.

Classic supervised fine-tuning methods can lead to oscillations, where the model forgets the last skill it learned when learning the next one. Overall, learning does not compound, regression is “dramatic”, and models improve “contextually” rather than continuously, causing them to consistently fail on new or unseen tasks.

As Billien said: “You’ll never have a fully self-learning model if you’re retraining every time.”

He said that in enterprise use cases – such as banking where millions of transactions are processed a day – a high level of reliability is critical. “I ask all customers a question: Is 95% enough? In many use cases, it’s not. You need 99.999%. 1% off is too much.”

Decision context graphs can close that gap, Rippletide argues: When the same customer support question is asked repeatedly, the agent will give a “satisfactory” answer, predictably and without any regression, while maintaining autonomy.

Encoding applicability and temporal validity in a structured graph – rather than relying on an LLM to infer them – is a “sound approach” that addresses a real limitation in existing retrieval frameworks, Mayham said. The open question is whether automated ontology generation can really cope with the disorganized, diverse data that enterprises possess. “That’s always the hard part,” he said.


