
As enterprise AI agents move into production, organizations are running into growing reliability problems. Many teams are finding that LLM performance alone does not determine whether an agent succeeds in production. Long-running AI workflows must withstand crashes, preserve state, recover from failures, keep costs predictable, and coordinate across APIs, tools, and enterprise systems.
After a first wave focused on rapid deployment, organizations now need to rethink those first-generation implementations and redesign early agent architectures around workflow orchestration, observability, governance, and recovery, said Preeti Somal, senior VP of engineering at Temporal Technologies, during the latest AI Impact Series event in New York.
“We have a lot of customers who come to us where they are building version 2.0 of the same agent,” Somal said. “They had to move really fast, but they didn’t take care of the pipeline. Things crash and burn, and then they come back to rebuild with reliable foundations.”
For workflow orchestration company Temporal, whose infrastructure predates the current wave of agentic AI, the shift reflects a broader enterprise realization: production AI systems need durable execution, state management, visibility into workflows, and mechanisms to recover when models or downstream systems fail.
Agentic AI supercharges familiar engineering problems
“These patterns are not necessarily new,” Somal said. “AI supercharges them.”
Agentic systems add complexity because they often run long, multi-step processes that span multiple services, models, APIs, and tools. A single workflow can call several large language models, query retrieval systems, trigger external applications, and manage state over hours or days. Somal said the resulting engineering questions often come up only after deployment.
“People would write agents but not think about what would happen if the agent crashed,” she said. “Will I need to re-run the entire agent flow?”
For enterprises operating under cost constraints, the answer matters. Restarting workflows after failures can drive up costs, increase latency, and lead to poor customer experiences.
Somal compared the current moment to an earlier era of enterprise cloud adoption, when organizations rushed to migrate workloads before realizing they needed to redesign the underlying architecture to sustain those workloads over the long term.
“This rush to do AI in a world where you haven’t even modernized your applications reminds me somewhat of the lift-and-shift that’s happening in the cloud,” she said. “Everyone realized you’re spending more money on the cloud and we don’t find any value there.”
Why long-running agents force a new architecture
Enterprise workflows increasingly involve agents that execute over long windows, sometimes several hours, while interacting with tools and systems. Reliability challenges grow when workflows persist over time, and that affects both state and memory, two concepts that are often used interchangeably in AI conversations.
State relates to workflow execution: where an agent is in a process, which tasks it has already completed, and where execution should resume after a failure. Memory, or context, captures information the agent carries forward across interactions or actions.
“The state of the agent depends on what step has been taken and what action has been taken, and if something crashes, where you want to recover versus the context and memory part,” Somal explained.
This distinction becomes more significant as enterprises move beyond simple chatbot interactions toward longer-running business processes. Somal pointed to a healthcare example involving customer Abridge, whose workflow processes physician visits through several steps, including audio processing, summarization, model calls, and generation of a post-visit summary.
“There is not just one piece in that flow,” Somal said. “Taking the audio and cutting it, taking the summary, calling the LLM, preparing the post-visit summary, all of this is being orchestrated.”
The implication for enterprises is that successful agents increasingly depend on systems that can withstand interruptions, coordinate services, and maintain continuity over time.
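To make the state-versus-memory distinction concrete, here is a minimal Python sketch. It assumes a pipeline loosely shaped like the visit workflow described above; the class names, step names, and the sample note are hypothetical and are not Abridge’s or Temporal’s actual design. `WorkflowState` tracks execution (what has run and where to resume), while `AgentMemory` holds context the agent carries forward.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class WorkflowState:
    """Execution state: which steps have run and where to resume after a crash."""
    steps: List[str]
    completed: List[str] = field(default_factory=list)

    def mark_done(self, step: str) -> None:
        # Record that a step finished so it is not repeated on recovery.
        if step not in self.completed:
            self.completed.append(step)

    def resume_point(self) -> Optional[str]:
        # First step not yet recorded as done; None means the run is finished.
        for step in self.steps:
            if step not in self.completed:
                return step
        return None


@dataclass
class AgentMemory:
    """Memory/context: information the agent carries forward between actions."""
    notes: Dict[str, str] = field(default_factory=dict)

    def remember(self, key: str, value: str) -> None:
        self.notes[key] = value


# Hypothetical step names, loosely modeled on the visit workflow above.
state = WorkflowState(["process_audio", "summarize", "call_llm", "draft_post_visit_summary"])
memory = AgentMemory()

state.mark_done("process_audio")
memory.remember("chief_complaint", "persistent cough")
state.mark_done("summarize")
```

If the process crashed at this point, `state.resume_point()` would answer "resume at `call_llm`", while `memory.notes` holds what the agent has learned so far. Persisting the first is what lets recovery skip finished work; the second is what keeps the agent's later outputs coherent.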
The rise of the deterministic backbone
A useful framework for enterprise AI design is the deterministic backbone, Somal said, which is how she thinks about Temporal’s role.
“It shows the path you want to take,” she said. “It is calling the brain, but if the brain does not respond, it will call it again. If the brain responds but the next step fails, it will start where that failure occurred.”
In this framing, the language model acts as a probabilistic system that generates variable outputs, while the orchestration software around it maintains execution reliability. The concept matters because enterprise systems require continuity even though the model itself remains non-deterministic. A procurement workflow, healthcare summary, customer support escalation, or compliance process cannot silently fail simply because a model call timed out or an external dependency crashed.
“The thing you care about most is making sure you can recover and that you’re not paying a token tax if something goes wrong,” Somal said.
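The "call the brain again" half of the backbone can be sketched as a deterministic retry policy wrapped around an unreliable model call. This is a minimal Python illustration, not Temporal’s API: `call_with_retries` and `FlakyModel` are hypothetical names, and it assumes the model call signals failure by raising `TimeoutError`. Durable-execution engines typically expose this kind of behavior as a configurable retry policy rather than a hand-written loop.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(fn: Callable[[], T], max_attempts: int = 3,
                      base_delay: float = 0.01) -> T:
    """Deterministic policy around an unreliable call: retry on timeout."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TimeoutError:
            if attempt == max_attempts:
                raise  # give up and surface the failure to the caller
            # Exponential backoff before calling the "brain" again.
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


class FlakyModel:
    """Hypothetical stand-in for an LLM endpoint that times out a few times."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.calls = 0

    def __call__(self) -> str:
        self.calls += 1
        if self.calls <= self.failures:
            raise TimeoutError("model call timed out")
        return "draft summary"


model = FlakyModel(failures=2)
result = call_with_retries(model, max_attempts=3)
```

Here the model times out twice and succeeds on the third attempt, so the workflow continues without a human noticing. The backoff keeps retries from hammering a struggling endpoint, and the attempt cap ensures a persistent outage is surfaced rather than retried forever.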
Reliability, visibility and the economics of token spend
As enterprise leaders evaluate AI ROI, cost visibility has become a growing concern. Long-running agents often make many model calls across complex workflows, which can create opaque spending patterns. Somal described one operational benefit of orchestration as visibility into where costs accumulate: because workflows are observable step by step, teams can see where in the agent process tokens are being consumed.
“You’ve got visibility of that entire flow in a single pane of glass,” she said. “Now you can see where you are spending tokens in an agent that is in multiple steps and calling multiple different systems.”
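Per-step cost visibility amounts to attributing every model call’s token usage to the workflow step that made it. The minimal Python sketch below assumes the orchestrator can see each call’s prompt and completion token counts; `TokenLedger` is a hypothetical helper, not a real library API, and the token counts are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Tuple


class TokenLedger:
    """Accumulates token usage per workflow step (illustrative, not a real API)."""

    def __init__(self) -> None:
        self.usage: Dict[str, int] = defaultdict(int)

    def record(self, step: str, prompt_tokens: int, completion_tokens: int) -> None:
        # Attribute every model call's tokens to the step that made it.
        self.usage[step] += prompt_tokens + completion_tokens

    def total(self) -> int:
        return sum(self.usage.values())

    def most_expensive(self) -> Tuple[str, int]:
        # The step consuming the most tokens; a natural first place to optimize.
        return max(self.usage.items(), key=lambda item: item[1])


# Made-up token counts for a hypothetical three-call run.
ledger = TokenLedger()
ledger.record("summarize", prompt_tokens=1200, completion_tokens=300)
ledger.record("call_llm", prompt_tokens=2500, completion_tokens=800)
ledger.record("summarize", prompt_tokens=400, completion_tokens=100)
```

With these numbers, `summarize` accounts for 2,000 tokens across two calls and `call_llm` for 3,300 in one call, for a total of 5,300. Grouping by step rather than by raw API call is what turns an opaque bill into a breakdown a team can act on.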
Workflow recovery also shapes cost efficiency. Without durable orchestration, a late-stage failure can force organizations to re-run the entire process from the beginning, including all prior model calls. Somal said systems designed for recovery can resume execution from the point of interruption.
“You start where the crash happened,” she said. “We save you the cost of running an agent from the first step.”
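Resuming from the point of interruption can be sketched as checkpointing each step’s result once the step succeeds, then skipping checkpointed steps on the next run. This is a minimal Python illustration under stated assumptions: `run_workflow`, `execute`, and the step names are hypothetical, and a plain dict stands in for the durable store a real durable-execution engine would persist to. It is not Temporal’s API.

```python
from typing import Callable, Dict, List


def run_workflow(steps: List[str], execute: Callable[[str], str],
                 checkpoint: Dict[str, str]) -> Dict[str, str]:
    """Run steps in order, skipping any whose result is already checkpointed."""
    for step in steps:
        if step in checkpoint:
            continue  # finished before the crash: reuse result, no new model call
        checkpoint[step] = execute(step)  # record only after the step succeeds
    return checkpoint


calls: List[str] = []  # every step execution, i.e. every paid call
crash_once = {"draft_post_visit_summary"}  # hypothetical: fails on its first try


def execute(step: str) -> str:
    calls.append(step)
    if step in crash_once:
        crash_once.discard(step)
        raise RuntimeError(f"worker crashed during {step}")
    return f"output of {step}"


steps = ["process_audio", "summarize", "call_llm", "draft_post_visit_summary"]
checkpoint: Dict[str, str] = {}  # in-memory stand-in for durable storage

try:
    run_workflow(steps, execute, checkpoint)
except RuntimeError:
    pass  # the first run dies on the last step; earlier results survive

run_workflow(steps, execute, checkpoint)  # resume instead of restarting
```

The resumed run makes five step executions in total instead of the eight a full restart would need (four before the crash plus four again), because the three completed steps are read back from the checkpoint. Recording a result only after its step succeeds means a crash mid-step reruns just that step, which is why steps with side effects are usually made idempotent.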
Enterprises build paved roads and enlist partner expertise
Governance is another emerging concern as agentic AI takes hold. Rather than adopting fully managed agent systems wholesale, Somal said, enterprises increasingly want standardized internal frameworks that provide guardrails while preserving flexibility and that build in essential capabilities such as governance controls, model selection policies, detection systems, cost management, and observability.
“Enterprises are looking at building these paved roads,” she said. “Taking something off the shelf probably won’t work because there are all these other needs out there.”
As organizations rethink first-generation deployments, these challenges look less like a model problem and more like a systems engineering problem. Temporal is positioned to help enterprises take that next step because, for many organizations, it was already in place as part of broader modernization programs before AI became a strategic priority.
“Temporal is already in the enterprise,” Somal said. “It feels very natural to take that and extend it to AI and agent platforms.”