Anthropic says it solved the long-running AI agent problem with a new multi-session Claude SDK

Agent memory remains a problem that enterprises want fixed, as agents tend to forget instructions or conversations the longer they run.

Anthropic believes it has resolved the issue with its Claude Agent SDK, developing a two-tier solution that lets an agent operate across different context windows.

“The main challenge of long-running agents is that they must work in discrete sessions, and each new session begins with no memory of the previous one,” Anthropic wrote in a blog post. “Because context windows are limited, and because most complex projects cannot be completed in a single window, agents need a way to bridge the gap between coding sessions.”

Anthropic engineers proposed a two-pronged approach in their Agent SDK: an initializer agent to set up the environment, and a coding agent to make incremental progress in each session and leave artifacts for the next.

agent memory problem

Because agents are built on foundation models, they are constrained by a limited, though ever-expanding, context window. For long-running agents, this can cause major problems: agents forget instructions and behave erratically while performing tasks. Enhancing agent memory therefore becomes essential for sustained, business-safe performance.

Several approaches have emerged over the past year, all attempting to bridge the gap between the context window and agent memory. LangChain’s LangMem SDK, Memobase, and OpenAI’s Swarm are examples of companies offering memory solutions. Research on agentic memory has also exploded recently, with proposed structures like Memp and Google’s nested learning paradigm introducing new options to enhance memory.

Many of the current memory frameworks are open source and can be adapted to the various large language models (LLMs) powering agents. Anthropic’s approach builds on its Claude Agent SDK.

how it works

Anthropic identified that even though the Claude Agent SDK had context management capabilities and “it should be possible for an agent to continue performing useful tasks for an arbitrarily long time,” this was not enough. The company said in its blog post that a model like Opus 4.5 running the Claude Agent SDK “may fall short of building a production-quality web app if it is given only a high-level prompt, such as ‘Clone claude.ai.’”

The failures appeared in two patterns, Anthropic said. In the first, the agent tried to do too much at once, running out of context midway; the next agent then had to guess what had happened, with no clear instructions left behind. The second failure occurred later, once some features had already been built: the agent saw that progress had been made and simply declared the task complete.

Anthropic’s researchers came up with a solution: set up the environment once up front to lay the foundation for features, then push each agent to make incremental progress toward the goal while leaving a clean state at the end of its session.

That’s where Anthropic’s two-part solution comes in. The initializer agent sets up the environment, logging what agents have done and which files have been added. The coding agent then prompts the model to make incremental progress and leave structured updates behind.
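To make the pattern concrete, here is a minimal sketch of such a harness in Python. All names here (`PROGRESS_FILE`, `initialize`, `coding_session`, `run`) are hypothetical illustrations of the initializer/coding-agent split Anthropic describes, not part of the Claude Agent SDK; a real harness would call the model inside each session.

```python
# Illustrative two-tier harness: an initializer sets up the workspace and a
# progress artifact; each "coding session" makes one increment of progress
# and writes a structured update before its context would reset.
import json
from pathlib import Path

PROGRESS_FILE = Path("progress.json")  # hypothetical artifact bridging sessions


def initialize(workspace: Path) -> None:
    """Initializer agent: set up the environment and seed the progress log."""
    workspace.mkdir(exist_ok=True)
    if not PROGRESS_FILE.exists():
        PROGRESS_FILE.write_text(
            json.dumps({"completed": [], "next": ["scaffold app"]})
        )


def coding_session(task: str) -> dict:
    """Stand-in for one context-window-bounded session.

    A real harness would invoke the model here; this stub just records
    the step so the loop is runnable.
    """
    return {"task": task, "status": "done", "notes": f"implemented {task}"}


def run(max_sessions: int = 3) -> list:
    initialize(Path("workspace"))
    state = json.loads(PROGRESS_FILE.read_text())
    log = []
    for _ in range(max_sessions):
        if not state["next"]:
            break  # nothing left: avoid falsely declaring extra work done
        task = state["next"].pop(0)
        update = coding_session(task)  # incremental progress, fresh context
        state["completed"].append(update)
        # Leave a structured update for the next session before context resets.
        PROGRESS_FILE.write_text(json.dumps(state))
        log.append(update)
    return log
```

The key design point mirrored here is that each session's memory lives in the file system, not in the model's context, so a fresh session can pick up exactly where the last one stopped.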

“The inspiration for these practices came from learning what effective software engineers do every day,” Anthropic said.

The researchers said they have added testing tools to the coding agent, improving its ability to identify and fix bugs that were not obvious from the code alone.

future research

Anthropic said its approach is “a potential set of solutions in a long-term agent harness.” However, this is just the early stage of what will become a broader research area for many in the AI field.

The company said its experiments in boosting long-term memory for agents have not yet revealed whether a single general-purpose coding agent or a multi-agent structure works best across agent contexts.

Its demo also focuses on full-stack web app development, so other experiments should focus on generalizing the results across different tasks.

Anthropic said, “It is likely that some or all of these lessons could be applied to other types of long-running agentic tasks, for example, scientific research or financial modeling.”
