New framework simplifies the complex landscape of agentic AI

Agentic adaptation strategies
With the ecosystem of agentic tools and frameworks exploding in size, it is becoming increasingly difficult to navigate the many options for building AI systems, leaving developers confused and paralyzed when choosing the right tools and models for their applications.

In a new study, researchers from multiple institutions present a comprehensive framework for untangling this complex web. They classify agentic frameworks based on their focus areas and tradeoffs, giving developers a practical guide to choosing the right tools and strategies for their applications.

For enterprise teams, this transforms agentic AI from a model-selection problem into an architectural decision about where to spend the training budget, how much modularity to preserve, and what tradeoffs they are willing to make between cost, flexibility, and risk.

Agent vs. tool optimization

The researchers divide adaptation scenarios along two primary dimensions: agent optimization and tool optimization.

Agent optimization involves modifying the base model that underlies the agentic system. This is done by updating the internal parameters or policies of the agent through methods such as fine-tuning or reinforcement learning to better align with specific tasks.

Tool optimization, on the other hand, focuses on the environment surrounding the agent. Instead of retraining large, expensive foundation models, developers adapt external tools such as search retrievers, memory modules, or sub-agents. In this strategy the main agent remains "frozen" (unchanged). This approach allows the system to evolve without the huge computational cost of retraining the core model.

The study divided these into four different strategies:

A1: Tool execution signal: In this strategy, the agent learns by doing. It is optimized using directly verifiable feedback from the tool's execution, such as when a code compiler runs a script or a database returns search results. It teaches the agent the "mechanics" of using a tool correctly.

A prime example is DeepSeek-R1, where the model was trained through reinforcement learning with verifiable rewards to generate code that executed successfully in a sandbox. The feedback signal is binary and objective (did the code run, or crash?). This method builds strong low-level competence in stable, verifiable domains such as coding or SQL.
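To make the A1 signal concrete, here is a minimal sketch of a verifiable, execution-based reward. The function name, the plain subprocess standing in for a real sandbox, and the test snippet are illustrative assumptions, not DeepSeek-R1's actual training code:

```python
import subprocess
import tempfile


def execution_reward(generated_code: str, test_snippet: str, timeout: int = 10) -> float:
    """Binary, objective A1-style reward: 1.0 if the generated code plus its
    tests run cleanly, 0.0 if they crash, fail, or time out."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(generated_code + "\n\n" + test_snippet)
        path = f.name
    try:
        # A real system would use a proper sandbox; a subprocess stands in here.
        result = subprocess.run(["python", path], capture_output=True, timeout=timeout)
        return 1.0 if result.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        return 0.0


# During RL fine-tuning, this scalar is the signal attached to each rollout:
# reward = execution_reward(model_output, unit_tests)
```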

A2: Agent output signal: Here, the agent is optimized based on the quality of its final answer, regardless of the intermediate steps and the number of tool calls it makes. It teaches the agent how to orchestrate different tools to reach the right conclusion.

An example is Search-R1, an agent that performs multi-step retrieval to answer queries. The model only receives a reward if the final answer is correct, forcing it to learn better search and reasoning strategies to maximize that reward. A2 is ideal for system-level orchestration, enabling agents to handle complex workflows.
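A rough sketch of the A2 pattern is below; `llm_generate` and `search` are stand-in callables rather than Search-R1's interfaces, and the exact-match scoring is a simplification:

```python
def run_search_agent(question, llm_generate, search, max_steps=4):
    """Multi-step retrieval loop: the agent may issue several search queries
    before committing to a final answer."""
    context = []
    for _ in range(max_steps):
        # The model decides its next move: ("search", query) or ("answer", text).
        action, payload = llm_generate(question, context)
        if action == "answer":
            return payload
        context.extend(search(payload))  # append retrieved passages
    # Out of budget: force a final answer from whatever was gathered.
    _, payload = llm_generate(question, context, force_answer=True)
    return payload


def outcome_reward(predicted: str, gold: str) -> float:
    """A2-style signal: score only the final answer, ignore intermediate steps."""
    return 1.0 if predicted.strip().lower() == gold.strip().lower() else 0.0
```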

T1: Agent-agnostic: In this category, tools are trained independently on broad data and then "plugged into" the frozen agent. Think of the classic dense retrievers used in RAG systems. A standard retriever model is trained on generic search data. A powerful frozen LLM can use this retriever to find information, even if the retriever was not specifically designed for that LLM.
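As an illustration of the T1 pattern, the sketch below pairs an off-the-shelf dense retriever (a sentence-transformers model, chosen only as an example) with a frozen LLM that simply receives the retrieved text in its prompt:

```python
# Illustrative T1 setup: an off-the-shelf dense retriever "plugged into" a frozen LLM.
# Requires: pip install sentence-transformers numpy
import numpy as np
from sentence_transformers import SentenceTransformer

retriever = SentenceTransformer("all-MiniLM-L6-v2")  # trained on generic data, not for any specific LLM

documents = [
    "The s3 framework trains a small searcher against a frozen reasoner.",
    "Dense retrievers embed queries and documents into the same vector space.",
]
doc_vectors = retriever.encode(documents, normalize_embeddings=True)


def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the top-k documents by cosine similarity."""
    q = retriever.encode([query], normalize_embeddings=True)
    scores = doc_vectors @ q[0]
    return [documents[i] for i in np.argsort(-scores)[:k]]


# The frozen LLM (Gemini, Claude, etc.) simply receives the retrieved text in its prompt:
# prompt = f"Context:\n{retrieve(user_question)}\n\nQuestion: {user_question}"
```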

T2: Agent-supervised: This strategy involves training tools specifically to serve the frozen agent. The supervision signal comes from the agent's own output, creating a symbiotic relationship where the tool learns to provide exactly what the agent needs.

For example, the s3 framework trains a small "searcher" model for document retrieval. This small model is rewarded based on whether a frozen "reasoner" (a large LLM) can answer the question correctly using the documents it retrieves. The tool effectively adapts to fill the specific knowledge gaps of the lead agent.
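The loop below sketches that T2 feedback signal in the spirit of s3; the helper names and the containment-based scoring are illustrative assumptions, not s3's actual implementation:

```python
def t2_reward(question, gold_answer, searcher_query, retrieve, frozen_llm_answer):
    """Reward the small searcher only through the frozen agent's success:
    retrieve with the searcher's query, let the frozen LLM answer from those
    documents, and score that final answer."""
    docs = retrieve(searcher_query)
    prediction = frozen_llm_answer(question, docs)  # the frozen reasoner is never updated
    return 1.0 if gold_answer.lower() in prediction.lower() else 0.0


# Schematic training loop: only the small searcher's parameters change.
# for question, gold in dataset:
#     query = searcher.generate_query(question)    # trainable component
#     reward = t2_reward(question, gold, query, retrieve, frozen_llm_answer)
#     searcher.update(query, reward)               # e.g., a policy-gradient step
```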

Complex AI systems may use a combination of these optimization paradigms. For example, a deep research system might employ T1-style retrieval tools (pre-trained dense retrievers), T2-style adaptive search agents (trained via frozen-LLM feedback), and A1-style reasoning agents (fine-tuned with execution feedback) in a single orchestrated system.

Hidden costs and business tradeoffs

For enterprise decision makers, choosing between these strategies often depends on three factors: cost, generalizability, and modularity.

Cost vs. flexibility: Agent optimization (A1/A2) provides maximum flexibility because you are rewiring the agent's brain. However, the cost is very high. For example, Search-R1 (an A2 system) required training on 170,000 examples to internalize search capabilities. This requires massive computation and specialized datasets. On the other hand, the resulting models can be more efficient at inference time because they are much smaller than generalist models.

In contrast, tool optimization (T1/T2) is far more efficient. The s3 system (T2) trained a lightweight searcher using only 2,400 examples (about 70 times less data than Search-R1), achieving comparable performance. By optimizing the ecosystem rather than the agent, enterprises can achieve higher performance at lower cost. However, this comes with overhead at inference time, as s3 requires coordination with a larger model.

Generalization: A1 and A2 methods carry a risk of "overfitting," where an agent becomes so specialized at a task that it loses general capabilities. The study found that Search-R1 excelled at its training tasks but struggled with specialized medical QA, achieving only 71.8% accuracy. This isn't a problem when your agent is designed to perform very specific tasks.

In contrast, the s3 system (T2), which uses a general-purpose frozen agent assisted by a trained tool, generalized better, achieving 76.6% accuracy on the same medical tasks. The frozen agent retained its broad world knowledge, while the tool handled the specific retrieval mechanics. However, T1/T2 systems rely on the frozen agent's knowledge and will be useless if the underlying model cannot handle the specific task.

Modularity: T1/T2 strategies enable "hot-swapping." You can upgrade the memory module or searcher without touching the core reasoning engine. For example, Memento optimizes a memory module that retrieves previous cases; if requirements change, you update the module, not the planner.

The A1 and A2 systems are monolithic. Teaching an agent a new skill (such as coding) through fine-tuning can cause "catastrophic forgetting," where a previously learned skill (such as mathematics) degrades as the model's internal weights are overwritten.

A strategic framework for enterprise adoption

Based on the study, developers should view these strategies as a progressive ladder, moving from low-risk, modular solutions to high-resource optimization.

Start with T1 (agent-agnostic tool): Equip a frozen, powerful model (like Gemini or Claude) with an off-the-shelf tool such as a dense retriever or an MCP connector. This requires zero training and is perfect for prototyping and general applications. It is the low-hanging fruit that can take you quite far for most tasks.

Move to T2 (agent-supervised tool): If the agent has difficulty using generic tools, do not retrain the main model. Instead, train a small, specialized sub-agent (such as a searcher or memory manager) to filter and format data the way the main agent prefers. This is highly data-efficient and well suited to proprietary enterprise data and high-volume, cost-sensitive applications.

Use A1 (tool execution signal) for specialization: If the agent fundamentally fails at technical tasks (for example, writing non-functional code or making incorrect API calls), you will need to retrain its understanding of tool "mechanics." A1 is best for creating experts in verifiable domains like SQL, Python, or your proprietary tools. For example, you can fine-tune a smaller model for your specific toolset and then use it as a T1-style plugin for a generalist model.

Reserve A2 (agent output signal) as the "nuclear option": Only train a monolithic agent end-to-end if you need it to internalize complex strategy and self-improvement. This is resource-intensive and rarely necessary for standard enterprise applications. In fact, you rarely need to get involved in training your own models.
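Read as code, the ladder amounts to a simple escalation rule; this is a schematic summary of the recommendations above, not a formula from the study:

```python
def choose_strategy(needs_internalized_strategy: bool,
                    fails_tool_mechanics: bool,
                    struggles_with_generic_tools: bool) -> str:
    """Walk the ladder from the cheapest, most modular option upward."""
    if needs_internalized_strategy:
        return "A2: end-to-end agent training on final-answer rewards"
    if fails_tool_mechanics:
        return "A1: fine-tune a specialist on tool-execution feedback"
    if struggles_with_generic_tools:
        return "T2: train a small agent-supervised tool; keep the main model frozen"
    return "T1: frozen model plus off-the-shelf tools (zero training)"
```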

As the AI landscape matures, the focus is shifting from building one huge, idealized model to building a smart ecosystem of specialized tools around a stable core. For most enterprises, the most effective path to agentic AI is not to build bigger brains, but to give the brain better tools.


