LangChain's CEO argues that better models alone won't get your AI agent to production

Harness engineering
As models become smarter and more capable, the "harness" around them must be developed as well. Harrison Chase, LangChain's co-founder and CEO, calls this "harness engineering," an extension of context engineering, as he explained in a new episode of the VentureBeat Beyond the Pilot podcast. While a basic AI harness simply lets a model run in a loop and call tools, harnesses built specifically for AI agents allow them to interact more freely and perform long-running tasks effectively.

Chase also weighed in on OpenAI's acquisition of OpenClaw, arguing that its viral success depended on a willingness to "let it rip" in a way that no major lab would – and the question is whether the acquisition actually brings OpenAI closer to a secure enterprise version of the product. "The trend in harnesses is to really give the large language model (LLM) more control over context engineering, letting it decide what it sees and what it doesn't see," Chase said. "Now, this idea of a longer-lasting, more autonomous assistant is viable."

Tracking progress and maintaining consistency

Although the concept of letting LLMs run in loops and call tools seems relatively simple, it is difficult to accomplish reliably, Chase said. For some time, models were "below the bounds of usefulness" and could not simply run in a loop, so developers used graphs and wrote chains to work around this. Chase pointed to AutoGPT – at one point the fastest-growing GitHub project ever – as a cautionary example: its architecture was similar to today's top agents, but the models weren't yet good enough to run reliably in a loop, so it quickly faded away.

As LLMs continue to improve, teams can build environments where models run in loops and plan over longer horizons, and they can continually improve these harnesses. Previously, "you couldn't really improve the harness because you couldn't actually drive the model in the harness," Chase said.

LangChain's answer to this is Deep Agents, a customizable general-purpose harness. Built on LangChain and LangGraph, it offers deployment capabilities, a virtual file system, context and token management, code execution and skills, and memory functions. It can also delegate tasks to sub-agents, which are specialized with different tools and configurations and can work in parallel. Context is isolated, so a sub-agent's work does not clutter the main agent's context, and for token efficiency a larger sub-task's context is compressed into a single result.

All of these agents have access to the file system, Chase explained, and can essentially create to-do lists that they execute and track over time. "When it goes to the next step, and it goes to step two or step three or step four out of a 200-step process, there's a way to track its progress and maintain that consistency," Chase said.
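The pattern Chase describes – an agent that writes down its own plan in a file system and checks off steps as it goes – can be sketched in a few lines. This is an illustrative example, not LangChain's actual Deep Agents API; the `VirtualFS` class and the stubbed model are assumptions made for the sketch:

```python
# Minimal sketch of a harness where an agent tracks multi-step progress
# via a to-do file in a virtual file system. The "model" is a stub that
# picks the next unfinished step; a real harness would call an LLM here.

class VirtualFS:
    """In-memory file system shared by the agent (and any sub-agents)."""
    def __init__(self):
        self.files = {}

    def write(self, path, text):
        self.files[path] = text

    def read(self, path):
        return self.files.get(path, "")


def stub_model(todo_text):
    """Stand-in for an LLM: return the first step not yet marked done."""
    for line in todo_text.splitlines():
        if line.startswith("[ ]"):
            return line[4:]
    return None  # every step is checked off


def run_agent(fs, steps, max_iterations=50):
    # The agent writes its plan down first, then loops:
    # read the plan, act on the next step, update the plan.
    fs.write("todo.md", "\n".join(f"[ ] {s}" for s in steps))
    completed = []
    for _ in range(max_iterations):
        next_step = stub_model(fs.read("todo.md"))
        if next_step is None:
            break  # plan exhausted: task complete
        completed.append(next_step)  # a real tool call would happen here
        fs.write("todo.md",
                 fs.read("todo.md").replace(f"[ ] {next_step}",
                                            f"[x] {next_step}", 1))
    return completed


fs = VirtualFS()
done = run_agent(fs, ["fetch data", "summarize", "write report"])
```

Because the plan lives in the file system rather than in the model's context window, the agent can recover its place at step four of a 200-step process even after context is compressed or a sub-agent takes over.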
"It essentially boils down to letting the LLM write down its own thoughts as it moves forward." He emphasized that harnesses should be designed so models can maintain coherence on longer tasks, and be "responsive" to the model deciding when to narrow the context at points it determines are "beneficial." Additionally, giving agents access to code interpreters and bash tools increases flexibility, and providing agents with skills rather than only front-loaded tools lets them load information when they need it. "So instead of hard-coding everything into one big system prompt," Chase explained, "you might have a little system prompt like, 'This is the basic premise, but if I need to do X, let me read the skills for X. If I need to do Y, let me read the skills for Y.'"
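The skills idea – a small base prompt plus instructions loaded on demand – can be made concrete with a short sketch. The names and skill contents below are illustrative assumptions, not LangChain's API:

```python
# Hedged sketch of "skills": instead of one big system prompt, keep a
# small base prompt and pull in per-task instructions only when needed.

BASE_PROMPT = ("You are a general-purpose agent. "
               "Load a skill before doing specialized work.")

# Skills could live on disk or in a virtual file system as plain
# instruction files; a dict stands in for that here.
SKILLS = {
    "sql": "How to write safe, read-only SQL queries...",
    "pdf": "How to extract and summarize text from PDFs...",
}


def build_prompt(task_tags):
    """Assemble the context: base prompt plus only the skills this task needs."""
    parts = [BASE_PROMPT]
    for tag in task_tags:
        if tag in SKILLS:
            parts.append(f"## Skill: {tag}\n{SKILLS[tag]}")
    return "\n\n".join(parts)


prompt = build_prompt(["sql"])
```

The assembled prompt carries the SQL skill but not the unrelated PDF one, so token usage scales with the task at hand rather than with the size of the whole toolbox.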

Essentially, context engineering is a "really fancy" way of asking: what does the LLM actually see? Because it's different from what developers see, he said. When human devs analyze an agent's traces, they can put themselves into the "mindset" of the AI and answer questions like: What is the system prompt? How is it built? Is it static or populated dynamically? What tools does the agent have? When it calls a tool and gets the response back, how is that presented? "When agents mess up, they mess up because they don't have the right context; when they succeed, they succeed because they have the right context," Chase said. "I think of context engineering as bringing the right information in the right format to the LLM at the right time." Listen to the podcast to learn more about:

  • How LangChain built its stack: LangGraph as the main pillar, LangChain at the center, Deep Agents on top.

  • Why code sandboxes will be the next big thing.

  • How a different kind of UX will evolve as agents run for longer periods (or continuously).

  • Why traces and observability are important for building an agent that really works.
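Chase's point about tool responses – that how a result is presented to the model is itself a context-engineering decision – can be shown with a small sketch. The response shape and function names below are illustrative assumptions, not LangChain code:

```python
# The same tool response can be handed to the model verbosely or compactly.
# "When agents mess up, they mess up because they don't have the right
# context" — and a noisy rendering is a common way to get it wrong.

import json

raw_tool_response = {
    "status": 200,
    "headers": {"content-type": "application/json", "x-request-id": "abc123"},
    "body": {"rows": [{"name": "Alice", "score": 91},
                      {"name": "Bob", "score": 84}]},
}


def render_verbose(resp):
    """Naive: dump everything, spending tokens on headers the model never needs."""
    return json.dumps(resp, indent=2)


def render_compact(resp):
    """Context-engineered: keep only what the next reasoning step requires."""
    rows = resp["body"]["rows"]
    return "\n".join(f"{r['name']}: {r['score']}" for r in rows)


verbose = render_verbose(raw_tool_response)
compact = render_compact(raw_tool_response)
```

The compact rendering carries the same decision-relevant information in a fraction of the tokens, leaving room in the context window for longer-horizon reasoning.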

You can also listen and subscribe to Beyond the Pilot on Spotify, Apple Podcasts, or wherever you get your podcasts.


