
Today’s LLMs excel at reasoning, but they can still struggle with context. That is especially true in real-time ordering systems like Instacart’s.
Instacart CTO Anirban Kundu calls this the “brownie recipe problem.”
It’s not as simple as telling the LLM “I want to make brownies.” To be truly helpful in meal planning, the model must go beyond that simple instruction: it has to understand what is available in the user’s local market, factor in their preferences (organic eggs vs. regular eggs, for example), and account for what can actually be delivered in their geography before the food spoils.
For Instacart, the challenge is fighting latency while still assembling the right mix of context, ideally delivering a response in under a second.
“If the inference itself takes 15 seconds, and every interaction is that slow, you’ll lose the user,” Kundu said at a recent VB event.
Mixing logic, real-world state and personalization
In grocery delivery, there is a “world of logic” and a “world of state” (what is actually available in the real world), Kundu noted, and the LLM must understand both alongside user preferences. But it’s not as simple as loading a user’s entire purchase history and known interests into a single model.
“Your LLM will become so big that it will become difficult to handle,” Kundu said.
To get around this, Instacart breaks processing into stages. First, the data is fed into a large foundation model that can understand intent and classify products. That processed output is then handed to a small language model (SLM) built for catalog context (which foods or other items work together) and semantic understanding.
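The two-stage split described above can be sketched roughly as follows. This is an illustrative outline only; the function names, `Intent` structure, and brownie-specific logic are invented stand-ins, not Instacart’s actual API.

```python
from dataclasses import dataclass

@dataclass
class Intent:
    goal: str            # e.g. "make brownies"
    categories: list     # product categories inferred by the large model

def classify_intent(query: str) -> Intent:
    # Stand-in for a call to the large foundation model:
    # understand intent and classify the products it implies.
    if "brownies" in query.lower():
        return Intent(goal="make brownies",
                      categories=["flour", "eggs", "cocoa", "butter"])
    return Intent(goal=query, categories=[])

def catalog_context(intent: Intent, market: dict) -> list:
    # Stand-in for the SLM: map abstract categories to products
    # actually stocked in the user's local market.
    return [market[c] for c in intent.categories if c in market]

market = {"flour": "AP flour 2lb", "eggs": "organic eggs 12ct",
          "cocoa": "cocoa powder 8oz"}
items = catalog_context(classify_intent("I want to make brownies"), market)
print(items)  # butter is not stocked in this market, so only 3 items
```

The point of the split is that the expensive intent step runs once, while the cheaper catalog step can be re-run against per-market state.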
In the case of catalog context, the SLM must be able to process orders as well as multiple levels of detail for different products: which products go together, and what the relevant replacements are if the first choice is out of stock. These replacements are “very, very important” for a company like Instacart, Kundu said, adding that in “more than double-digit” percentages of cases a product is not available in the local market.
In terms of semantic understanding, say a shopper wants to buy healthy snacks for children. The model needs to understand what a healthy snack is and which foods suit an 8-year-old, and then identify relevant products. And when those particular products are unavailable in a given market, the model also has to find related subsets of products.
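One way to picture that semantic step is as attribute filters with a relaxation fallback: first require every attribute, and if nothing in the market matches, drop the softer preferences to find a related subset. The tags and products below are made up.

```python
# Toy catalog with invented attribute tags.
CATALOG = [
    {"name": "apple slices", "tags": {"snack", "low-sugar", "kid-friendly"}},
    {"name": "candy bar",    "tags": {"snack", "high-sugar"}},
    {"name": "granola bar",  "tags": {"snack", "kid-friendly"}},
]

def search(required: set, preferred: set) -> list:
    # First pass: demand all attributes the query implies.
    hits = [p["name"] for p in CATALOG if required | preferred <= p["tags"]]
    if hits:
        return hits
    # Relax preferred attributes to find a related subset of products.
    return [p["name"] for p in CATALOG if required <= p["tags"]]

print(search({"snack", "kid-friendly"}, {"low-sugar"}))  # ['apple slices']
```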
Then there is the logistics element. Ice cream, for example, melts quickly, and even frozen vegetables suffer when left in the heat. The model must hold that knowledge and calculate acceptable delivery times.
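The core of that calculation is simple: each item carries a maximum time it can safely spend in transit, and the most perishable item constrains the whole basket. The thresholds below are illustrative, not real Instacart values.

```python
# Illustrative maximum transit times, in minutes.
MAX_TRANSIT_MIN = {"ice cream": 30, "frozen vegetables": 45, "cereal": 240}
DEFAULT_TRANSIT_MIN = 240  # non-perishables

def delivery_window(basket: list) -> int:
    """Acceptable delivery time = the tightest constraint in the basket."""
    return min(MAX_TRANSIT_MIN.get(item, DEFAULT_TRANSIT_MIN)
               for item in basket)

print(delivery_window(["cereal", "ice cream"]))  # 30 -- ice cream dominates
```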
“So you have this semantic understanding, you have this classification, and then you have this other, logical part: how do you actually do it?” Kundu said.
Avoiding ‘monolithic’ agent systems
Like many other companies, Instacart is experimenting with AI agents, and it is finding that a mix of agents works better than a “single monolith” performing several different tasks. The Unix philosophy of a modular operating system built from small, focused tools helps, for example, with different payment systems that have different failure modes, Kundu explained.
“It was too cumbersome to build all of this in one environment,” he said. Additionally, backend agents talk to multiple third-party platforms, including point-of-sale (POS) and catalog systems. Naturally, not all of them behave the same; some are more reliable than others, and their update intervals and feeds vary.
“So to be able to handle all of those things, we’ve gone down this route of microagents rather than agents that are predominantly large in nature,” Kundu said.
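The microagent idea can be illustrated with a thin router that dispatches each task to a single-purpose agent, Unix-style. The agent names and behaviors below are hypothetical, not Instacart’s architecture.

```python
# Each agent does exactly one job.
def payment_agent(task: dict) -> str:
    return f"charged {task['amount']}"

def catalog_agent(task: dict) -> str:
    return f"looked up {task['item']}"

AGENTS = {"payment": payment_agent, "catalog": catalog_agent}

def dispatch(task: dict) -> str:
    """Route a task to the one microagent responsible for it."""
    agent = AGENTS.get(task["kind"])
    if agent is None:
        raise ValueError(f"no agent for {task['kind']!r}")
    return agent(task)

print(dispatch({"kind": "catalog", "item": "cocoa powder 8oz"}))
```

Keeping each agent small means a misbehaving payment integration can fail, retry, or be swapped out without touching catalog logic.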
To manage agents, Instacart has integrated Anthropic’s Model Context Protocol (MCP), which standardizes and simplifies the process of connecting AI models to various tools and data sources.
The company also uses Google’s Universal Commerce Protocol (UCP) open standard, which allows AI agents to interact directly with merchant systems.
However, Kundu’s team still faces challenges. As he put it, the question is not whether integrations are possible, but how reliably they behave and how well they are understood. Discovery can be difficult, too: not only identifying which services are available, but understanding which services suit which tasks.
Kundu said Instacart has had to implement MCP and UCP in “very different” cases, and that the biggest problems are failure modes and latency. “The response time and understanding of those two services is so different, I’d say we spend probably two-thirds of the time fixing those error cases.”
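One common way to contain divergent failure modes and latency, in the spirit of what Kundu describes, is to wrap every third-party call in a time budget with bounded retries so one slow service cannot stall the whole interaction. This is a generic sketch, not Instacart’s implementation.

```python
import time

def call_with_budget(fn, budget_s=1.0, retries=2):
    """Call fn, retrying on errors or over-budget responses."""
    last_err = None
    for _ in range(retries + 1):
        start = time.monotonic()
        try:
            result = fn()
            if time.monotonic() - start <= budget_s:
                return result
            last_err = TimeoutError("response exceeded latency budget")
        except Exception as e:  # each service fails differently
            last_err = e
    raise last_err

print(call_with_budget(lambda: "ok"))  # fast call succeeds on first try
```

The budget matters as much as the retry: a correct answer that arrives after the latency budget is treated as a failure, matching the sub-second goal discussed earlier.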