Booking.com’s Agent Strategy: Disciplined, Modular And Already Delivering 2× Accuracy

When many enterprises were not yet thinking about agentic behavior or infrastructure, Booking.com had already “stumbled into” both with its in-house conversational recommendation system. That early experiment has allowed the company to step back and avoid getting caught up in the frenetic AI agent hype. Instead, it is taking a disciplined, layered, modular approach to model development: small, trip-specific models for cheap, fast inference; large language models (LLMs) for reasoning and comprehension; and domain-tuned evaluations built in-house when precision is critical. With this hybrid strategy, coupled with selective collaboration with OpenAI, Booking.com has seen accuracy double across key retrieval, ranking and customer-interaction tasks.

As Pranav Pathak, head of AI product development at Booking.com, told VentureBeat in a new podcast: “Do you make it very, very specific and particular and then have an army of a hundred agents? Or do you keep it fairly general and you have five agents who are good at generalized tasks, but then you have to organize a lot around them? I think it’s a balance we’re still trying to figure out, as is the rest of the industry.”

Listen to the new Beyond the Pilot podcast for the full conversation, and keep reading for the highlights.

Moving from guesswork to deep personalization without being ‘creepy’

Recommendation systems are critical to Booking.com’s customer-facing platforms; however, traditional recommendation tools have been less about recommendation and more about guessing, Pathak acknowledged. So, from the beginning, he and his team vowed to avoid generic tools: as he said, pricing and recommendations should be based on the customer’s context.

Booking.com’s initial pre-gen AI tooling for detecting intent and topic was a small language model, which Pathak described as “the scale and size of BERT.” The model took the customer’s description of their problem and determined whether it could be solved through self-service or needed a human agent.

“We started with an architecture saying, ‘If you detect this intent you have to call a tool and this is how you parsed the structure,’” Pathak explained. “It was similar to some of the first agentic architectures that came out in terms of defining reason and tool calls.”

His team has since extended that architecture with an LLM orchestrator that classifies queries, triggers retrieval-augmented generation (RAG) and calls APIs or small, specialized language models. “We’ve been able to scale that system quite well because it was so close in architecture that, with a few changes, we now have a full agentic stack,” Pathak said.

As a result, Booking.com is seeing a 2x improvement in topic detection, which in turn is freeing up 1.5 to 1.7x the bandwidth of human agents. More topics are being automated, including complex ones that were previously classified as ‘other’ and had to be investigated further.

Ultimately, this supports more self-service, freeing up human agents to focus on customers with niche problems for which the platform doesn’t have a dedicated tool flow (for example, a family who is unable to access their hotel room at 2 a.m. when the front desk is closed). Pathak said that not only does this “start to get really complicated”, but it has a direct, long-term impact on customer retention. “One of the things we’ve noticed is that the better we are at customer service, the more loyal our customers are.”

Another recent rollout is personalized filtering. Pathak reported that Booking.com has between 200 and 250 search filters on its website, an unrealistic number for any human to sift through. So his team introduced a free-text box where users can type what they want and instantly get tailored filters. “It becomes an important signal for personalization in terms of what you’re looking for in your own words rather than in the clickstream,” Pathak said.

This, in turn, tells Booking.com what customers really want. Take hot tubs: when filter personalization first launched, jacuzzis were among the most popular requests. That had not even been considered before, and no such filter existed. The filter is now live. “I had no idea,” Pathak said. “Honestly, I never looked for a hot tub in my room.”

However, when it comes to personalization, there’s a fine line, and memory remains complex, Pathak stressed. While it is important to maintain long-term memory and evolving relationships with customers (retaining information such as their general budget, preferred hotel star rating or whether they require disability access), this should happen on the customer’s terms and with their privacy protected. Booking.com is extremely vigilant about memory, asking for consent so as not to be “spooky” when collecting customer information.

“Managing memory is actually much more difficult than creating memory,” Pathak said. “The technology is there, we have the technical features to build it. We want to make sure that we don’t launch a memory object that doesn’t respect customer consent, that doesn’t feel very natural.”
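To make the intent-to-tool pattern Pathak describes earlier in this section more concrete, here is a minimal sketch in Python. It is illustrative only and not Booking.com’s implementation: the keyword matcher stands in for a small BERT-scale intent classifier, and every intent name, tool function and reply string is a hypothetical placeholder.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical intents and trigger phrases; a real system would use a
# trained classifier rather than keyword matching.
INTENT_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "cancel_booking": ("cancel", "refund"),
    "change_dates": ("reschedule", "change dates", "move my stay"),
}


def cancel_tool(message: str) -> str:
    """Placeholder self-service flow for cancellations."""
    return "Starting the cancellation flow."


def change_dates_tool(message: str) -> str:
    """Placeholder self-service flow for date changes."""
    return "Starting the date-change flow."


# Each detected intent maps to exactly one tool (an assumed design).
TOOLS: Dict[str, Callable[[str], str]] = {
    "cancel_booking": cancel_tool,
    "change_dates": change_dates_tool,
}


def detect_intent(message: str) -> str:
    """Return the first intent whose keywords appear, else 'other'."""
    text = message.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return intent
    return "other"


def route(message: str) -> Dict[str, Optional[str]]:
    """Call the matching tool, or hand off to a human when none exists."""
    intent = detect_intent(message)
    tool = TOOLS.get(intent)
    if tool is None:
        # No dedicated tool flow: escalate to a human agent.
        return {"intent": intent, "handler": "human_agent", "reply": None}
    return {"intent": intent, "handler": "self_service", "reply": tool(message)}


if __name__ == "__main__":
    print(route("I need to cancel my booking"))
    print(route("We are locked out of our room at 2 a.m."))
```

In this sketch, a request with no matching tool flow, such as the locked-out family, falls through to a human agent, while recognized intents stay in self-service.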

Finding the Balance of Build vs. Buy

As agents mature, Booking.com is confronting a central question facing the entire industry: How narrow should agents be? Rather than commit to either a swarm of highly specialized agents or a few generalized ones, the company aims to make reversible decisions and avoid “one-way doors” that lock its architecture into long-term, expensive paths. Pathak’s strategy is to generalize where possible, specialize where necessary and keep agent design flexible.

Pathak and his team are “very mindful” of use cases, and are evaluating where to build more generalized, reusable agents and where to build more task-specific ones. They strive to use the smallest model that delivers the required accuracy and output quality for each use case. Anything that can be generalized is.

Latency is another important consideration. When factual accuracy and avoiding hallucinations are paramount, his team will use a larger, much slower model; but with search and recommendations, user expectations set the pace. (As Pathak put it: “No one is patient.”) “For example, we would never use something heavy like GPT-5 just for topic detection or entity extraction,” he said.

Booking.com takes a similarly flexible stance on monitoring and evaluation: If it is general-purpose monitoring with horizontal capability that someone else is better at building, the company will buy it. But where brand guidelines need to be enforced, it will build its own evaluations.

Ultimately, Booking.com has become “super anticipatory”, agile and flexible. “At this point with everything that’s happening with AI, we’re a little averse to walking through a one-way door,” Pathak said. “We want as many of our decisions to be reversible as possible. We don’t want to be locked into a decision that we can’t reverse two years from now.”
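The “smallest model that is accurate enough and fast enough” rule described in this section can be sketched as a simple selection function. This is an assumed illustration, not Booking.com’s tooling: the model names, accuracy scores, latencies and thresholds below are all made-up placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ModelOption:
    name: str
    size_rank: int     # lower means smaller and cheaper
    accuracy: float    # offline accuracy on this use case (made-up numbers)
    latency_ms: int    # typical response time (made-up numbers)


def pick_model(
    options: List[ModelOption],
    min_accuracy: float,
    max_latency_ms: int,
) -> Optional[ModelOption]:
    """Return the smallest model meeting both thresholds, or None."""
    eligible = [
        m for m in options
        if m.accuracy >= min_accuracy and m.latency_ms <= max_latency_ms
    ]
    return min(eligible, key=lambda m: m.size_rank, default=None)


# Hypothetical candidates for a single use case.
CANDIDATES = [
    ModelOption("small-classifier", size_rank=1, accuracy=0.92, latency_ms=20),
    ModelOption("mid-llm", size_rank=2, accuracy=0.95, latency_ms=400),
    ModelOption("large-llm", size_rank=3, accuracy=0.98, latency_ms=3000),
]

if __name__ == "__main__":
    # Latency-sensitive task such as topic detection: tight latency budget.
    print(pick_model(CANDIDATES, min_accuracy=0.90, max_latency_ms=100))
    # Accuracy-critical task: a slower, larger model is acceptable.
    print(pick_model(CANDIDATES, min_accuracy=0.97, max_latency_ms=5000))
```

Because the thresholds are plain parameters, the choice stays easy to revisit as models or requirements change, in keeping with the preference for reversible decisions.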

What other builders can learn from Booking.com’s AI journey

Booking.com’s AI journey can serve as a useful blueprint for other enterprises. Looking back, Pathak admits that the team started out with a “pretty complex” technology stack. They’re in a good place with it now, “but we probably could have started with something more simple and seen how customers interacted with it.”

Given this, he offered the following advice: If you’re starting out with LLMs or agents, the out-of-the-box APIs will work fine. “There’s enough customization with the API that you can get a lot of mileage out of it even before you decide you want to do more.” On the other hand, if a use case requires customization that is not available through standard API calls, that makes the case for in-house tools.

Still, he emphasized: Don’t start with complicated things. Deal with “the simplest, most painful problem you can find and its simplest, most obvious solution.” He advised identifying product-market fit first, then examining the ecosystem, but not tearing out existing infrastructure just because a new use case demands something specific (like moving an entire cloud strategy from AWS to Azure to use OpenAI endpoints).

Ultimately: “Don’t turn yourself off too early,” said Pathak. “Don’t make a unilateral decision unless you are absolutely sure that this is the solution you want to go with.”
