Pinterest Cut AI Costs 90% By Gutting A Frontier Model's Vision Layer

At 620 million monthly users, calling frontier models for every image recommendation isn’t a strategy; it’s a bill. Pinterest CTO Matt Madrigal solved this by dismantling Qwen3-VL’s vision layer and rebuilding it with proprietary embeddings, cutting costs by 90% and increasing accuracy by 30%.

Madrigal’s team is investing heavily in optimizing the open-source model “fundamentally in-house.”

“If you have truly unique data with which you can fine-tune an open source model, the quality of the data will, obviously, be as much or greater than the model size,” Madrigal recently explained on the VB Beyond the Pilot podcast.

How Pinterest optimized Qwen for visual search

Pinterest, which has approximately 620 million monthly active users, has long used open source models for visual search and discovery, going back to Google’s BERT and OpenAI’s CLIP. The company later fine-tuned its own PIN CLIP, incorporating proprietary visual embeddings and image metadata.

Pinterest’s conversational shopping assistant, Navigator 1, was built on Qwen3-VL and optimized in “significant” ways. Madrigal’s team essentially “ripped apart” Qwen’s vision encoder layer and fine-tuned the model on proprietary multimodal embeddings. This lets the team capture metadata around Pins and images in embeddings that can be precomputed offline and regularly retrained on new information to deliver personalized experiences.
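One common way to feed precomputed embeddings to a language model in place of a vision encoder is to project each embedding into the model's hidden size and prepend it to the text tokens. The sketch below illustrates only that general pattern; the article does not describe Pinterest's actual architecture, and the function names, dimensions and weights are all hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def project(embedding: Vector, weights: Matrix) -> Vector:
    """Linear map from the pin-embedding space to the model's hidden size."""
    return [sum(w * x for w, x in zip(row, embedding)) for row in weights]


def build_model_input(
    pin_embeddings: List[Vector],
    text_token_embeddings: List[Vector],
    weights: Matrix,
) -> List[Vector]:
    """Prepend one projected 'visual token' per pin to the text token sequence."""
    visual_tokens = [project(e, weights) for e in pin_embeddings]
    return visual_tokens + text_token_embeddings


# Toy sizes (assumptions): 3-dim pin embeddings, 2-dim model hidden size.
weights = [
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.5],
]
pins = [[0.2, 0.4, 0.6], [1.0, 0.0, 0.0]]    # precomputed offline
text = [[0.1, 0.1], [0.3, 0.7], [0.5, 0.5]]  # embedded prompt tokens
sequence = build_model_input(pins, text, weights)
print(len(sequence))  # two visual tokens followed by three text tokens
```

In a real system the projection weights would be learned during fine-tuning rather than written by hand.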

“The open-source model, especially with the Open Apache license, where you can really change a lot of the open source stuff and customize it for unique use cases — that’s where we’ve found open source to be so powerful for us,” Madrigal said.

Bringing its own embeddings gives the team context around metadata, Pins and images; notably, the model also performs better at runtime and inference. Without these embeddings, developers would have to call the encoder on each returned image at runtime, one at a time. By Madrigal’s estimate, this would make latency “20 times worse.”
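The difference between encoding at request time and reading a precomputed embedding can be sketched in a few lines. This is a toy illustration, not Pinterest's pipeline: `toy_encode` is a hypothetical stand-in for an expensive vision encoder, and the in-memory dictionary stands in for whatever embedding store a production system would use.

```python
import hashlib
from typing import Callable, Dict, List

Vector = List[float]


def toy_encode(image_id: str, dim: int = 4) -> Vector:
    """Hypothetical stand-in for an expensive vision-encoder call."""
    digest = hashlib.sha256(image_id.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dim]]


def precompute(image_ids: List[str], encode: Callable[[str], Vector]) -> Dict[str, Vector]:
    """Offline job: encode every known image once and store the result."""
    return {image_id: encode(image_id) for image_id in image_ids}


class CountingEncoder:
    """Wraps an encoder and counts how often it is invoked."""

    def __init__(self, encode: Callable[[str], Vector]) -> None:
        self.encode = encode
        self.calls = 0

    def __call__(self, image_id: str) -> Vector:
        self.calls += 1
        return self.encode(image_id)


def embeddings_at_request_time(
    candidate_ids: List[str],
    store: Dict[str, Vector],
    encoder: Callable[[str], Vector],
) -> List[Vector]:
    """Serve from the precomputed store; encode only on a cache miss."""
    result = []
    for image_id in candidate_ids:
        vector = store.get(image_id)
        if vector is None:
            vector = encoder(image_id)
        result.append(vector)
    return result


catalog = ["pin-1", "pin-2", "pin-3"]
store = precompute(catalog, toy_encode)
runtime_encoder = CountingEncoder(toy_encode)
vectors = embeddings_at_request_time(["pin-1", "pin-3", "pin-9"], store, runtime_encoder)
print(runtime_encoder.calls)  # only the unseen "pin-9" triggers an encoder call
```

With every candidate already in the store, the request path does no encoding at all, which is the property the paragraph above attributes to precomputed embeddings.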

“If it’s something that’s going to be valuable to our end users, that’s going to drive engagement, that’s going to scale to over 600 million monthly active users, we’re either probably going to build it or we’re going to leverage open source and customize a lot of it,” he said.

How a taste graph captures evolving tastes

To guide users from inspiration to purchase, Madrigal’s team created a “taste graph”: a dynamic representation of what individual users actually like, not just what they click on. “It represents the evolving interests of billions of people,” he said.

People visit Google or other search engines when they have a clear picture of what they want; Pinterest is for when they’re still in the discovery phase, Madrigal said. Pinterest’s goal is to encourage “lateral exploration” and convert search into intent (i.e., clicking on ads or making purchases).

Under the hood, the architecture combines a graph structure with representation learning. User embeddings capture each user’s evolving tastes and are constantly updated based on activity, new content and other signals. “This is not a social graph,” Madrigal said. “It’s more of a priority graph: What will motivate you? What are you trying to do next?”

For example, one user may be interested in mid-century modern designs; someone else may prefer a Nantucket aesthetic. Those preferences will be captured in user embeddings, and the taste graph will deliver specific, relevant products as a result.
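A minimal sketch of the idea, assuming a simple moving-average update and cosine-similarity ranking (neither is confirmed as Pinterest's method, and the item names and two-dimensional vectors are invented for illustration):

```python
import math
from typing import Dict, List

Vector = List[float]


def update_user_embedding(user: Vector, item: Vector, rate: float = 0.2) -> Vector:
    """Move the user vector a fraction `rate` toward an item they engaged with."""
    return [(1.0 - rate) * u + rate * i for u, i in zip(user, item)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_items(user: Vector, items: Dict[str, Vector]) -> List[str]:
    """Order item names from most to least similar to the user vector."""
    return sorted(items, key=lambda name: cosine(user, items[name]), reverse=True)


# Toy item embeddings: axis 0 ~ "mid-century modern", axis 1 ~ "Nantucket".
items = {
    "walnut sideboard": [1.0, 0.0],
    "teak armchair": [0.9, 0.1],
    "shingle-style lantern": [0.0, 1.0],
}

user = [0.5, 0.5]  # no clear preference yet
for engaged in ["walnut sideboard", "teak armchair"]:
    user = update_user_embedding(user, items[engaged])

ranking = rank_items(user, items)
print(ranking)  # the Nantucket-style item now ranks last for this user
```

After two mid-century interactions the user vector has drifted toward that axis, so similar products rise in the ranking without any explicit rule about styles.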

“You go from upper funnel motivation discovery, to lower funnel intention,” Madrigal said.

Listen to the full podcast to learn more about:

How Pinterest uses sandboxes in a safe and contained way to encourage creativity;
Why a continuous feedback loop can prevent AI decline;
The importance of continuous benchmarking to measure user engagement, performance, latency and other factors.

You can also listen and subscribe to Beyond the Pilot on Spotify, Apple or wherever you get your podcasts.
