Agents need vector search more than RAG ever did

What is the role of vector databases in the agentic AI world? It is a question organizations have been wrestling with in recent months. The counter-story had real momentum: as large language models scaled to million-token context windows, a credible argument circulated among enterprise architects that purpose-built vector search was a stopgap, not infrastructure; that agentic memory would absorb the retrieval problem; that vector databases were an artifact of the RAG era.

Production evidence goes the other way.

Qdrant, the Berlin-based open source vector search company, announced a $50 million Series B on Thursday, two years after a $28 million Series A. The timing is not accidental: the company is also shipping version 1.17 of its platform. Together, they reflect a specific logic: the retrieval problem did not recede when agents arrived. It grew and got harder.

"A human asks a few questions every few minutes," Andre Zayarni, CEO and co-founder of Qdrant, told VentureBeat. "Agents ask hundreds or even thousands of questions per second, just to collect information to be able to make decisions."

That shift changes infrastructure requirements in ways that RAG-era deployments were never designed to handle.

Why agents need a retrieval layer that memory can't replace

Agents work on information they were never trained on: proprietary enterprise data, current events, millions of documents that change constantly. Context windows manage session state. They do not provide high-recall search across that data, maintain retrieval quality as it changes, or sustain the query volume that autonomous decision-making requires.

"Most AI memory frameworks out there are using some form of vector storage," Zayarni said.

The implication is direct: even tools positioned as memory alternatives depend on retrieval infrastructure.

Three failure modes emerge when that retrieval layer is not purpose-built for the load. At document scale, a missed result is not a latency problem; it is a decision-quality problem that compounds across every retrieval pass in a single agent turn. Under write load, relevance degrades as newly ingested data sits in unoptimized segments before indexing catches up, making searches over fresh data slower and less accurate exactly when current information matters most. In distributed deployments, one slow replica adds latency to every parallel tool call in an agent turn, a delay a human user absorbs as an inconvenience but an autonomous agent cannot.
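The compounding effect of the first failure mode is easy to quantify. Assuming (illustratively) that retrieval passes are independent, a per-pass recall that looks acceptable for a single human query degrades quickly when an agent turn chains many passes:

```python
# Illustrative arithmetic: per-pass recall compounds across the retrieval
# passes inside one agent turn, assuming independent passes. The 95% figure
# and the 8-pass fan-out are hypothetical, not measured numbers.

def turn_level_recall(per_pass_recall: float, passes: int) -> float:
    """Probability that every retrieval pass in a turn succeeds."""
    return per_pass_recall ** passes

# A human session: one query per question.
single_query = turn_level_recall(0.95, 1)  # 0.95

# An agent turn that fans out into 8 tool calls before deciding.
agent_turn = turn_level_recall(0.95, 8)    # ~0.66
```

By this toy model, a search layer that misses 1 result in 20 per pass fails roughly a third of 8-pass agent turns, which is why recall that was "good enough" for RAG can be inadequate for agents.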

Qdrant's 1.17 release addresses each directly. A contextual feedback query improves recall by adjusting similarity scoring on the next retrieval pass using a lightweight model-generated signal, without retraining the embedding model. A delayed fan-out feature queries a second replica when the first exceeds a configurable latency threshold. And a new cluster-wide telemetry API replaces node-by-node troubleshooting with a single view across the entire cluster.
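The delayed fan-out idea is a form of hedged request. The sketch below shows the general pattern in plain asyncio, not Qdrant's actual API or configuration: `query_replica` is a placeholder for a real client call, and the 20 ms threshold is an assumed, illustrative value.

```python
import asyncio

# Hedged-request sketch: ask the primary replica first; only if it exceeds
# a latency threshold, fan out to a second replica and take whichever
# replica answers first. All names and timings here are illustrative.

async def query_replica(replica_id: int, query: str) -> str:
    # Stand-in for a network call; replica 0 is artificially slow.
    await asyncio.sleep(0.05 if replica_id == 0 else 0.01)
    return f"results from replica {replica_id}"

async def hedged_search(query: str, threshold_s: float = 0.02) -> str:
    primary = asyncio.create_task(query_replica(0, query))
    done, _ = await asyncio.wait({primary}, timeout=threshold_s)
    if done:
        return primary.result()
    # Primary is slow: fan out to the secondary and race them.
    secondary = asyncio.create_task(query_replica(1, query))
    done, pending = await asyncio.wait(
        {primary, secondary}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()
    return done.pop().result()

result = asyncio.run(hedged_search("vector search"))
```

The trade-off is classic hedging: tail latency drops at the cost of some duplicated work on slow requests, which is why the threshold is configurable rather than fanning out on every query.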

Why Qdrant no longer wants to be called a vector database

Nearly every major database now supports vectors as a data type, from hyperscalers to traditional relational systems. That shift has changed the competitive question. The data type is now table stakes. What remains scarce is expertise in retrieval quality at production scale.

This is why Zayarni no longer wants Qdrant to be called a vector database.

"We are building an information retrieval layer for the AI era," he said. "Databases are for storing user data. If the quality of search results matters, you need a search engine."

His advice for teams starting out: use whatever vector support you already have in your stack. Teams migrate to purpose-built retrieval when scale forces the issue.

"We see companies come to us every day and say they started with Postgres and thought it was good enough – and it’s not."

Qdrant's architecture, written in Rust, gives it memory efficiency and low-level performance control that higher-level languages cannot match at the same cost. Combined with an open source foundation, that provides benefits (community feedback and developer adoption) that let a company of Qdrant's size compete with vendors that have far larger engineering resources.

"Without it, we wouldn't be where we are now," Zayarni said.

How two production teams found the limits of a general-purpose database

Companies building production AI systems on Qdrant are making the same argument from different directions: agents need a retrieval layer, and conversational or episodic memory is not a substitute.

GlassDollar helps enterprises including Siemens and Mahle evaluate startups. Search is the core product: a user describes a need in natural language and gets back a ranked shortlist from a pool of millions of companies. The architecture runs query expansion on each request: a single prompt is turned into multiple parallel queries, each retrieving candidates from a different angle, before the results are combined and re-ranked. This is an agentic retrieval pattern, not a RAG pattern, and it requires purpose-built search infrastructure to sustain at volume.
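The expand-retrieve-merge-rerank loop described above can be sketched generically. This is not GlassDollar's implementation: `expand_query` stands in for an LLM-driven expansion step, and `retrieve` for a real vector search call.

```python
import asyncio

# Sketch of an agentic retrieval pattern: one prompt fans out into several
# parallel queries, each list of candidates is retrieved independently,
# then merged and re-ranked. All functions are toy stand-ins.

def expand_query(prompt: str) -> list[str]:
    # A production system would generate these with an LLM.
    return [prompt, f"{prompt} competitors", f"{prompt} startups"]

async def retrieve(query: str) -> list[tuple[str, float]]:
    # Stand-in for a vector search returning (doc_id, score) pairs.
    await asyncio.sleep(0)
    return [(f"doc-{abs(hash(query)) % 100}", 1.0)]

async def agentic_search(prompt: str, top_k: int = 5) -> list[str]:
    queries = expand_query(prompt)
    per_query_hits = await asyncio.gather(*(retrieve(q) for q in queries))
    # Merge candidates from every angle, keeping the best score per document.
    merged: dict[str, float] = {}
    for hits in per_query_hits:
        for doc_id, score in hits:
            merged[doc_id] = max(merged.get(doc_id, 0.0), score)
    # Re-rank by score and return the shortlist.
    return [d for d, _ in sorted(merged.items(), key=lambda kv: -kv[1])][:top_k]

shortlist = asyncio.run(agentic_search("battery recycling"))
```

Note the load implication: every user request becomes several concurrent searches against the index, which is the multiplier Zayarni describes when he contrasts human and agent query rates.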

The company moved off Elasticsearch as it scaled toward 10 million indexed documents. After migrating to Qdrant it cut infrastructure costs by nearly 40%, dropped the keyword-based compensation layer it had built to paper over Elasticsearch's relevance gap, and saw a 3x increase in user engagement.

"We measure success by recall," Kamen Kanev, head of product at GlassDollar, told VentureBeat. "If the results don't include the best companies, nothing else matters. The user loses trust."

Agentic memory and extended context windows cannot absorb the workload GlassDollar requires.

"This is an infrastructure problem, not a conversational state management task," Kanev said. "This isn't something you solve by expanding the context window."

Another Qdrant user is &AI, which is building infrastructure for patent litigation. Its AI agent, Andy, runs semantic searches across millions of documents spanning decades and multiple jurisdictions. Patent attorneys will not work off AI-generated legal text, meaning every result the agent surfaces must be grounded in an actual document.

"Our entire architecture is designed to reduce the risk of hallucination by making retrieval the core primitive, not generation," &AI founder and CTO Herbie Turner told VentureBeat.

For &AI, the agent layer and the retrieval layer are separate by design.

"Andy, our patent agent, is built on top of Qdrant," Turner said. "The agent is the interface. The vector database is the ground truth."

Three signs it's time to move beyond your current setup

A practical starting point: use whatever vector capabilities you already have in your stack. The evaluation question is not whether to add vector search; it is when your current setup stops being sufficient. Three signals mark that point: retrieval quality is directly tied to business results; query patterns include expansion, multi-stage re-ranking, or parallel tool calls; or data volume reaches millions of documents.

At that point the evaluation turns to operational questions: how much visibility your current setup gives you into what is happening across a distributed cluster, and how much performance headroom it has as agent query volumes grow.

"There's a lot of noise right now about replacing the retrieval layer," Kanev said. "But for anyone building a product where retrieval quality is the product, where not finding results has real business consequences, you need dedicated search infrastructure."
