Databricks Says It Solved The Decades-old Data Pipeline Problem That's Been Slowing AI Agents

For decades, data professionals have struggled to manage operational and analytical databases in an integrated way without introducing latency or degrading performance.

Agents made the problem structural. A system that continuously reasons and acts on live data cannot afford a pipeline between itself and the information it needs to act on.

At the Data + AI Summit on Tuesday, Databricks announced two products aimed at retiring that infrastructure. Lakehouse//RT provides millisecond query latency directly on governed Delta and Iceberg tables, eliminating the dedicated real-time serving tiers that enterprises have built alongside their lakehouses. LTAP, short for Lake Transactional/Analytical Processing, stores Postgres-native transactional data in Delta and Iceberg formats from the point of write, eliminating the ETL pipelines that have sat between operational and analytical systems for decades.

Databricks co-founder Reynold Xin described a simple data stack as the "holy grail for agents" in a briefing with VentureBeat, arguing that as users code more applications, the underlying infrastructure that reasons analytically on top of those apps needs to move faster.

"Agents actually prefer very simple stacks, because they can move faster," he said.

LTAP bets on storage-layer integration where HTAP tried engine convergence

Vendors have spent decades trying different approaches to integrating analytical and transactional data.

In 2014, analyst firm Gartner coined the term HTAP, short for Hybrid Transactional/Analytical Processing, as a way to describe vendors attempting to unify the two types of databases. MemSQL (now known as SingleStore) is among several HTAP vendors on the market, alongside SAP HANA and Oracle’s MySQL HeatWave.

LTAP is Databricks’ answer to HTAP, using the Lakebase architecture to unify data at the storage level rather than the engine level. Lakebase is a serverless, cloud-based PostgreSQL database service from Databricks that became generally available in February.

"For us the HTAP industry is more of a failure than a success," Xin said.

The LTAP approach works at the storage layer instead of the query layer. Lakebase previously stored Postgres data in Postgres format on object storage, requiring conversion before analytical engines could use it efficiently. With LTAP, transactional data lands directly in Delta or Iceberg format, so analytical workloads read the same copy. Postgres remains the transactional engine. Spark and Lakehouse remain the analytical engines.

"The whole point is, hey, you use the best tool for the task at the query engine level, we just make sure that the underlying storage has a copy of the data," Xin said.

The central engineering challenge is latency. Object storage response times are measured in seconds, which is too slow for OLTP workloads that require sub-millisecond performance. Lakebase handles this through a caching layer between the Postgres compute instance and object storage. The key design decision is where the row-to-column conversion occurs: idle CPU capacity in that caching layer performs the conversion before the data lands in object storage.
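The write path described above can be pictured with a toy model. The sketch below is an illustration only, not Databricks' implementation: `WriteCache`, `flush_threshold`, and the dict standing in for object storage are hypothetical names, and run-length encoding stands in for whatever encodings the real format applies. It shows rows being buffered in memory, then pivoted to columns and encoded in the cache layer before a batch is stored.

```python
from itertools import groupby
from typing import Any

Row = dict[str, Any]
Column = list[tuple[Any, int]]  # run-length encoded: (value, repeat count)


def rows_to_columns(rows: list[Row]) -> dict[str, list[Any]]:
    """Pivot row dicts into one list per column (assumes identical keys)."""
    if not rows:
        return {}
    return {name: [row[name] for row in rows] for name in rows[0]}


def run_length_encode(values: list[Any]) -> Column:
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


class WriteCache:
    """Hypothetical caching layer between a row engine and object storage."""

    def __init__(self, object_store: dict[str, dict[str, Column]],
                 flush_threshold: int) -> None:
        self.object_store = object_store        # stand-in for an object store
        self.flush_threshold = flush_threshold  # rows per columnar batch
        self.buffer: list[Row] = []             # recent rows, held in memory
        self.batches_written = 0

    def write(self, row: Row) -> None:
        """Accept the row into memory; flush once the batch is full."""
        self.buffer.append(row)
        if len(self.buffer) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        """Convert buffered rows to encoded columns, then store the batch."""
        if not self.buffer:
            return
        columns = rows_to_columns(self.buffer)
        encoded = {name: run_length_encode(vals) for name, vals in columns.items()}
        self.object_store[f"batch-{self.batches_written:05d}"] = encoded
        self.batches_written += 1
        self.buffer = []


store: dict[str, dict[str, Column]] = {}
cache = WriteCache(store, flush_threshold=4)
regions = ["eu", "eu", "eu", "eu", "us"]
statuses = ["paid", "paid", "open", "open", "paid"]
for region, status in zip(regions, statuses):
    cache.write({"region": region, "status": status})

print(store)         # one columnar batch built from the first four rows
print(cache.buffer)  # the fifth row is still waiting in the cache
```

In row order the values alternate (eu, paid, eu, paid, ...), so there is nothing to collapse; grouped by column, the four `region` values become a single run. Columnar formats such as Parquet combine encodings of this kind with general-purpose compression, and this toy is not meant to reproduce any specific compression figure.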

"When you convert data from row to column, it’s typically compressed more than 10 times, so now you significantly reduce the network cost of that original caching layer between that caching layer and the object stores," Shin said.

Lakehouse//RT provides millisecond query latency on live Lakehouse data without a separate serving tier

Lakehouse//RT is Databricks’ answer to dedicated real-time serving tiers: separate systems that enterprises maintain alongside their lakehouses to handle low-latency queries, at the cost of data copies, fragmented governance, and pipeline complexity that agents can’t handle. Lakehouse//RT’s key capabilities include:

Raiden Compute Engine: Built specifically for high-concurrency, low-latency serving, Raiden queries Delta and Iceberg tables directly without moving the data out of the lakehouse.

Latency and throughput: Lakehouse//RT delivers sub-100 ms latency at 12,000 queries per second, response times under 10 ms on small datasets, and up to 16x better performance than existing dedicated serving stacks.

Governance and data access: Every query runs within the governance framework of Unity Catalog, with no separate permissions layer, no data copies, and no ingestion pipeline.
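Taken together, the throughput and latency figures above imply how many queries are in flight at once. The sketch below applies Little's Law (average concurrency equals arrival rate times average time in the system). It assumes a steady state and treats 100 ms as the average latency, which is an assumption: the stated figure is an upper bound, so the result is a ceiling rather than a measurement. The function name is hypothetical.

```python
def queries_in_flight(queries_per_second: float, latency_ms: float) -> float:
    """Little's Law: average concurrency = arrival rate x average latency."""
    return queries_per_second * latency_ms / 1000


# 12,000 queries per second, each taking at most 100 ms
print(queries_in_flight(12_000, 100))  # 1200.0
```

Under those assumptions, the engine would be holding at most about 1,200 queries open at any instant.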

Analysts see agentic framing and open format approach as real differentiators

The problem both products address is well documented among enterprise data teams, but analysts distinguish between the pain point itself and the specific claims Databricks is making.

"Enterprises have had HTAP, streaming, cloud warehouses and operational stores for years," Stephanie Walter, practice leader of the AI stack at Hyperframe Research, told VentureBeat. "What is different is agentic AI framing."

Walter said agents need live operational data, historical context, governance, recovery and write-back in a single workflow.

"This is a strong architectural argument, but Lakebase still has to prove that it can meet the latency, reliability, and operational maturity CIOs expect," she said.

Moor Insights & Strategy analyst Mike Leone said the path to true differentiation is more specific than the integration concept. He also said that open analytics on data lakes is now table stakes, with multiple vendors offering some form of it.

"A less common move is to make transactional writes available in open formats as well, so the operational database is not sitting in a proprietary box, while only the analytics is half open," Leone told VentureBeat.

He said the open format approach, coupled with Lakehouse//RT querying live data directly from the lake, gives the architecture a credible case to retire a full range of specialized systems.

The technical claim that will be most scrutinized is also the most central. Leone said he would still want Databricks engineers to explain how both engines actually share one copy without a conversion step or sync in between.

What does this mean for enterprises

For data engineers evaluating their stack for agentic workloads, the question is no longer which tool is best for each task; it is whether running separate tools is still defensible.

Enterprises that have built separate operational databases, real-time serving tiers, and analytical lakehouses may previously have regarded the gaps between them as a maintenance burden. Agents expose those gaps as an operational risk: a system reasoning across governance boundaries will detect anomalies faster than any human team.

The market is moving away from specialized serving layers faster than most vendor roadmaps anticipate. According to VB Pulse Q1 2026, a three-wave longitudinal survey of organizations with more than 100 employees, hybrid retrieval intent tripled from 10.3% to 33.3% during the quarter, while standalone vector database adoption declined across every tracked vendor. The same consolidation logic is now reaching the real-time serving tier.

The traditional approach, best-of-breed tools for each workload type with pipelines between them, was built for human-speed analytical consumption. Agent workloads do not tolerate that architecture.

"The pain they’re pointing to, all the duplication and coordination between operational and analytical systems, is real and expensive, and anyone running it at scale feels it," Leone said.
