
For most data engineering teams, managing pipeline reliability still means waiting for alerts, manually chasing failures across distributed jobs and clusters, and fixing problems after they have already hit the business. Agentic AI requires clean, timely data. A pipeline that fails silently or delivers stale data doesn't just break a dashboard; it breaks the AI systems that depend on it.
That's the problem Definity, a Chicago-based data pipeline operations startup, is tackling: embedding agents directly inside the Spark or dbt driver so they can act during the pipeline run, not after it. According to Definity, one enterprise customer identified 33% of its optimization opportunities in the first week of deployment and cut troubleshooting and optimization effort by 70%. The company also claims customers are resolving complex Spark issues up to 10 times faster.
"For agentive data operations you need three big things: Full stack context that is real-time and production aware. Control of pipeline. And the ability to verify in a feedback loop. Without it, you can only look outside and read," Roy Daniel, CEO and co-founder of Definity, told VentureBeat in an exclusive interview.
The company announced Wednesday that it has raised $12 million in Series A financing led by GreatPoint Ventures, with participation from Dynatrace and existing investors StageOne Ventures and Hyde Park Venture Partners.
Why does existing pipeline monitoring fail at scale?
Existing tools approach the problem from outside the execution layer. Datadog (which acquired data quality monitor Metaplane last year), Databricks System Tables, and platforms like Unravel Data and Acceldata all read metrics after the job completes. Dynatrace offers monitoring in the same category; it also participated in Definity's Series A.
Definity's approach differs from those options in where the solution sits. According to Daniel, by the time a platform monitoring tool surfaces an issue, the pipeline has already run, and the failure, wasted compute, or bad data has already propagated downstream.
"It always happens after the fact," Daniel said. "By the time you realize something has happened, it has already happened."
How Definity’s In-Execution Agents Work
The main architectural difference is where the agent sits – inside the pipeline rather than looking in from the outside.
Inline Instrumentation. Definity installs a JVM agent directly inside the pipeline execution layer through a single line of code; the agent runs beneath the platform layer and pulls execution data directly from Spark.
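The article doesn't reproduce Definity's actual install line, but Spark's own configuration surface shows what a one-line, in-process hook can look like. Here is a minimal sketch using the standard spark.extraListeners key; the listener class name is a hypothetical stand-in, not a Definity artifact:

```python
from pyspark.sql import SparkSession

# Sketch only: com.example.InExecutionListener is a made-up class name.
# spark.extraListeners is a standard Spark key that registers a SparkListener
# inside the driver JVM, where it receives job, stage, and task events as
# they happen. (A -javaagent flag passed via spark-submit's
# --driver-java-options is the other common way to load instrumentation
# beneath the application code.)
spark = (
    SparkSession.builder
    .appName("daily_revenue_job")
    .config("spark.extraListeners", "com.example.InExecutionListener")
    .getOrCreate()
)
```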
Execution context during the run. As the pipeline runs, the agent captures query execution behavior, memory pressure, data skew, shuffle patterns, and infrastructure usage. It also infers lineage between pipelines and tables dynamically, with no predefined data catalogs required.
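Many of these signals are also exposed by Spark's documented monitoring REST API, which gives a feel for what an in-process agent can see during a run. A rough sketch of skew detection from task-time quantiles, assuming the default driver UI endpoint; the 5x threshold is an arbitrary illustration:

```python
import requests

SPARK_UI = "http://localhost:4040/api/v1"  # default driver UI; varies by cluster

def flag_skewed_stages(app_id: str, ratio: float = 5.0) -> None:
    """Flag completed stages whose slowest task ran far longer than the
    median task, a classic symptom of data skew."""
    stages = requests.get(
        f"{SPARK_UI}/applications/{app_id}/stages",
        params={"status": "complete"}, timeout=5,
    ).json()
    for stage in stages:
        # taskSummary reports each task metric at the requested quantiles.
        summary = requests.get(
            f"{SPARK_UI}/applications/{app_id}/stages/"
            f"{stage['stageId']}/{stage['attemptId']}/taskSummary",
            params={"quantiles": "0.5,1.0"}, timeout=5,
        ).json()
        median_ms, max_ms = summary["executorRunTime"]
        if median_ms > 0 and max_ms / median_ms >= ratio:
            print(f"stage {stage['stageId']}: slowest task ran "
                  f"{max_ms / median_ms:.1f}x the median -> possible skew")
```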
Intervention, not just observation. The agent can modify resource allocation mid-run, stop a job before bad data spreads, or preempt a pipeline based on upstream data conditions. Daniel described a production deployment where the agent detected that an upstream job had produced empty output, leaving the input table it was supposed to write stale, and halted the downstream pipeline before it even began, so the bad data never reached any dependent systems.
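Definity hasn't published the logic behind that guard, but the general pattern, checking upstream freshness before a run and refusing to proceed, is straightforward to picture. A minimal sketch assuming the inputs are Delta Lake tables; the table name and six-hour SLA are illustrative:

```python
from datetime import datetime, timedelta

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("freshness_guard").getOrCreate()

FRESHNESS_SLA = timedelta(hours=6)  # illustrative SLA, not a Definity default

def upstream_is_fresh(table: str) -> bool:
    # DESCRIBE HISTORY (Delta Lake) lists commits newest-first, so row 0
    # carries the timestamp of the most recent write to the table.
    last_write = spark.sql(f"DESCRIBE HISTORY {table} LIMIT 1").collect()[0]["timestamp"]
    # Timestamps come back in the Spark session's local timezone.
    return datetime.now() - last_write < FRESHNESS_SLA

# Preempt the run before any stale data can reach dependent systems.
if not upstream_is_fresh("warehouse.events_raw"):
    raise RuntimeError("Upstream table is stale; preempting this pipeline run.")
```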
What is real time and what is not. Detection and prevention happen in real time. Root cause analysis and optimization recommendations run on demand when an engineer queries the assistant, with the complete execution context already assembled.
Overhead and data residency. The agent adds roughly one second of compute overhead to an hour-long job. Only metadata is transmitted externally; full on-premises deployment is available for environments where no metadata can leave the perimeter.
What in-execution intelligence looks like in a production environment
An early user of the Definity platform is Nexxen, an ad tech company that runs large-scale, mission-critical Spark pipelines on-premises.
Dennis Meyer, director of data engineering at Nexxen, told VentureBeat that the main problem he faced was not pipeline failure but the rising cost of inefficiency in an environment with no elastic cloud capacity to absorb the waste.
"The main challenge was not about pipelines breaking, but about managing an increasingly complex and large-scale environment," The mayor said. "Because we work on-premises, we don’t have the flexibility of immediate resiliency, so inefficiencies have a direct cost impact."
Existing monitoring tools gave Nexxen partial visibility, but not enough to act systematically. "We had existing monitoring tools in place, but needed full-stack visibility to holistically understand workload behavior and systematically prioritize optimization," Meyer said.
Nexxen deployed Definity without any pipeline code changes. According to Meyer, the team identified 33% of its optimization opportunities within the first week, and engineering effort on troubleshooting and optimization dropped by 70%. The platform freed up infrastructure capacity, allowing the team to support workload growth without additional hardware investment.
"The key change was moving from reactive problem solving to proactive, continuous adaptation," The mayor said. "At scale, the biggest difference is often not the tooling – it’s the actionable visibility."
What this means for enterprise data teams
For data engineering teams running production Spark environments, the shift from reactive monitoring to in-execution intelligence has architectural and organizational implications worth thinking about.
Pipeline ops is becoming an AI infrastructure problem. Data pipelines that previously supported analytics now carry AI workloads with direct business dependencies. Failures that were once inconvenient are now blocking production AI delivery.
Troubleshooting time is a recoverable cost. According to Meyer, Nexxen cut engineering effort on troubleshooting and optimization by 70% after deploying Definity. For stretched teams, returning that category of effort to roadmap work is the most obvious near-term value.