
Engineering teams are creating more code with AI agents than ever before. But when that code reaches production, understanding how it behaves becomes a bottleneck.
The problem isn’t necessarily the AI-generated code itself. Traditional monitoring tools typically struggle to provide the granular, function-level data AI agents need to understand how code actually behaves in complex production environments. Without that context, agents can’t detect problems or generate solutions that account for the reality of production.
Startup Hud wants to help solve that problem with the launch of its Runtime Code Sensor on Wednesday. The company’s sensor runs alongside production code, automatically tracking how each function behaves and showing developers exactly what’s happening in deployment.
"Every software team building at scale faces the same fundamental challenge: building high quality products that work well in the real world," Roi Adler, CEO and founder of Hud, told VentureBeat in an exclusive interview. "In the new era of AI-accelerated development, not knowing how code will behave in production becomes an even bigger part of that challenge."
What are software developers struggling with?
The problems developers face are quite similar across engineering organizations. Moshik Elon, group tech lead at Monday.com, oversees 130 engineers and describes a familiar frustration with traditional monitoring tools.
"When you get an alert, you’re typically investigating an endpoint that has a high error rate or high latency, and you want to drill down to see downstream dependencies," Elon told VentureBeat. "Sometimes it’s an external application, and then it’s a black box. You only see that 80% of the latency is downstream in that application."
The next step usually involves manual detective work across multiple tools. Check logs. Correlate timestamps. Try to reconstruct what the application was doing. For novel issues deep in large codebases, teams often lack the precise data needed.
Daniel Marshallian, CTO and co-founder of Dreta, watched his engineers spend hours on what he called "please check" work. "They were mapping a generic alert to a specific code owner, then digging through the logs to reconstruct the state of the application," Marshallian told VentureBeat. "We wanted to eliminate this so our team could focus solely on solutions rather than discovery."
Dreta’s architecture adds to the challenge. The company integrates with a number of external services to provide automated compliance, which demands sophisticated investigation when issues arise. Engineers trace behavior across a very large codebase spanning risk, compliance, integration, and reporting modules.
Marshallian identified three specific problems that prompted Dreta to invest in runtime sensors. The first issue was the cost of context switching.
"Our data was scattered, so our engineers had to act as human bridges between disconnected tools," he said.
The second issue, he said, is alert fatigue. "When you have a complex distributed system, normal alert channels become a constant stream of background noise, which our team describes as a ‘ding, ding, ding’ effect that eventually gets ignored," Marshallian said.
The third key driver was the need to integrate with the company’s AI strategy.
"An AI agent can write code, but it can’t fix production bugs if it can’t see the runtime variables or root cause," Marshallian said.
Why can’t traditional APM solve the problem easily?
Enterprises have long relied on a class of tools and services called application performance monitoring (APM).
Given the pace of agentic AI development and modern development workflows, neither Monday.com nor Dreta could get the visibility they needed from existing APM tools.
"If I wanted to get this information from Datadog or from Coralogix, I would just have to swallow tons of logs or tons of spans, and I would have to pay a lot of money," Elon said.
Elon said that Monday.com used very low sampling rates due to cost constraints, meaning the team often missed the exact data needed to debug issues.
Traditional application performance monitoring tools also require developers to predict what they will need to monitor, which is a problem because developers don’t know what they don’t know.
"Traditional observability requires you to guess what you will need to debug," Marshallian said. "But when a new issue comes up, especially within a large, complex codebase, you’re often missing the exact data you need."
Dreta evaluated several solutions in the AI site reliability engineering and automated incident response categories but didn’t find what it needed.
"Most of the tools we evaluated were excellent at managing incident processing, routing tickets, summarizing Slack threads, or correlating graphs," he said. "But they often stopped short of the code. They could tell us ‘Service A is down,’ but they couldn’t tell us specifically why."
Another common capability, found in error monitors like Sentry, is catching exceptions. The challenge, according to Adler, is that knowing about exceptions is useful, but it doesn’t connect them to business impact or provide the execution context AI agents need to propose solutions.
How runtime sensors work differently
Runtime sensors push intelligence to the edge where code executes. Hud’s sensor runs as an SDK that integrates with a single line of code. It watches the execution of every function but sends only lightweight aggregate data unless something goes wrong.
When errors or slowdowns occur, the sensor automatically collects deep forensic data, including HTTP parameters, database queries and responses, and the full execution context. The system establishes performance baselines within a day and can alert on both dramatic slowdowns and outliers that percentage-based monitoring misses.
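Hud’s actual SDK internals aren’t documented in this article, but the behavior described above can be sketched as a function wrapper: record cheap aggregates on every call, and capture deep context only on errors or slowdowns. All names and the threshold below are illustrative assumptions, not Hud’s API.

```python
import functools
import time
import traceback

# Hypothetical sketch of a runtime sensor; names and thresholds are assumptions.
SLOW_THRESHOLD_MS = 500   # assumed cutoff for flagging a slowdown
aggregates = {}           # lightweight per-function stats, always collected
forensics = []            # deep captures, collected only when something goes wrong

def sensor(func):
    """Wrap a function to record call counts and latency, and capture
    full execution context only on exceptions or slow calls."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        except Exception:
            # Deep forensic capture: arguments and full traceback.
            forensics.append({
                "function": func.__name__,
                "args": repr(args),
                "kwargs": repr(kwargs),
                "traceback": traceback.format_exc(),
            })
            raise
        finally:
            # Lightweight aggregate, updated on every call (even failures).
            elapsed_ms = (time.perf_counter() - start) * 1000
            stats = aggregates.setdefault(func.__name__, {"calls": 0, "total_ms": 0.0})
            stats["calls"] += 1
            stats["total_ms"] += elapsed_ms
            if elapsed_ms > SLOW_THRESHOLD_MS:
                forensics.append({"function": func.__name__, "slow_ms": elapsed_ms})
    return wrapper

@sensor
def lookup_user(user_id):
    if user_id < 0:
        raise ValueError("invalid id")
    return {"id": user_id}
```

The key design point mirrored here is asymmetric cost: the happy path adds only a counter and a timer, while the expensive capture fires only when an error or outlier makes it worth paying for.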
"Now we get all this information for all functions, regardless of what level they are, even for built-in packages," Elon said. "Sometimes you may have an issue that runs much deeper, and we still look at it very quickly."
The platform delivers data through four channels:
- A web application for centralized monitoring and analysis
- An IDE extension for VS Code, JetBrains, and Cursor that surfaces production metrics directly where code is written
- An MCP server that feeds structured data to AI coding agents
- An alerting system that identifies problems without manual configuration
The MCP server integration is particularly important for AI-assisted development. Monday.com engineers now query production behavior directly within Cursor.
"I can simply ask Cursor a question: Hey, why is this endpoint slow?" Elon said. "It uses the MCP under the hood, I get all the granular metrics, and I see that this function is 30% slower since this deployment. Then I can also find the root cause."
This changes the incident response workflow. Instead of launching into Datadog and digging deeper through the layers, engineers start by asking the AI agent to diagnose the problem. Agents have immediate access to function-level production data.
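The kind of structured, function-level payload an agent could reason over might look like the sketch below. The actual MCP tool names and data shapes Hud exposes aren’t documented here, so the function, field names, and numbers are all fabricated for illustration, loosely mirroring Elon’s "30% slower since this deployment" example.

```python
# Illustrative only: hypothetical structured metrics a runtime sensor
# might feed to an AI agent over MCP; field names are assumptions.

def diagnose_slow_endpoint(metrics_by_deploy, function_name, threshold=0.2):
    """Compare a function's average latency across the last two
    deployments and flag a regression above `threshold` (default 20%)."""
    deploys = sorted(metrics_by_deploy)           # deploy ids in order
    prev, curr = deploys[-2], deploys[-1]
    before = metrics_by_deploy[prev][function_name]["avg_ms"]
    after = metrics_by_deploy[curr][function_name]["avg_ms"]
    change = (after - before) / before
    return {
        "function": function_name,
        "regression": change > threshold,
        "change_pct": round(change * 100, 1),
        "since_deploy": curr,
    }

# Fabricated per-deployment metrics for a single function.
metrics = {
    "deploy-101": {"get_board_items": {"avg_ms": 120.0}},
    "deploy-102": {"get_board_items": {"avg_ms": 156.0}},
}
report = diagnose_slow_endpoint(metrics, "get_board_items")
# flags a 30% slowdown introduced in deploy-102
```

The point is that the agent receives pre-aggregated, per-function numbers it can compare directly, rather than raw logs it would have to parse and correlate itself.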
From "magical events" to fixes in minutes
The shift from theoretical potential to practical impact becomes clear in how engineering teams actually use runtime sensors. Detective work that previously took hours or days now resolves in minutes.
"I’m used to these magical events where the CPU spikes and you don’t know where it came from," Elon said. "A few years ago, such an incident happened to me and I had to create my own tool that takes CPU profiles and memory dumps. Now I have all the function data and I’ve seen engineers solve it so fast."
At Dreta, the quantified impact is dramatic. The company created an internal /triage command that engineers run within their AI assistants to quickly identify root causes. Manual triage work dropped from approximately three hours per day to less than 10 minutes. Average time to resolution improved by approximately 70%.
The team also produces a daily quick-win error report. Since the root cause is already captured, developers can fix these problems in minutes. Support engineers now perform forensic diagnostics that previously required a senior developer, increasing ticket throughput without expanding the L2 team.
Where this technology fits in
Runtime sensors occupy a distinct niche from traditional APM, which excels at service-level monitoring but struggles with granular, cost-effective function-level data. They also differ from error monitors that catch exceptions without business context.
The technical requirements for supporting AI coding agents differ from those for human-facing observability. Agents need structured, function-level data they can reason over; they cannot parse and correlate raw logs the way humans can. Traditional observability also assumes you can anticipate what you will need to debug and instrument accordingly. That approach breaks down with AI-generated code, where engineers don’t understand every function in depth.
"I think we’re entering a new era of AI-generated code, and with it the emergence of a new stack," Adler said. "I just don’t think that the cloud computing observability stack is going to fit well into what the future looks like."
What does this mean for enterprises
For organizations already using AI coding assistants like GitHub Copilot or Cursor, runtime intelligence provides a safety layer for production deployments. At Monday.com, the technology enables agentic investigation instead of manual tool-hopping.
The broader implication relates to trust. "We are getting a lot more AI-generated code, and engineers don’t know all the code," Elon said.
The runtime sensor bridges that knowledge gap by bringing production context directly into the IDE where code is written.
For enterprises looking to scale AI code generation beyond pilots, runtime intelligence solves a fundamental problem. AI agents generate code based on assumptions about system behavior. Production environments are complex and surprising. Function-level behavior data automatically captured from production gives agents the context they need to generate reliable code at scale.
Organizations should evaluate whether their existing observability stack can cost-effectively provide the granularity AI agents need. If achieving function-level visibility requires dramatically increasing ingestion costs or manual instrumentation, runtime sensors may offer a more sustainable architecture for the AI-accelerated development workflows already emerging across the industry.