AI Agents + Kernel Tracepoints

TL;DR

MCP is becoming the interface between AI agents and infrastructure data. Datadog shipped an MCP server connecting its dashboards to AI agents. Qualys flagged MCP servers as a new shadow-IT risk. We believe both are right, and we believe the architecture should go further: MCP servers should not wrap existing observability platforms. They should be the observability layer. This post explains how MCP can serve as a direct observability interface to kernel tracepoints, bypassing traditional metrics pipelines entirely.

Three signals in one week

Three things happened in a single week in March 2026 that indicate where observability is headed.

Datadog shipped an MCP server. Their implementation connects real-time observability data to AI agents for automated detection and remediation. An AI agent can now query Datadog dashboards, pull metrics, and trigger responses via the Model Context Protocol. This is a big company validating a small protocol.

Qualys published a security analysis of MCP servers. Their TotalAI team called MCP servers “the new shadow IT for AI” and found that over 53% of servers rely on static secrets for authentication. They recommended adding observability to MCP servers: logging capability-discovery events, monitoring invocation patterns, and alerting on anomalies.

Cloud Native Now published an overview of eBPF for Kubernetes networking. Microsoft’s Retina deploys as a DaemonSet, captures network telemetry via eBPF without application changes, and surfaces kernel-level packet-drop causes. The article draws a clear line between “monitoring” (asking predetermined questions) and “observability” (asking questions you had not planned to ask).

The thread connecting the three: AI agents need direct access to infrastructure telemetry, and MCP is becoming the way to get it.

Two approaches to MCP observability

There are two ways to connect observability data to AI agents through MCP.

Approach 1: Wrap an existing platform. This is Datadog’s strategy. Collect and aggregate metrics, logs, and traces as before, and expose them through MCP tools. The AI agent queries the dashboard API, receives pre-processed data, and acts on it. This makes sense for teams with mature observability stacks that want to add AI-powered automation on top.

Approach 2: Build MCP-native observability. This is what we did with Ingero. Rather than wrap an existing platform, we built an eBPF agent that traces CUDA runtime and driver API calls through uprobes, stores the results in SQLite, and exposes everything through seven MCP tools. The MCP interface is not an adapter layer; it is the primary interface.

Neither approach is wrong. They solve different problems.

The wrapper approach works well for aggregate questions: “What was the P99 latency for service X in the last hour?” The data is already summarized, indexed, and queryable.

The native approach works better for root-cause investigation: “Why did this specific GPU request take 14.5x longer than expected?” That requires raw kernel events, CUDA call stacks, and causal chains, not summaries. AI agents need to drill down, not roll up.
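The difference can be sketched in a few lines of Python. The schema below is invented for illustration (the real Ingero trace layout may differ): a roll-up over raw CUDA events looks unremarkable, while a drill-down query over the same rows surfaces the outlier.

```python
import sqlite3

# Hypothetical schema: one row per raw CUDA API call.
# Illustrative only -- not Ingero's actual trace schema.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE cuda_events (api TEXT, duration_us INTEGER)")
db.executemany(
    "INSERT INTO cuda_events VALUES (?, ?)",
    [("cudaLaunchKernel", 120)] * 99 + [("cudaLaunchKernel", 1740)],  # one 14.5x outlier
)

# Roll-up: the aggregate hides the problem.
(avg_us,) = db.execute("SELECT AVG(duration_us) FROM cuda_events").fetchone()
print(avg_us)  # ~136 us: nothing obviously wrong

# Drill-down: the raw events expose what the average buried.
outliers = db.execute(
    "SELECT api, duration_us FROM cuda_events WHERE duration_us > 10 * ?", (avg_us,)
).fetchall()
print(outliers)  # [('cudaLaunchKernel', 1740)]
```

A pre-aggregated metrics store would have already collapsed those 100 rows into the 136 µs average before the agent ever saw them.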

What MCP-native observability looks like in practice

Here is a concrete example. We traced a vLLM TTFT regression where the first token took 14.5x longer than baseline. The trace database captured every CUDA API call, every kernel context switch, every memory allocation.

When Claude connects to the MCP server with this database loaded, it can:

  1. get_trace_stats – View the full trace summary: 12,847 CUDA events, 4 causal chains, total GPU time
  2. get_causal_chains – Read the causal chains that explain in plain English why latency increased
  3. run_sql – Run custom queries against the raw event data (“Show me all cudaMemcpyAsync calls longer than 100ms”)
  4. get_stacks – Inspect the call stack for any flagged event
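As a sketch of what run_sql enables, here is roughly what that quoted query could look like. The table and column names are assumptions for illustration, not Ingero’s actual layout:

```python
import sqlite3

# Assumed, simplified trace layout -- the real schema may differ.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE cuda_events (id INTEGER, api TEXT, ts_ns INTEGER, duration_ms REAL)")
db.executemany("INSERT INTO cuda_events VALUES (?, ?, ?, ?)", [
    (1, "cudaMemcpyAsync", 1_000, 2.1),
    (2, "cudaMemcpyAsync", 2_000, 143.7),   # the kind of call worth flagging
    (3, "cudaLaunchKernel", 3_000, 0.4),
])

# The SQL behind "show me all cudaMemcpyAsync calls longer than 100ms".
slow = db.execute(
    """SELECT id, duration_ms FROM cuda_events
       WHERE api = 'cudaMemcpyAsync' AND duration_ms > 100
       ORDER BY duration_ms DESC"""
).fetchall()
print(slow)  # [(2, 143.7)]
```

The point is that the agent writes the query itself; nothing was pre-aggregated to answer this question ahead of time.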

Claude identified the root cause in under 30 seconds: the logprobs computation was blocking the decode loop, causing a 256x slowdown on the critical path. That root cause was not visible in any aggregate metric. It only appeared in the causal chain between specific CUDA API calls.

A dashboard-wrapping MCP adapter could not have found it. That level of granularity does not survive aggregation.

The security angle also matters

Qualys raised legitimate concerns about MCP server security. Their finding that 53% of servers rely on static secrets is worrying. Their recommendation to log discovery and invocation events is spot on.

For MCP servers that touch GPU infrastructure, the attack surface is different. An MCP server with access to CUDA traces can expose timing information, memory layout, and model-architecture details. The security model needs to account for this.

In Ingero, every MCP tool invocation is traced. The same eBPF infrastructure that captures GPU events also captures MCP interactions. This is not a separate logging layer; it is the same observability pipeline. Qualys’s recommendation to “add observability to MCP servers” becomes trivial when the MCP server is already an observability tool.
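To illustrate the shape of that idea (not Ingero’s actual implementation, which captures invocations in eBPF), here is a minimal Python sketch of MCP tool calls landing in the same SQLite store as the trace data. The table name and decorator are hypothetical:

```python
import functools
import json
import sqlite3
import time

# Hypothetical table living alongside the GPU trace tables in the same database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE mcp_invocations (tool TEXT, args TEXT, ts REAL, duration_ms REAL)")

def traced(tool_name):
    """Record every invocation of an MCP tool handler in the trace store."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(**kwargs):
            start = time.monotonic()
            try:
                return fn(**kwargs)
            finally:
                db.execute(
                    "INSERT INTO mcp_invocations VALUES (?, ?, ?, ?)",
                    (tool_name, json.dumps(kwargs), time.time(),
                     (time.monotonic() - start) * 1000),
                )
        return inner
    return wrap

@traced("run_sql")
def run_sql(query):
    return query.upper()  # stand-in for the real handler

run_sql(query="SELECT 1")
rows = db.execute("SELECT tool, args FROM mcp_invocations").fetchall()
print(rows)  # [('run_sql', '{"query": "SELECT 1"}')]
```

Because the invocation log is just another table, the same run_sql tool that queries GPU events can audit the MCP server’s own activity.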

Where this is going

We believe the MCP-native pattern will extend beyond GPU observability. Consider:

  • Network observability: Instead of wrapping Prometheus in an MCP layer, build an eBPF-based network agent that exposes packet-level data directly to AI agents (Microsoft Retina is halfway there).
  • Security observability: Instead of wrapping a SIEM, build an MCP server that traces syscalls and exposes security events in real time.
  • Cost observability: Instead of querying cloud billing APIs through MCP, instrument actual resource allocation and expose it directly.

The pattern is the same: skip the dashboard, skip the aggregation, give the AI agent direct access to raw telemetry. Let the agent decide what to collect and how.

Try it yourself

The project is open source. The trace database from this post is available for download. Claude (or any MCP client) can connect to it and investigate:

git clone https://github.com/ingero-io/ingero.git
cd ingero && make build
./bin/ingero mcp --db investigations/pytorch-dataloader-starvation.db

Investigate with AI (recommended)

You can point any MCP-compatible AI client to the trace database and ask questions directly. No code required.

First, create the MCP config file /tmp/ingero-mcp-dataloader.json:

{
  "mcpServers": {
    "ingero": {
      "command": "./bin/ingero",
      "args": ["mcp", "--db", "investigations/pytorch-dataloader-starvation.db"]
    }
  }
}

With Ollama (local, free):

# Install ollmcp (MCP client for Ollama)
pip install ollmcp

# Investigate with a local model (no data leaves your machine)
ollmcp -m qwen3.5:27b -j /tmp/ingero-mcp-dataloader.json

With Claude Code:

claude --mcp-config /tmp/ingero-mcp-dataloader.json

Then type /investigate and let the model explore. Follow up with questions like “What was the root cause?” or “What processes were competing for CPU time?”

Or add the same config to Claude Desktop and ask: “What is causing the GPU performance issues in this trace?”

The MCP server exposes seven tools. Claude will figure out the rest.


Ingero is free and open source software licensed under Apache 2.0 (user-space) + GPL-2.0/BSD-3 (eBPF kernel-space). One binary, zero dependencies, <2% overhead. Get started on GitHub!
