
For the past two years, the fundamental unit of generative AI development has been the "completion."
You send a text prompt to a model, it sends text back, and the transaction is finished. If you want to continue the conversation, you have to send the entire history to the model again. This "stateless" architecture, embodied by Google's legacy generateContent endpoint, was perfect for simple chatbots. But as developers move toward autonomous agents that use tools, maintain complex state, and "think" over longer horizons, that stateless model has become a distinct obstacle.
Last week, Google DeepMind finally addressed this infrastructure shortcoming with the public beta launch of the Interaction API (/interactions).
While OpenAI ushered in this change with its Responses API in March 2025, Google's entry signals its intent to push the frontier further. The Interaction API is not just a state-management tool; it is a unified interface designed to make LLMs behave less like text generators and more like a remote operating system.
‘Remote Compute’ Model
The main innovation of the Interaction API is to introduce server-side state as the default behavior.
Previously, a developer building a complex agent had to manually manage a growing JSON list of every "user" and "model" turn, sending megabytes of history back and forth with each request. With the new API, developers simply pass a previous_interaction_id; Google's infrastructure retains the conversation history, tool outputs, and "thought" processes on its end.
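In practice, the chaining pattern looks roughly like the sketch below. It is illustrative only: the endpoint URL, headers, and response fields are assumptions based on the details reported here, not Google's published schema; only the /interactions path and previous_interaction_id come from the announcement.

```python
# Illustrative sketch of stateful chaining; endpoint and field names are assumed.
import os
import requests

API_URL = "https://generativelanguage.googleapis.com/v1beta/interactions"  # assumed path
HEADERS = {"x-goog-api-key": os.environ["GEMINI_API_KEY"]}

# Turn 1: send the full prompt once; the server stores the resulting state.
first = requests.post(API_URL, headers=HEADERS, json={
    "model": "gemini-2.5-flash",
    "input": "Summarize this sales data and flag any anomalies.",
}).json()

# Turn 2: send only the new message plus the previous interaction's ID.
# History, tool outputs, and "thoughts" stay on Google's servers.
follow_up = requests.post(API_URL, headers=HEADERS, json={
    "model": "gemini-2.5-flash",
    "previous_interaction_id": first["id"],  # assumed response field
    "input": "Draft an email to the regional leads about those anomalies.",
}).json()
```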
"Models are becoming systems and, over time, may even become agents themselves," wrote DeepMind's Ali Sevik and Philipp Schmid in an official company blog post on the new paradigm. "Forcing these capabilities into generateContent would result in overly complex and fragile APIs."
This change enables background execution, an important feature for the agentic era. Complex workflows – such as browsing the web for an hour to synthesize a report – often trigger HTTP timeouts in the standard API. The Interaction API lets developers launch an agent with background=true, disconnect, and poll for results later. This effectively turns the API into a job queue for intelligence.
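A rough sketch of that workflow is below. The background flag and the /interactions path come from the announcement; the status values and the GET-by-ID polling pattern are assumptions used only for illustration.

```python
# Illustrative sketch of background execution and polling; schema details are assumed.
import os
import time
import requests

API_URL = "https://generativelanguage.googleapis.com/v1beta/interactions"  # assumed path
HEADERS = {"x-goog-api-key": os.environ["GEMINI_API_KEY"]}

# Launch a long-running job without holding the HTTP connection open.
job = requests.post(API_URL, headers=HEADERS, json={
    "model": "gemini-2.5-pro",
    "background": True,
    "input": "Compare the top five vector databases on pricing and latency.",
}).json()

# Disconnect, then poll for the result later.
while True:
    state = requests.get(f"{API_URL}/{job['id']}", headers=HEADERS).json()
    if state.get("status") in ("completed", "failed"):  # assumed status values
        break
    time.sleep(30)

print(state.get("output"))
```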
native "deep research" and mcp support
Google is using this new infrastructure to deliver its first built-in agent: Gemini Deep Research.
Accessible through the same /interactions endpoint, this agent can execute "long-horizon research tasks." Unlike a standard model that predicts the next token based on your prompt, the Deep Research agent executes a loop of searching, reading, and synthesizing.
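Calling the hosted agent might look something like the sketch below; the model ID and the shared endpoint come from the announcement, while the request shape itself is an assumption.

```python
# Illustrative sketch of invoking the Deep Research agent; request shape is assumed.
import os
import requests

API_URL = "https://generativelanguage.googleapis.com/v1beta/interactions"  # assumed path
HEADERS = {"x-goog-api-key": os.environ["GEMINI_API_KEY"]}

report_job = requests.post(API_URL, headers=HEADERS, json={
    "model": "deep-research-pro-preview-12-2025",
    "background": True,  # research runs can take many minutes, so run them asynchronously
    "input": "Produce a sourced report on the current state of solid-state batteries.",
}).json()
# Retrieve the finished report later using the same polling pattern shown above.
```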
Importantly, Google is also embracing the open ecosystem by adding native support for the Model Context Protocol (MCP). This allows Gemini models to directly call external tools – such as a weather service or database – hosted on a remote server, without requiring the developer to write custom glue code to parse the tool call.
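A hypothetical tool declaration is sketched below. The exact format is not documented in this article, so the "mcp_server" block and its fields are placeholders meant only to show the idea of pointing the model at a remote MCP endpoint.

```python
# Hypothetical sketch of attaching a remote MCP server as a tool; field names are invented.
import os
import requests

API_URL = "https://generativelanguage.googleapis.com/v1beta/interactions"  # assumed path
HEADERS = {"x-goog-api-key": os.environ["GEMINI_API_KEY"]}

resp = requests.post(API_URL, headers=HEADERS, json={
    "model": "gemini-2.5-pro",
    "input": "What is the weekend forecast for Zurich?",
    "tools": [{
        "mcp_server": {                                    # placeholder field name
            "url": "https://weather.example.com/mcp",      # remote MCP endpoint
            "authorization": os.environ.get("WEATHER_MCP_TOKEN", ""),
        }
    }],
}).json()
# The model calls the remote tool itself; no client-side glue code parses the tool call.
```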
Scenario: Google joins OpenAI in the ‘stateful’ era
Google is arguably playing catch-up, but with a different philosophical twist. OpenAI moved away from statelessness nine months earlier with the launch of the Responses API in March 2025.
While both giants are addressing the problem of context bloat, their solutions differ on transparency:
OpenAI (the compaction approach): OpenAI's Responses API introduced compaction, a feature that shrinks interaction history by replacing tool outputs and reasoning chains with opaque "encrypted compaction items." This prioritizes token efficiency but creates a "black box" where the model's prior logic is hidden from the developer.
Google (the hosted approach): Google's Interaction API keeps the entire history available and inspectable. The data model is built for "debugging, manipulating, streaming, and reasoning about interleaved messages." It prioritizes observability over compression.
Supported models and availability
The Interaction API is currently in public beta and available immediately through Google AI Studio. It supports the full spectrum of Google's latest-generation models, ensuring that developers can match the right model size to their specific agentic task:
- Gemini 3.0: Gemini 3 Pro Preview.
- Gemini 2.5: Flash, Flash-Lite, and Pro.
- Agent: Deep Research Preview (deep-research-pro-preview-12-2025).
Commercially, the API integrates into Google’s existing pricing structure – you pay standard rates for input and output tokens, depending on the model you choose. However, the value proposition changes with new data retention policies. Because this API is stateful, Google must store your interaction history to enable features like built-in caching and context retrieval.
Access to this storage is determined by your tier. Developers on the free tier are limited to a 1-day retention policy, which is suitable for transient testing but inadequate for long-term agent memory.
Developers on the paid tier unlock a 55-day retention policy. This extended retention isn't just for auditing; it effectively reduces your total cost of ownership by maximizing cache hits. By keeping history "warm" on the server for roughly two months, you avoid paying to re-process massive context windows for returning users, making the paid tier significantly more efficient for production-grade agents.
Note: Since this is a beta release, Google advises that features and schema are subject to breaking changes.
‘You are interacting with a system’
Sam Witteveen, a Google Developer Expert in machine learning and CEO of Red Dragon AI, sees this release as a necessary evolution of the developer stack.
"If we go back in history… the whole idea was simple text-in, text-out," Witteveen noted in a technical breakdown of the release on YouTube. "But now… you're interacting with a system. A system that can use multiple models, perform multiple loops of calls, use tools, and perform code execution on the backend."
Witteveen highlights the immediate economic benefit of this architecture: built-in caching. Because conversation history resides on Google’s servers, developers are not charged for re-uploading the same context repeatedly. "You won’t have to pay as much for the tokens you’re calling for," he explained.
However, the release is not without friction. Witteveen criticized the current implementation of the Deep Research agent's citation system. While the agent provides sources, the URLs returned are often wrapped in internal Google/Vertex AI redirect links rather than raw, usable URLs.
"My biggest complaint is that… these URLs, if I save them and try to use them in a different session, they won't work," Witteveen warned. "If I want to create a report for someone with citations, I want them to be able to click on the URL from a PDF file… a citation that is just something like medium.com [without the direct link] is not very good."
What does this mean for your team
For lead AI engineers focused on rapid model deployment and fine-tuning, this release provides a straightforward architectural solution to the "timeout" problem: background execution.
Instead of building complex asynchronous handlers or managing separate task queues for long-running reasoning tasks, you can now offload that complexity directly to Google. However, this feature introduces a strategic trade-off.
While the new Deep Research agent allows rapid deployment of sophisticated research capabilities, it acts as a "black box" compared to custom-built LangChain or LangGraph flows. Engineers should prototype "slow thinking" features using the background=true parameter to evaluate whether the speed of implementation outweighs the loss of granular control over the research loop.
Senior engineers who manage AI orchestration and budgets should note that the shift to server-side state via previous_interaction_id unlocks implicit caching, a big win for both cost and latency metrics.
By referencing history stored on Google's servers, you automatically avoid the token costs of re-uploading massive context windows, directly addressing budget constraints while maintaining high performance.
The challenge here is the supply chain: incorporating remote MCP (Model Context Protocol) servers means your agents connect directly to external tools, requiring you to rigorously verify that these remote services are secure and authenticated. It is also worth auditing how much of your current token spend goes to resending conversation history – if it is high, prioritizing a migration to the stateful Interaction API could yield significant savings.
For senior data engineers, the Interaction API provides a more robust data model than raw text logs. The structured schema allows you to debug complex histories and reason over them, improving overall data integrity in your pipelines. However, you should be cautious regarding data quality, particularly the citation issue Sam Witteveen raised.
The Deep Research agent currently returns "wrapped" URLs that may expire or break, rather than raw source links. If your pipelines rely on scraping or storing these sources, you may need to add a cleanup step to extract usable URLs. You should also test the structured output capabilities (response_format) to see whether they can replace brittle regex parsing in your current ETL pipelines.
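One workable cleanup step, assuming the wrapped links are still live when your pipeline runs, is simply to follow the redirect chain and store the final destination:

```python
# Resolve wrapped redirect citations to their final destination URLs.
import requests

def resolve_citation(url: str, timeout: float = 10.0) -> str:
    """Follow redirects and return the last URL in the chain."""
    try:
        resp = requests.get(url, allow_redirects=True, timeout=timeout)
        return resp.url
    except requests.RequestException:
        return url  # keep the wrapped link if it has already expired or is unreachable

# Example: clean a list of citation URLs before writing them to your warehouse.
cleaned = [resolve_citation(u) for u in ["https://redirect.example.com/abc123"]]
```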
Ultimately, for directors of IT security, moving state to Google's centralized servers presents a paradox. It may improve security by keeping API keys and conversation history away from the client device, but it introduces a new data residency risk. The important thing to check here is Google's data retention policies: while the free tier retains data for only one day, the paid tier retains interaction history for 55 days.
This contrasts with OpenAI's "zero data retention" (ZDR) enterprise options. You must ensure that storing sensitive conversation history for roughly two months is in line with your internal governance. If it violates your policy, you will need to configure calls with store=false; however, doing so disables the stateful features (and cost benefits) that make this new API valuable.
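A minimal sketch of that opt-out, assuming the store flag sits in the request body as described in the documentation:

```python
# Illustrative request body with server-side storage disabled.
payload = {
    "model": "gemini-2.5-flash",
    "input": "Review this contract clause for liability exposure.",
    "store": False,  # nothing is retained server-side; history must be resent manually
}
```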