8 Billion Tokens A Day Forced AT&T To Rethink AI Orchestration

When your daily token usage averages 8 billion, you have a massive scale problem. That was the case at AT&T, where Chief Data Officer Andy Marcus and his team recognized that it was neither feasible nor economical to push everything through large reasoning models. So, when building the internal Ask AT&T personal assistant, they rebuilt the orchestration layer. The result: a multi-agent stack built on LangChain in which large language model “super agents” direct smaller, underlying “worker” agents to perform more concise, purpose-driven tasks. Marcus told VentureBeat that this flexible orchestration layer has dramatically improved latency, speed and response times. Most importantly, his team has seen cost savings of up to 90%.

“I believe the future of agentic AI is many, many, many small language models (SLMs),” he said. “We find that smaller language models are as accurate, if not more accurate, than larger language models over a given domain area.”
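The super agent and worker split described above can be pictured with a small routing sketch. This is only an illustration under stated assumptions, not AT&T's implementation: the class names `SuperAgent` and `WorkerAgent`, the worker names, and the keyword-based `classify` step are all hypothetical, and the lambdas stand in for calls to small models. No LangChain API is used here.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class WorkerAgent:
    """A small, narrowly scoped agent (stands in for a call to a small model)."""
    name: str
    handle: Callable[[str], str]


class SuperAgent:
    """Routes each request to one worker (stands in for a larger model's planning step)."""

    def __init__(self, workers: Dict[str, WorkerAgent], fallback: str) -> None:
        self.workers = workers
        self.fallback = fallback

    def classify(self, request: str) -> str:
        # Hypothetical keyword routing; a real system would ask a model to choose.
        text = request.lower()
        if "sql" in text or "query" in text:
            return "nl_to_sql"
        if "document" in text or "pdf" in text:
            return "documents"
        return self.fallback

    def run(self, request: str) -> str:
        worker = self.workers[self.classify(request)]
        return f"[{worker.name}] {worker.handle(request)}"


# Hypothetical workers; each lambda is a placeholder for a small-model call.
workers = {
    "nl_to_sql": WorkerAgent("nl_to_sql", lambda r: "drafted a SQL query"),
    "documents": WorkerAgent("documents", lambda r: "summarized the document"),
    "general": WorkerAgent("general", lambda r: "answered directly"),
}
orchestrator = SuperAgent(workers, fallback="general")
print(orchestrator.run("Write a query for last month's outages"))
```

The design point is that only the routing decision needs a larger model, while each narrow task can go to a cheaper, domain-specific one.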

Recently, Marcus and his team used this re-architected stack, together with Microsoft Azure, to build and deploy Ask AT&T Workflows, a graphical drag-and-drop agent builder that employees use to automate tasks.

Agents draw from a suite of proprietary AT&T tools that handle document processing, natural language-to-SQL conversion and image analysis. “As the workflow is executed, it’s AT&T’s data that’s really making the decisions,” Marcus said. Instead of asking general questions, “we’re asking questions about our data, and we take our data into account to make sure it focuses on our information when making decisions.” Nevertheless, a human always monitors the “chain reaction” of agents. All agent actions are logged, data is separated throughout the process, and role-based access is enforced when agents delegate workloads to each other. “Things happen autonomously, but the human in the loop still provides checks and balances to the whole process,” Marcus said.
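As a rough sketch of the safeguards described above, the following shows one way a delegation step could enforce role-based access and log every attempt. Everything here is an assumption made for illustration: the `Agent` and `Delegator` classes, the task name `billing_report`, and the role name `read_billing` are hypothetical and are not taken from AT&T's system.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Agent:
    name: str
    roles: Set[str]


@dataclass
class Delegator:
    # Hypothetical mapping of task name -> role required to run it.
    required_role: Dict[str, str]
    audit_log: List[dict] = field(default_factory=list)

    def delegate(self, sender: Agent, receiver: Agent, task: str,
                 action: Callable[[], str]) -> str:
        # Unknown tasks map to None, which is never in a role set: deny by default.
        allowed = self.required_role.get(task) in receiver.roles
        # Every attempt is logged, whether or not it is allowed.
        self.audit_log.append({"from": sender.name, "to": receiver.name,
                               "task": task, "allowed": allowed})
        if not allowed:
            raise PermissionError(f"{receiver.name} may not run {task}")
        return action()


planner = Agent("planner", {"orchestrate"})
sql_agent = Agent("sql_agent", {"read_billing"})
delegator = Delegator(required_role={"billing_report": "read_billing"})

result = delegator.delegate(planner, sql_agent, "billing_report",
                            lambda: "report ready")
print(result, delegator.audit_log)
```

A human reviewer could then read `audit_log` to follow the chain of delegations, including any that were refused.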

Not building from scratch: using ‘interchangeable and selectable’ models

AT&T doesn’t take a “build everything from scratch” mindset, Marcus said; it relies instead on models that are “interchangeable and selectable” and avoids rebuilding what is already readily available. As functionality matures across the industry, he explained, the team will increasingly retire homegrown tools in favor of off-the-shelf alternatives.

“Because in this area, things change every week, if we’re lucky, sometimes several times a week,” he said. “We need to be able to pilot, plug in and plug out different components.”

The team evaluates the available options, as well as its own tools, “really harshly.” For example, its Ask Data with Relational Knowledge Graphs has topped the Spider 2.0 text-to-SQL accuracy leaderboard, and other tools have scored highly on the BIRD SQL benchmark.

For its in-house agentic tools, the team uses LangChain as a core framework, fine-tunes models with standard retrieval-augmented generation (RAG) and other in-house algorithms, and partners closely with Microsoft, using the tech giant’s search functionality for its vector store.

Ultimately, though, it’s important not to build agentic AI or other advanced tools into everything just for the sake of it, Marcus advised. “Sometimes we make things more complicated than they need to be,” he said, adding that he has sometimes seen solutions over-engineered.

Instead, builders should ask themselves whether a given tool really needs to be agentic. Useful questions include: What level of accuracy could be achieved if this were a simple, single-turn generative solution? How could it be broken down into smaller pieces, where each piece could be delivered “more precisely,” as Marcus put it?

Accuracy, cost and tool responsiveness should be the guiding principles. “Even though solutions have become more complex, those three basic principles still give us a lot of direction,” he said.
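The ability to “plug in and plug out” components, mentioned above, is commonly achieved by coding against a small interface rather than a specific tool. The sketch below is a generic illustration of that idea, not a description of AT&T's stack: the `Retriever` protocol, both retriever classes and the sample documents are hypothetical, and `VendorRetriever` is a placeholder that calls no real service.

```python
from typing import List, Protocol


class Retriever(Protocol):
    """Hypothetical interface that any search component must satisfy."""
    def search(self, query: str, k: int) -> List[str]: ...


class HomegrownRetriever:
    """In-house stand-in: ranks documents by how many query words they contain."""
    def __init__(self, docs: List[str]) -> None:
        self.docs = docs

    def search(self, query: str, k: int) -> List[str]:
        words = query.lower().split()
        ranked = sorted(self.docs,
                        key=lambda d: -sum(w in d.lower() for w in words))
        return ranked[:k]


class VendorRetriever:
    """Placeholder for an off-the-shelf search service; no real API is called."""
    def __init__(self, docs: List[str]) -> None:
        self.docs = docs

    def search(self, query: str, k: int) -> List[str]:
        words = query.lower().split()
        return [d for d in self.docs if any(w in d.lower() for w in words)][:k]


def answer(question: str, retriever: Retriever) -> str:
    # Calling code depends only on the interface, so components can be swapped.
    hits = retriever.search(question, k=1)
    return hits[0] if hits else "no match"


docs = ["Fiber outage runbook", "Billing FAQ"]
print(answer("fiber outage", HomegrownRetriever(docs)))
print(answer("fiber outage", VendorRetriever(docs)))
```

Because `answer` never names a concrete class, replacing the homegrown component with an off-the-shelf one changes a single argument rather than the calling code.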

How 100,000 employees are really using it

Ask AT&T Workflows has been rolled out to more than 100,000 employees. More than half say they use it every day, Marcus said, and active adopters have reported productivity gains of up to 90%. “We’re looking at are they using the system repeatedly? Because stickiness is a good indicator of success,” he said.

The agent builder offers employees two paths. The first is pro-code, where users can write Python behind the scenes to set rules for how agents should work. The second is no-code, featuring a drag-and-drop visual interface for a “very lightweight user experience,” Marcus said.

Interestingly, even skilled users are gravitating toward the latter option. In a recent hackathon geared toward a technical audience, participants were given a choice between the two, and more than half chose the drag-and-drop option. “It was a surprise to us, because all these guys were very capable in the programming aspect,” Marcus said.

Employees are using agents for a variety of tasks. For example, a network engineer might build a series of agents that respond to alerts and reconnect customers when they lose connectivity. In this scenario, one agent can correlate telemetry to identify the network problem and its location, pull change logs and check for known issues. It can then open a trouble ticket. Another agent can then propose ways to solve the problem and even write new code to fix it. Once the problem is resolved, a third agent can write a summary with preventive measures for the future.

Human engineers keep watch over all of this, making sure agents are performing as expected and taking the right actions, Marcus said.

AI-fueled coding is the future

That same engineering discipline, breaking work into smaller, purpose-built pieces, is now reshaping the way AT&T writes code through what Marcus calls “AI-fueled coding.”

He compared the process to RAG: developers use agile coding methods with “function-specific” build archetypes in an integrated development environment (IDE) that determine how code should interact. The output is not loose code; it is “very close to production grade” and can reach that quality in one pass.

“We’ve all worked with vibe coding, where we have an agentic type of code editor,” Marcus said. But AI-fueled coding “eliminates a lot of the back-and-forth iterations that you might see in vibe coding.”

He sees this coding technique as “markedly redefining” the software development cycle, ultimately shortening development timelines and increasing the output of production-grade code. Non-technical teams can also get in on the action by using plain-language prompts to create software prototypes.

For example, his team has used the technique to create an internal curated data product in 20 minutes; without AI, it would have taken six weeks to build. “We develop software with it, modify software with it, do data science with it, do data analytics with it, do data engineering with it,” Marcus said. “So this is a game changer.”
