Conversational AI doesn’t understand users — 'Intent First' architecture does

The modern customer has one need that matters: getting the answer they want, when they want it. The standard RAG pipeline of embed, retrieve, and generate misunderstands intent, overloads context, and misses freshness, repeatedly sending customers down the wrong path.

Instead, intent-first architecture uses a lightweight language model to parse queries for intent and context before routing them to the most relevant content sources (documents, APIs, people).

Enterprise AI is a speeding train headed for a cliff. Organizations are deploying LLM-powered search applications at record speeds, while a fundamental architectural issue is setting most up for failure.

A recent Coveo study showed that 72% of enterprise search queries fail to return meaningful results on the first attempt, while Gartner also predicts that the majority of conversational AI deployments are falling short of enterprise expectations.

The problem is not in the underlying models. It’s the architecture that surrounds them.

Having designed and run large-scale AI-powered customer interaction platforms serving millions of customers and citizen users at some of the world’s largest telecommunications and healthcare organizations, I’ve noticed a pattern that separates successful deployments from multi-million-dollar failures.

It’s a cloud-native architecture pattern I call intent-first. And it’s reshaping the way enterprises build AI-powered experiences.

The $36 billion problem

Gartner estimates that the global conversational AI market will grow to $36 billion by 2032, and enterprises are scrambling for their share. The demos are irresistible: plug an LLM into your knowledge base, and suddenly it can answer customer questions in natural language. Magic.

Then production happens.

A major telecommunications provider I worked with introduced a RAG system hoping to reduce support call rates. Instead, the rate increased. Customers who tried the AI-powered search received confidently incorrect answers and reached customer support angrier than before.

This pattern repeats again and again. In healthcare, patient-facing AI assistants serve up medical information that is weeks or months out of date. Financial services chatbots blend answers from retail and institutional product content. Retail product searches surface discontinued products.

The issue is not a failure of AI technology. It is a failure of architecture.

Why do standard RAG architectures fail?

The standard RAG pattern – embed the query, retrieve semantically similar content, pass it to the LLM – works beautifully in demos and proofs of concept. But in production it falls apart for three systematic reasons:

1. Intent ambiguity

Semantic similarity is not intent. But standard RAG architectures do not account for the difference.

What does it mean when a customer says “I want to cancel”? Cancel a service? Cancel an order? Cancel an appointment? During our telecom deployment, we found that 65% of “cancel” queries were actually about orders or appointments, not service cancellations. The RAG system had no way of understanding this intent, so it kept returning service cancellation documents.

Intent matters even more in healthcare. When a patient types “I need to cancel,” they might be trying to cancel an appointment, a prescription refill, or a procedure. Routing them from scheduling to medication content is not just frustrating — it’s dangerous.
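As a concrete illustration of that ambiguity, here is a minimal Python sketch of disambiguating a “cancel” query. The sub-intent labels, keyword signals, and fallback to recent account activity are assumptions for illustration, not the production system’s logic.

```python
# Illustrative sketch of disambiguating "cancel". The sub-intent
# labels, keyword signals, and activity fallback are assumptions.

CANCEL_SIGNALS = {
    "cancel_order": ["order", "delivery", "shipment", "purchase"],
    "cancel_appointment": ["appointment", "visit", "booking"],
    "cancel_service": ["service", "plan", "subscription", "line"],
}

def disambiguate_cancel(query: str, recent_activity: list) -> str:
    """Resolve a 'cancel' query to a sub-intent, or ask to clarify."""
    words = query.lower().split()
    # First, look for an explicit signal word in the query itself.
    for sub_intent, signals in CANCEL_SIGNALS.items():
        if any(signal in words for signal in signals):
            return sub_intent
    # Otherwise, lean on what the user has done recently.
    for sub_intent in CANCEL_SIGNALS:
        if sub_intent in recent_activity:
            return sub_intent
    # No evidence either way: ask rather than guess.
    return "needs_clarification"
```

The key design choice is the last line: when neither the query nor the user’s context disambiguates, the system asks instead of defaulting to service cancellation.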

2. Context flooding

Enterprise knowledge is vast, spanning dozens of sources: product catalogs, billing, support articles, policies, promotions, and account data. The standard RAG model treats them all equally, searching every source for every query.

When a customer asks “how do I activate my new phone,” they don’t care about billing FAQs, store locations, or network status updates. But a standard RAG model retrieves semantically similar content from every source and returns results that are half a step away from the target.

3. Freshness blind spot

Vector space is blind to time. Semantically, last quarter’s promotion is identical to this quarter’s. But presenting expired offers to customers breaks trust. We traced a significant share of customer complaints to search results that surfaced expired products, offers, or features.
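A freshness constraint is straightforward to enforce at retrieval time once documents carry a timestamp. This is a minimal sketch assuming each document has a `published_at` field (an illustrative schema, not the author’s):

```python
# Sketch of a freshness filter, assuming each document carries a
# published_at timestamp (an illustrative schema).
from datetime import datetime, timedelta

def filter_fresh(documents, max_age_days, now=None):
    """Drop documents older than the intent's maximum content age."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=max_age_days)
    return [doc for doc in documents if doc["published_at"] >= cutoff]
```

With a 30-day window, last quarter’s promotion simply never reaches the ranker, regardless of how semantically similar it is.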

Intent-First Architecture Pattern

The intent-first architecture pattern inverts the standard RAG deployment. In the RAG model, you retrieve, then route. In the intent-first model, you classify before routing or retrieving.

Intent-first architecture uses a lightweight language model to parse a query for intent and context, before sending it to the most relevant content sources (documents, APIs, agents).

Comparison: Intent-First vs. Standard RAG
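The comparison comes down to ordering, which can be sketched side by side in a few lines of Python. The stub services below stand in for the real classifier, vector store, and source map; every name is illustrative.

```python
# Minimal contrast of the two pipeline orderings, with stubs standing
# in for the real classifier, vector store, and source map.

ALL_SOURCES = ["billing_faq", "troubleshooting_kb", "device_guides",
               "store_locations", "network_status"]

INTENT_SOURCES = {
    "device_issue": ["troubleshooting_kb", "device_guides"],
    "billing": ["billing_faq"],
}

def vector_search(query, sources):
    # Stub: pretend each searched source returns its best hit.
    return [f"{source}:top-hit" for source in sources]

def classify(query):
    # Stub classifier: a keyword rule standing in for a small LM.
    return "device_issue" if "activate" in query.lower() else "billing"

def standard_rag(query):
    # Standard RAG: retrieve from every source; intent never consulted.
    return vector_search(query, ALL_SOURCES)

def intent_first(query):
    # Intent-first: classify, then retrieve only from mapped sources.
    return vector_search(query, INTENT_SOURCES[classify(query)])
```

For “how do I activate my new phone,” the standard pipeline searches all five sources; the intent-first pipeline touches only the two device-related ones.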

Cloud-native implementation

The intent-first pattern is designed for cloud-native deployment, leveraging microservices, containerization, and elastic scaling to handle enterprise traffic patterns.

Intent classification service

The classifier determines the user’s intent before any retrieval:

Algorithm: Intent Classification

Input: user_query (string)

Output: intent_result (object)

1. Preprocess the query (normalize, expand contractions)
2. Classify using the transformer model:
   - primary_intent ← model.predict(query)
   - confidence ← model.confidence_score()
3. If confidence < 0.70:
   - return { needs_clarification: true, suggested_question: generate_clarification_query() }
4. Extract sub-intents based on primary_intent:
   - If primary = ACCOUNT → check ORDER_STATUS, PROFILE, etc.
   - If primary = HELP → check DEVICE_ISSUE, NETWORK, etc.
   - If primary = BILLING → check PAYMENTS, DISPUTES, etc.
5. Determine target_sources from the intent mapping:
   - ORDER_STATUS → [orders_db, order_faq]
   - DEVICE_ISSUE → [troubleshooting_kb, device_guides]
   - MEDICATION → [formulary, clinical_docs] (health care)
6. Return { primary_intent, sub_intents, confidence, target_sources, requires_personalization: true/false }
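The algorithm above can be sketched as a small Python service. A keyword scorer stands in for the transformer model, a zero-hit result stands in for the confidence < 0.70 check, and the intent and source names follow the pseudocode but are illustrative.

```python
# Runnable sketch of the intent classification service. A keyword
# scorer stands in for the transformer model; a zero-hit result
# stands in for the confidence < 0.70 check.

INTENT_KEYWORDS = {
    "account": ["order", "profile", "address", "status"],
    "help": ["broken", "activate", "network", "device", "error"],
    "billing": ["bill", "payment", "charge", "dispute", "refund"],
}

TARGET_SOURCES = {
    "account": ["orders_db", "order_faq"],
    "help": ["troubleshooting_kb", "device_guides"],
    "billing": ["billing_faq", "payments_db"],
}

def classify_intent(user_query: str) -> dict:
    words = user_query.lower().split()                   # 1. preprocess
    scores = {intent: sum(w in keywords for w in words)  # 2. classify
              for intent, keywords in INTENT_KEYWORDS.items()}
    primary = max(scores, key=scores.get)
    if scores[primary] == 0:                             # 3. too uncertain
        return {"needs_clarification": True,
                "suggested_question": "Could you tell me a bit more about what you need?"}
    return {"primary_intent": primary,                   # 5-6. map and return
            "confidence": scores[primary] / len(words),
            "target_sources": TARGET_SOURCES[primary]}
```

Note that the uncertain path returns a clarification question rather than a best guess — the same refusal-to-guess behavior the pseudocode specifies.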

Context-aware retrieval service

Once the intent is classified, retrieval is targeted:

Algorithm: Context-Aware Retrieval

Input: query, intent_result, user_context

Output: ranked_documents

1. Get the source_config for intent_result.sub_intent:
   - primary_sources ← sources to search
   - excluded_sources ← sources to skip
   - freshness_days ← maximum content age
2. If the intent requires personalization and the user is authenticated:
   - get account_context from the account service
   - if intent = ORDER_STATUS:
     - get recent orders (last 60 days)
     - add to results
3. Build search filters:
   - content_types ← primary_sources only
   - max_age ← freshness_days
   - user_context ← account_context (if available)
4. For each source in primary_sources:
   - documents ← vector_search(query, source, filters)
   - add documents to results
5. Score each document:
   - relevance_score ← vector_similarity × 0.40
   - recency_score ← freshness_weight × 0.20
   - personalization_score ← user_match × 0.25
   - intent_match_score ← type_match × 0.15
   - total_score ← sum of the above
6. Rank descending by total_score
7. Return the top 10 documents
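The scoring and ranking steps (5–7) translate directly into code. This sketch uses the weights from the algorithm and assumes the per-document component scores arrive pre-normalized to the 0–1 range from the upstream retrieval services:

```python
# Sketch of scoring steps 5-7. Weights match the algorithm above;
# component scores are assumed pre-normalized to 0-1 upstream.

WEIGHTS = {
    "relevance": 0.40,        # vector similarity
    "recency": 0.20,          # freshness weight
    "personalization": 0.25,  # user match
    "intent_match": 0.15,     # content-type match
}

def score_document(doc: dict) -> float:
    """Weighted sum of the four component scores."""
    return sum(doc[component] * weight
               for component, weight in WEIGHTS.items())

def rank_documents(docs: list, top_k: int = 10) -> list:
    """Rank descending by total score and return the top k."""
    return sorted(docs, key=score_document, reverse=True)[:top_k]
```

Because the weights sum to 1.0, total_score stays on the same 0–1 scale as its components, which keeps thresholds comparable across intents.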

Health Care-Specific Considerations

In health care deployments, the intent-first pattern includes additional safeguards.

Health care intent categories:

  • Clinical: medication questions, symptoms, care instructions

  • Coverage: benefits, prior authorization, formulary

  • Scheduling: appointments, provider availability

  • Billing: claims, payments, disputes

  • Account: profile, dependents, ID cards

Important safeguard: clinical answers always include a disclaimer that they are not a substitute for professional medical advice, and the system routes complex clinical questions to human support.
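A minimal sketch of that safeguard, assuming a keyword screen decides which clinical questions count as “complex” (the signal list and messages are illustrative, not the deployed logic):

```python
# Hypothetical safeguard for clinical intents: every answer carries a
# disclaimer, and "complex" clinical questions go to a human.

CLINICAL_DISCLAIMER = ("This information is not a substitute for "
                       "professional medical advice.")

COMPLEX_CLINICAL_SIGNALS = ["dosage", "interaction", "side effect",
                            "symptom", "emergency"]

def answer_clinical(query: str, draft_answer: str) -> dict:
    lowered = query.lower()
    if any(signal in lowered for signal in COMPLEX_CLINICAL_SIGNALS):
        # Complex clinical question: never answer automatically.
        return {"escalate_to_human": True,
                "message": "Connecting you with a member of the care team."}
    return {"escalate_to_human": False,
            "answer": f"{draft_answer}\n\n{CLINICAL_DISCLAIMER}"}
```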

Handling edge cases

Edge cases are where systems fail. Typical handlers in the intent-first pattern include:

Frustration detection keywords:

  • Anger: "horrible," "worst," "hate," "ridiculous"

  • Time: "hours," "days," "still waiting"

  • Failure: "useless," "no help," "doesn't work"

  • Escalation: "talk to a human," "real person," "manager"

When frustration is detected, bypass search entirely and escalate to human assistance.
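The keyword groups above can be wired into a simple detector. Treating a single hit as enough to escalate is an assumption of this sketch:

```python
# The frustration keyword groups, wired into a simple detector.
# Treating one hit as sufficient to escalate is an assumption.

FRUSTRATION_SIGNALS = {
    "anger": ["horrible", "worst", "hate", "ridiculous"],
    "time": ["hours", "days", "still waiting"],
    "failure": ["useless", "no help", "doesn't work"],
    "escalation": ["talk to a human", "real person", "manager"],
}

def detect_frustration(query: str) -> bool:
    """True if any frustration phrase appears anywhere in the query."""
    lowered = query.lower()
    return any(phrase in lowered
               for phrases in FRUSTRATION_SIGNALS.values()
               for phrase in phrases)

def handle_query(query: str) -> dict:
    # On frustration, bypass search entirely and route to a human.
    if detect_frustration(query):
        return {"route": "human_support"}
    return {"route": "search"}
```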

Cross-Industry Applications

The intent-first pattern applies where enterprises deploy conversational AI on heterogeneous content:

  • Telecommunications (Sales, Support, Billing, Account, Retention): prevents "cancel" misclassification

  • Health care (Clinical, Coverage, Scheduling, Billing): separates clinical from administrative content

  • Financial services (Retail, Institutional, Lending, Insurance): prevents mixing retail and institutional content

  • Retail (Products, Orders, Returns, Loyalty): ensures promotional freshness

Results

After implementing intent-first architecture on telecommunications and healthcare platforms:

  • Query success rate: nearly doubled

  • Support escalations: reduced by more than half

  • Resolution time: roughly 70% reduction

  • User satisfaction: about 50% improvement

  • Return user rate: more than doubled

The return user rate proved most important. When search works, users come back. When it fails, they abandon the channel altogether, driving up costs across every other support channel.

The strategic imperative

The conversational AI market will continue to experience tremendous growth.

But enterprises deploying standard RAG architectures will keep failing in the same ways.

Their AI will confidently give wrong answers, users will abandon digital channels in frustration, and support costs will rise rather than fall.

Intent-first is a fundamental shift in how enterprises organize and build AI-powered customer conversations. It’s not about better models or more data. It’s about understanding what the user wants before you try to help them.

The sooner an organization treats this as an architectural imperative, the sooner it can realize the efficiency gains this technology enables. Those that don’t will spend years explaining why their AI investments aren’t delivering the expected business results.

Demos are easy. Production is hard. But the pattern for production success is clear: intent first.

Srinivas Reddy Hulebidu Reddy is a lead software engineer and enterprise architect.


