Corti's New Symphony For Speech-to-Text Model Beats OpenAI At Medical Terminology Accuracy, Highlighting The Value Of Specialized AI

Today, Copenhagen-based healthcare AI company Corti is launching Symphony for Speech-to-Text, a new generation of clinical-grade speech recognition models specifically engineered for real-time dictation, conversation transcription and batch audio processing – and their accuracy rates are the highest ever recorded for this specific use case.

"We are focused on ensuring that our AI scribes can be trusted by physicians, medical professionals, and patients…the entire healthcare system." Andreas Cleve, co-founder and CEO of Corti, said in an exclusive video call interview with VentureBeat.

The performance data the company is putting out paints a clear picture of the current state of enterprise AI: When it comes to highly regulated, niche industries, domain-specific models can outperform foundation model providers.

In a newly published research paper, Corti revealed that its new clinical-grade speech model reduced the word error rate (WER) by 93% compared to the leading generalist speech models and APIs on medical terminology.

On English medical terminology, Symphony for Speech-to-Text achieved a remarkably low 1.4% WER. By comparison, OpenAI’s speech model recorded a 17.7% WER, ElevenLabs scored 18.1%, Whisper recorded 17.4%, and Parakeet scored 18.9%.
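For readers unfamiliar with the metric, WER is conventionally the word-level edit distance between a reference transcript and the model's output (substitutions, deletions and insertions), divided by the number of words in the reference. The sketch below shows that standard calculation; it is not Corti's evaluation code, the lowercase-and-split normalization is an assumption, and both sentences are invented for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumption: lowercase and split on whitespace; real evaluations
    # usually apply more careful text normalization.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so
    # far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical sentences: two of the eight reference words are wrong.
reference = "patient has hyperthyroidism and takes 50 mg daily"
hypothesis = "patient has hypothyroidism and takes 15 mg daily"
print(f"WER: {word_error_rate(reference, hypothesis):.1%}")
```

Two substituted words out of eight give a 25% WER here, even though both errors would change the clinical meaning of the sentence.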

Corti’s announcement serves as an important inflection point for healthcare builders. While general-purpose APIs like OpenAI’s Whisper are adequate for broad-domain transcription, they often stumble over medical abbreviations, complex drug dosages, shorthand, and noisy emergency room environments. Symphony for Speech-to-Text aims to solve this by providing developers with a highly specialized, production-grade API designed from the ground up for clinical workflows.

The agentic era demands lossless data input

The launch of Symphony for Speech-to-Text highlights a shift in how healthcare uses voice technology. For decades, medical speech recognition was primarily about producing a static text document for human doctors to review – a digital replacement for a notepad.

But as the healthcare industry enters what technologists call the "agentic era," where autonomous AI agents actively assist in clinical decision making, EHR navigation, and real-time support, the transcript is no longer the end product. It is the foundational data layer.

“Speech has always been one of the most important inputs to healthcare,” Cleve said in a statement to VentureBeat. “What is changing is what happens after the words are captured. In the agentic era, speech recognition requires more than just generating a transcript – we need to give AI systems accurate clinical facts to reason about. If a model misunderstands a medication, dosage or symptom, each downstream step becomes less reliable. Symphony for Speech-to-Text gives healthcare builders a speech layer that builds on clinical reality and is precise enough to thrive.”

This is where the compounding danger of high word error rates comes into play. If a general-purpose AI model hallucinates a transcription – turning "hyperthyroidism" into "hypothyroidism," or misinterpreting the dosage of a critical drug – every downstream AI agent relying on that transcript will operate on corrupted data. Corti’s architecture minimizes this risk by producing structured, clinically usable output directly from the API, helping downstream AI applications reason on clean facts rather than messy, unformatted text.

Nowhere is this more evident than in Corti’s entity recall benchmark. Symphony for Speech-to-Text reaches an astonishing 98.3% recall rate on formatted clinical entities such as dosages, measurements and dates. By contrast, Corti reported that the strongest general-purpose baseline model topped out at 44.3% recall for the same entities.

For developers building ambient AI documentation tools, that 54-percentage-point gap is the difference between a tool that saves a physician time and a tool that constitutes a medical liability.
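To make the recall figures concrete, the sketch below shows one simple way entity recall can be computed: the share of reference entities that appear in the transcript. The exact-string matching is an assumption, as is every entity in the example; the article does not describe how Corti's benchmark matches or normalizes entities.

```python
from collections import Counter


def entity_recall(reference_entities: list[str], transcript_entities: list[str]) -> float:
    """Fraction of reference entities that also appear in the transcript."""
    reference_counts = Counter(reference_entities)
    transcript_counts = Counter(transcript_entities)
    total = sum(reference_counts.values())
    if total == 0:
        raise ValueError("reference must contain at least one entity")
    # Counter intersection keeps the minimum count per entity, so a repeated
    # dosage only counts as recalled as many times as it was transcribed.
    matched = sum((reference_counts & transcript_counts).values())
    return matched / total


# Hypothetical entities, invented for illustration. The blood pressure
# reading is transcribed in an unformatted way and therefore missed.
reference = ["50 mg", "120/80 mmHg", "12 March 2024", "2.5 ml"]
transcript = ["50 mg", "120 over 80", "12 March 2024", "2.5 ml"]
print(f"Entity recall: {entity_recall(reference, transcript):.1%}")

# Gap between the two recall figures reported in the article,
# expressed in percentage points.
gap_points = round(98.3 - 44.3, 1)
print(f"Reported gap: {gap_points} percentage points")
```

Under exact matching, a correctly heard but unformatted value still counts as a miss, which is one reason formatted output matters for downstream systems.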

Dethroning the industry incumbent

While Corti’s benchmarks against modern LLM builders like OpenAI and ElevenLabs are striking, the company is also taking aim at legacy medical transcription giants.

For years, the gold standard for dedicated physician dictation has been Dragon Medical One. However, legacy systems like it were historically optimized for physician dictation, not built as the underlying infrastructure for ambient AI, complex multi-party conversations, or real-time clinical support tools.

In a real-world English medical dictation evaluation, Corti achieved a 4.6% WER, outperforming Dragon’s 5.7% (19% relative improvement).

Additionally, Corti demonstrated higher medical term recall than Dragon (93.5% vs. 92.9%).
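The two Dragon comparisons use different yardsticks: the WER figure is a relative improvement, while the recall figures differ by an absolute number of percentage points. A quick check of the arithmetic, using only the rounded numbers quoted above (the function name is illustrative):

```python
def relative_reduction(baseline: float, new: float) -> float:
    """Share of the baseline error that the new system removes."""
    return (baseline - new) / baseline


# Figures as quoted in the article (rounded to one decimal place).
dragon_wer, corti_wer = 5.7, 4.6
dragon_recall, corti_recall = 92.9, 93.5

wer_improvement = relative_reduction(dragon_wer, corti_wer)
recall_gap_points = round(corti_recall - dragon_recall, 1)

print(f"Relative WER improvement: {wer_improvement:.0%}")
print(f"Medical term recall gap: {recall_gap_points} percentage points")
```

The relative figure comes out at about 19%, matching the article, while the recall lead over Dragon is 0.6 percentage points.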

By providing this level of accuracy through API endpoints, Corti is enabling third-party developers, EHR vendors, and virtual care platforms to create their own custom dictation and ambient listening tools that outperform the industry’s legacy products.

"We want people to build apps on top of our models," Cleve said. "The goal is to spread the technology as widely as needed so that it can be as useful as possible to patients and their doctors and professionals."

For Cleve and his co-founders, the mission is a personal one: Cleve’s own mother was a health care professional who was attacked by a patient and spent years struggling to recover. He sought to improve health care processes as a way to honor her sacrifice.

Solving the multilingual health care puzzle

Health care demands extend far beyond English-speaking hospitals, and global health systems have historically been underserved by clinical NLP models. Early adopters are already taking advantage of Corti’s new model in linguistically demanding environments, proving the feasibility of the technology in complex international markets.

For example, Switzerland requires care delivery in multiple languages – often simultaneously within the same medical institution. It serves as one of the most rigorous proving grounds for multilingual medical speech models in the world. Corti’s Symphony model showed massive performance gains in these non-English tests, achieving 2.4% WER in German (versus 13.0% for the next-best system) and 3.9% WER in French (versus 10.6%).

“In clinical conversations, every word counts – a missed medication name, an incorrect dosage, or a misspelled symptom can change the meaning of an encounter,” said Pierre Corboz, head of solutions and business development at VoicePoint, a Swiss healthcare technology provider. “Symphony’s accuracy on clinical terminology gives us the foundation to bring more reliable AI capabilities into clinical workflows with our VoicePoint Xenon platform. When Corti improves the speech layer, the workflow we create together becomes faster, safer and more useful for physicians in Switzerland.”

Benefiting from AI verticalization and specialization

Today’s announcement of Symphony for Speech-to-Text is not an isolated incident; it is the culmination of a strategic narrative that Corti has been aggressively pushing for the past several weeks.

The comprehensive Symphony platform – which powers clinical and administrative applications for a global network of EHR vendors and life sciences organizations – is systematically proving the defensibility of vertical AI labs against horizontal tech giants.

This is the third major benchmark released by Corti in just six weeks, touching on different layers of healthcare AI performance.

In April, the company revealed that its Symphony for Medical Coding system outperformed general-purpose models by more than 25% in clinical accuracy benchmarks, tackling one of health care’s most notoriously complex workflows.

And just last week, Corti announced that its flagship clinical-grade model had outperformed OpenAI on HealthBench Professional, OpenAI’s own healthcare benchmark.

Taken together, these three data points – medical coding, clinical reasoning, and speech-to-text accuracy – reflect a growing consensus in the enterprise technology sector: generalized models are reaching a limit in regulated industries.

Models deployed in hospitals must natively understand complex abbreviations, sudden interruptions, medical shorthand, specialty-specific language, and strict compliance constraints. By training specifically on these unique edge cases, vertical AI labs like Corti are building a formidable moat that companies relying solely on API calls to generalized large language models cannot easily cross.

Availability and product lineup

Developers are clearly paying attention to the performance difference. According to momentum data provided to VentureBeat, Corti is seeing a 30% increase in new sign-ups for its platform in a quarter-to-date comparison, indicating that developers and healthcare builders are actively gravitating toward vertical, clinical-grade models rather than generic APIs.

Corti, which already serves more than 100 million patients annually across major health systems, including the UK National Health Service (NHS), is introducing Symphony for Speech-to-Text as the default engine for its next-generation healthcare software.

It’s important to note that Corti isn’t launching the comprehensive Symphony platform today; rather, Symphony for Speech-to-Text operates as a new, distinct capability within that broader ecosystem, accessible through its own API endpoint.

Symphony for Speech-to-Text is generally available starting today. Developers and enterprise architects can access the model through the Corti API console, with full technical documentation available to help integrate the clinical-grade speech layer into their existing applications.

Taking a step towards research transparency, Corti has also published its full research paper, detailing its methodology, as well as a separate comparison tool designed to support transparent evaluation of medical speech recognition systems across the industry.

As the healthcare industry continues to rapidly adopt AI-powered automation, the foundational data layer has never been more important. Corti’s latest launch is a stark reminder that in the medical field, general AI is not good enough. The future belongs to the experts.
