Cohere is announcing Transcribe, a state-of-the-art automatic speech recognition (ASR) model that is open source and available for download today.
Speech is rapidly becoming a core tool for AI-enabled workloads and automation – from transcription and speech analytics to real-time customer support agents.
Our objective was simple: to push the limits of dedicated ASR model accuracy under practical conditions. The model was trained from scratch with a deliberate focus on minimizing word error rate (WER) while keeping production readiness in mind. In other words, not just a research artifact, but a system designed for everyday use.
Cohere Transcribe reflects that intention. It is available for open-source use with full infrastructure control, maintains a manageable inference footprint suitable for practical GPU and local use, provides best-in-class serving efficiency, and is also available through Model Vault, Cohere's secure, fully managed model inference platform.
Cohere Transcribe is currently ranked #1 for accuracy on the Hugging Face Open ASR Leaderboard, setting a new benchmark for real-world transcription performance.
This marks our zero-to-one moment in bringing high-performance speech recognition to enterprise AI workflows. Read on for more details.
Model overview
| Name | cohere-transcribe-03-2026 |
|---|---|
| Architecture | Conformer-based encoder-decoder |
| Input | Audio waveform → log-mel spectrogram |
| Output | Written text |
| Model size | 2B parameters |
| Design | A large Conformer encoder extracts acoustic representations, followed by a lighter Conformer decoder for token generation. |
| Training objective | Standard supervised cross-entropy on output tokens; trained from scratch |
| Languages | Trained on 14 languages |
| License | Apache 2.0 |
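As the table notes, the model consumes log-mel spectrograms rather than raw waveforms. The mel scale warps frequency to approximate human pitch perception; the widely used HTK-style conversion is sketched below as an illustration only, since the post does not specify the exact frontend parameters Transcribe uses.

```python
import math

def hz_to_mel(hz: float) -> float:
    """Map a frequency in Hz to the mel scale (HTK formula)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel: float) -> float:
    """Inverse mapping, used to place mel filterbank edges back in Hz."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

# A log-mel frontend spaces filterbank centers evenly in mel space,
# then converts them back to Hz to build triangular filters.
edges_mel = [hz_to_mel(0.0) + i * (hz_to_mel(8000.0) - hz_to_mel(0.0)) / 10 for i in range(11)]
edges_hz = [mel_to_hz(m) for m in edges_mel]
```

Evenly spaced mel bins become progressively wider in Hz at higher frequencies, which is what gives the representation its perceptual weighting.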
Image 1: Cohere Transcribe is an open-weight Conformer ASR model that converts speech audio to text in 14 supported languages.
Model performance
Accuracy
Cohere Transcribe sets a new standard for English speech recognition accuracy. It leads the Hugging Face Open ASR Leaderboard with an average word error rate of only 5.42%, outperforming all open- and closed-source dedicated ASR alternatives, including Whisper Large v3, ElevenLabs Scribe v2, and Qwen3-ASR-1.7B. The benchmark captures the model's versatility on real-world speech tasks, such as robustness to multi-speaker environments, boardroom-style acoustics (e.g. the AMI dataset), and diverse utterances (e.g. the VoxPopuli dataset).
| Model | Average WER | AMI | Earnings22 | GigaSpeech | LS Clean | LS Other | SPGISpeech | TEDLIUM | VoxPopuli |
|---|---|---|---|---|---|---|---|---|---|
| Cohere Transcribe | 5.42 | 8.13 | 10.86 | 9.34 | 1.25 | 2.37 | 3.08 | 2.49 | 5.87 |
| Zoom Scribe v1 | 5.47 | 10.03 | 9.53 | 9.61 | 1.63 | 2.81 | 1.59 | 3.22 | 5.37 |
| IBM Granite 4.0 1B Speech | 5.52 | 8.44 | 8.48 | 10.14 | 1.42 | 2.85 | 3.89 | 3.10 | 5.84 |
| NVIDIA Canary Qwen 2.5B | 5.63 | 10.19 | 10.45 | 9.43 | 1.61 | 3.10 | 1.90 | 2.71 | 5.66 |
| Qwen3-ASR-1.7B | 5.76 | 10.56 | 10.25 | 8.74 | 1.63 | 3.40 | 2.84 | 2.28 | 6.35 |
| ElevenLabs Scribe v2 | 5.83 | 11.86 | 9.43 | 9.11 | 1.54 | 2.83 | 2.68 | 2.37 | 6.80 |
| Kyutai STT 2.6B | 6.40 | 12.17 | 10.99 | 9.81 | 1.70 | 4.32 | 2.03 | 3.35 | 6.79 |
| OpenAI Whisper Large v3 | 7.44 | 15.95 | 11.29 | 10.02 | 2.01 | 3.91 | 2.94 | 3.86 | 9.54 |
| Voxtral Mini 4B Realtime 2602 | 7.68 | 17.07 | 11.84 | 10.38 | 2.08 | 5.52 | 2.42 | 3.79 | 8.34 |
Image 2: Hugging Face Open ASR Leaderboard as of 03.26.2026. It is a widely used, standardized benchmark that evaluates automatic speech recognition systems across curated datasets using word error rate (WER) as the primary metric, computed on normalized reference-hypothesis alignment, where lower WER indicates higher transcription fidelity. View the live leaderboard here.
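For readers unfamiliar with the metric: WER is the word-level edit distance (substitutions, insertions, deletions) between a reference transcript and the model's hypothesis, divided by the number of reference words. A minimal sketch follows; note that the leaderboard applies text normalization before scoring, which this illustration omits.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via word-level Levenshtein distance over reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j - 1] + (r != h),  # substitution (free if words match)
                prev[j] + 1,             # deletion from hypothesis
                curr[j - 1] + 1,         # insertion into hypothesis
            )
        prev = curr
    return prev[len(hyp)] / len(ref)
```

For example, `wer("the cat sat on the mat", "the cat sat on mat")` is one deletion over six reference words, i.e. about 0.167; a 5.42% average WER means roughly one word error per 18 reference words.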
Critically, these gains are not limited to benchmark datasets. We see the same state-of-the-art performance in human evaluation, where trained reviewers assess transcription quality on real-world audio for accuracy, consistency, and usability. The agreement across both evaluation methods reinforces that Cohere Transcribe's performance reliably translates from controlled benchmarks to practical enterprise settings.


Speed
In production settings, ASR systems must operate under strict latency and throughput constraints; a model that is accurate but slow or resource-intensive can directly impact user experience, operational efficiency, and cost.
Cohere Transcribe extends the Pareto frontier, delivering state-of-the-art accuracy (low WER) while maintaining best-in-class throughput (high RTFx) among models in the 1B+ parameter class.
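RTFx (inverse real-time factor) is the throughput metric used by the Open ASR Leaderboard: seconds of audio processed per second of wall-clock compute. The helper below is a hypothetical illustration of the arithmetic, not part of any Cohere SDK.

```python
def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse real-time factor: audio duration / processing time.
    RTFx > 1 means faster than real time; higher is better."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds

# e.g. a 60-minute meeting transcribed in 90 seconds of compute:
speed = rtfx(60 * 60, 90)  # 40.0
```

So a model at RTFx 40 turns an hour of audio into text in a minute and a half, which is what makes batch transcription of large archives economical.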

“We are really impressed with what Cohere has created with Transcribe. The speed is extraordinary – turning minutes of audio into a usable transcript in seconds – and it immediately opens up new possibilities for real-time products and workflows.

In our testing, the model handled everyday speech very well and delivered strong, reliable transcription quality. The overall experience has been intuitive and easy to work with. We are excited to partner with Cohere and continue exploring what we can create with this technology.”

Paige Dickey, Vice President, Radical Ventures
Zero to one, and beyond.
We are working towards a deeper integration of Cohere Transcribe with Answer, Cohere's AI agent orchestration platform. With planned updates, Cohere Transcribe will evolve from a high-accuracy transcription model into a comprehensive foundation for enterprise speech intelligence.
Launch
Cohere Transcribe is now available for download on Hugging Face. Follow the setup instructions to run the model locally, or even in an edge environment.
You can also access Cohere Transcribe through our API for free, low-setup use, subject to rate limits. See the documentation for usage details and integration guidance.
For production deployments without rate limits, provision a dedicated Model Vault instance. This enables low-latency, private cloud inference without having to manage infrastructure. Pricing is calculated at an hourly rate, with discounted plans for longer-term commitments. Contact our team to discuss your requirements.
Major Contributors: Julian Mack (Member of Technical Staff), Ekagra Ranjan (Member of Technical Staff), Cassie Kao (Product Manager), Bharat Venkatesh (Manager of Technical Staff), Pierre Harvey Richmond (Manager of Technical Staff).