Meta returns to open source AI with Omnilingual ASR models that can transcribe 1,600+ languages natively

Meta recently released a new multilingual automatic speech recognition (ASR) system supporting 1,600+ languages – dwarfing OpenAI’s open source Whisper model, which supports just 99.

The architecture also allows developers to scale that support into the thousands. Through a feature called zero-shot in-context learning, users can provide a few paired examples of audio and text in a new language at inference time, enabling the model to transcribe additional utterances in that language without any retraining.

In practice, this expands the potential coverage to more than 5,400 languages – almost every spoken language with a known script.

This is a shift from static model capabilities to a flexible framework that communities can adapt themselves. So while 1,600 languages represents the official training coverage, the broader figure reflects Omnilingual ASR’s ability to generalize on demand, making it the most expandable speech recognition system released to date.

The best part: it’s open sourced under the plain Apache 2.0 license – not the restrictive, quasi-open-source license that governed the company’s previous Llama releases, which limits use by large enterprises unless they pay a license fee – meaning researchers and developers are free to pick it up and apply it immediately, for free, with no restrictions, even in commercial and enterprise-grade projects!

Released on November 10 on GitHub and Hugging Face, with a demo space on Meta’s website, the Omnilingual ASR suite includes a family of speech recognition models, a 7-billion-parameter multilingual audio representation model, and a massive speech corpus spanning roughly 350 previously underserved languages.

All resources are freely available under open licenses, and the models support speech-to-text transcription out of the box.

“By open sourcing these models and datasets, we aim to break down language barriers, expand digital access, and empower communities around the world,” Meta posted on its @AIatMeta account on Twitter.

Designed for speech-to-text transcription

At its core, omnilingual ASR is a speech-to-text system.

The models are trained to convert spoken language to written text, supporting applications such as voice assistants, transcription tools, subtitling, digitization of oral collections, and accessibility features for low-resource languages.

Unlike earlier ASR models, which require extensive labeled training data, Omnilingual ASR includes a zero-shot variant.

This variant can transcribe languages it has never seen before, using only a few paired examples of audio and corresponding text.

This dramatically lowers the barrier to adding new or endangered languages, removing the need for a large corpus or retraining.

Model family and technical design

The omnilingual ASR suite includes multiple model families trained on over 4.3 million hours of audio in 1,600+ languages:

  • wav2vec 2.0 models for self-supervised speech representation learning (300M-7B parameters)

  • CTC-based ASR models for efficient supervised transcription

  • LLM-ASR models that combine a speech encoder with a Transformer-based text decoder for state-of-the-art transcription

  • LLM-ZeroShot ASR models that enable inference-time adaptation to unseen languages

All models follow an encoder-decoder design: raw audio is converted into a language-agnostic representation, then decoded into written text.
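
To make that design concrete, here is a minimal, illustrative encoder-decoder sketch in PyTorch. It is not Meta’s architecture or code (the layer sizes, module choices, and downsampling steps are assumptions for illustration), but it follows the same flow: raw audio is encoded into a contextual representation, and a text decoder attends over it to emit tokens.

import torch
import torch.nn as nn

class ToyEncoderDecoderASR(nn.Module):
    def __init__(self, vocab_size=256, d_model=256):
        super().__init__()
        # Speech encoder: strided convolutions turn the raw waveform into frames,
        # then a Transformer encoder builds a contextual, language-agnostic representation.
        self.feature_extractor = nn.Sequential(
            nn.Conv1d(1, d_model, kernel_size=10, stride=5), nn.GELU(),
            nn.Conv1d(d_model, d_model, kernel_size=8, stride=4), nn.GELU(),
        )
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        # Text decoder: embeds previous tokens and cross-attends over the audio representation.
        self.embed = nn.Embedding(vocab_size, d_model)
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=2)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, waveform, token_ids):
        # waveform: (batch, samples); token_ids: (batch, text_len)
        frames = self.feature_extractor(waveform.unsqueeze(1)).transpose(1, 2)
        audio_repr = self.encoder(frames)
        tgt = self.embed(token_ids)
        causal_mask = nn.Transformer.generate_square_subsequent_mask(token_ids.size(1))
        decoded = self.decoder(tgt, audio_repr, tgt_mask=causal_mask)
        return self.lm_head(decoded)  # per-token logits over the text vocabulary

model = ToyEncoderDecoderASR()
logits = model(torch.randn(2, 16000), torch.randint(0, 256, (2, 12)))
print(logits.shape)  # torch.Size([2, 12, 256])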

Why does scale matter?

While Whisper and similar models have advanced ASR capabilities for global languages, they fall short on the long tail of human linguistic diversity. Whisper supports 99 languages. Meta’s system:

  • Supports 1,600+ languages directly

  • Can generalize to 5,400+ languages using zero-shot in-context learning

  • Achieves a character error rate (CER) below 10% in 78% of supported languages

According to Meta’s research paper, the supported languages include more than 500 languages that have never been covered by any ASR model before.

This expansion opens up new possibilities for communities whose languages are often excluded from digital tools.

Background: Meta’s AI overhaul and a rebound from Llama 4

The release of Omnilingual ASR comes at a key moment in Meta’s AI strategy, after a year marked by organizational turmoil, leadership changes, and uneven product execution.

Omnilingual ASR is the first major open-source model release following the rollout of Meta’s latest large language model, Llama 4, which debuted in April 2025 to mixed and ultimately poor reviews and saw much lower enterprise adoption than competing Chinese open source models.

The failure led Meta founder and CEO Mark Zuckerberg to hire Alexandr Wang, co-founder and former CEO of AI data provider Scale AI, as chief AI officer, and to embark on an extensive and expensive recruitment drive that shocked the AI and business communities with eye-watering pay packages for top AI researchers.

In contrast, Omnilingual ASR represents a strategic and reputational reset. It returns Meta to a domain where the company has historically pioneered – multilingual AI – and offers a truly expandable, community-oriented stack with minimal barriers to entry.

The system’s support for over 1,600 languages, and its extensibility to 5,400+ through zero-shot in-context learning, reaffirms Meta’s engineering credibility in language technology.

Importantly, it does so through a free, permissively licensed release under Apache 2.0, with transparent dataset sourcing and reproducible training protocols.

This change aligns with broader themes in Meta’s 2025 strategy. The company has re-centered its story around a “personal superintelligence” vision, investing heavily in AI infrastructure (including custom AI accelerators and the September release of an Arm-based inference stack) while downplaying the metaverse in favor of core AI capabilities. The resumption of training on public data in Europe following a regulatory pause also underlines its intention to compete globally despite ongoing privacy scrutiny.

Omnilingual ASR, then, is more than a model release – it’s a calculated move to reassert control of the narrative: from Llama 4’s fragmented rollout to a high-utility, research-driven contribution that aligns with Meta’s long-term AI platform strategy.

Community-Centric Dataset Collection

To achieve this scale, Meta partnered with researchers and community organizations in Africa, Asia, and elsewhere to create the Omnilingual ASR Corpus, a 3,350-hour dataset covering 348 low-resource languages. Local speakers were compensated for their contributions, and recordings were collected in collaboration with groups such as:

  • African Next Voices: a Gates Foundation-supported consortium consisting of Maseno University (Kenya), the University of Pretoria (South Africa), and Data Science Nigeria

  • Mozilla Foundation’s Common Voice, supported through the Open Multilingual Speech Fund

  • Lanfrica/NaijaVoices, which produced data for 11 African languages, including Igala, Serer, and Urhobo

Data collection focused on natural, unscripted speech. Prompts were designed to be culturally relevant and open-ended, such as “Is it better to have a few close friends or many casual acquaintances? Why?” Transcriptions use established writing systems, with quality assurance at each stage.

Performance and Hardware Considerations

The largest model in the suite, omniASR_LLM_7B, requires ~17GB of GPU memory for inference, making it suitable for deployment on high-end hardware. Smaller models (300M-1B) can run on low-power devices and provide real-time transcription speeds.
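
As a rough planning aid, a deployment script might check available GPU memory before choosing a variant, as in the sketch below. Only the ~17GB figure and the omniASR_LLM_7B name come from the release; the selection heuristic and the smaller card name are illustrative assumptions.

import torch

def pick_model_card() -> str:
    # Choose a model variant based on available GPU memory (illustrative heuristic).
    if torch.cuda.is_available():
        total_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
        if total_gb >= 17:  # the 7B LLM-ASR model needs roughly 17GB for inference
            return "omniASR_LLM_7B"
    # Fall back to a smaller variant on CPUs or low-memory GPUs;
    # this card name is a placeholder, not a confirmed identifier.
    return "omniASR_LLM_300M"

print(pick_model_card())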

Performance benchmarks show strong results even in low-resource scenarios:

  • CER below 10% for 95% of high- and mid-resource languages (see the CER sketch after this list)

  • CER below 10% for 36% of low-resource languages

  • Robustness in noisy conditions and unseen domains, especially with fine-tuning
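
For context, character error rate is simply the character-level edit distance between the model’s hypothesis and the reference transcript, divided by the reference length. A minimal, self-contained Python sketch (not Meta’s evaluation code):

def cer(reference: str, hypothesis: str) -> float:
    # Character error rate: Levenshtein distance over characters,
    # normalized by the length of the reference transcript.
    r, h = list(reference), list(hypothesis)
    prev = list(range(len(h) + 1))
    for i, rc in enumerate(r, start=1):
        curr = [i] + [0] * len(h)
        for j, hc in enumerate(h, start=1):
            cost = 0 if rc == hc else 1
            curr[j] = min(prev[j] + 1,        # deletion
                          curr[j - 1] + 1,    # insertion
                          prev[j - 1] + cost) # substitution
        prev = curr
    return prev[-1] / max(len(r), 1)

print(cer("omnilingual asr", "omnilingual azr"))  # one substitution -> ~0.067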

The zero-shot system, omniASR_LLM_7B_ZS, can transcribe new languages with minimal setup. Users provide a few sample audio-text pairs, and the model generates transcriptions for new utterances in the same language.
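
Conceptually, the workflow looks like the sketch below. The function and argument names are hypothetical placeholders, not the package’s confirmed API; only the omniASR_LLM_7B_ZS model name comes from the release.

# Hypothetical sketch of zero-shot in-context transcription; names are illustrative.
few_shot_examples = [
    ("greeting.wav", "transcript of the greeting in the target language"),
    ("question.wav", "transcript of the question in the target language"),
]

def transcribe_unseen_language(zero_shot_model, audio_path, examples):
    # Condition the zero-shot model (e.g. omniASR_LLM_7B_ZS) on the paired examples
    # so it can transcribe an unseen language without any retraining.
    return zero_shot_model.transcribe(audio_path, context=examples)  # illustrative call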

Open Access and Developer Tooling

All models and datasets are released under open licenses:

  • Apache 2.0 for the models and code

  • CC-BY 4.0 for the Omnilingual ASR Corpus on Hugging Face

Installation is supported via PyPI and uv:

pip install omnilingual-asr

Meta also offers:

  • A Hugging Face dataset integration

  • Pre-built inference pipelines (a usage sketch follows this list)

  • Language-code conditioning for better accuracy
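
A short usage sketch tying these pieces together is shown below. The import path, class name, and arguments are assumptions inferred from the release rather than verified API, and the audio file and language code are hypothetical; check the repository README for the exact interface.

# Assumed import path and class name; confirm against the repository README.
from omnilingual_asr.models.inference.pipeline import ASRInferencePipeline

pipeline = ASRInferencePipeline(model_card="omniASR_LLM_7B")  # model name from the release

# Language-code conditioning: passing the language ID is meant to improve accuracy.
transcripts = pipeline.transcribe(
    ["clip_in_igala.wav"],  # hypothetical local audio file
    lang=["igl_Latn"],      # hypothetical language code, for illustration only
    batch_size=1,
)
print(transcripts[0])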

Developers can view the full list of supported languages using the API:

from omnilingual_asr.models.wav2vec2_llama.lang_ids import supported_langs

print(len(supported_langs))
print(supported_langs)

Wider implications

Omnilingual ASR changes language coverage in ASR from a fixed list to an expandable framework. It enables:

  • Community-driven inclusion of underrepresented languages

  • Digital access to oral and endangered languages

  • Research on speech technology in linguistically diverse contexts

Importantly, Meta emphasizes ethical considerations throughout – advocating open-source participation and collaboration with native-speaker communities.

The Omnilingual ASR paper states that no model can predict or incorporate all the world’s languages in advance, but Omnilingual ASR makes it possible for communities to extend coverage themselves with their own data.

Access the tools

All resources are now available at:

  • Code + models: github.com/facebookresearch/omnilingual-asr

  • Dataset: huggingface.co/datasets/facebook/omnilingual-asr-corpus

  • Blog post: ai.meta.com/blog/omnilingual-asr

What this means for enterprises

For enterprise developers, especially those working in multilingual or international markets, omnilingual ASR significantly lowers the barrier to deploying speech-to-text systems across a wide range of customers and geographies.

Instead of relying on commercial ASR APIs that support only a narrow set of high-resource languages, teams can now integrate an open-source pipeline that covers more than 1,600 languages – with the option to extend it to thousands more through zero-shot learning.

This flexibility is especially valuable for enterprises working in areas such as voice-based customer support, transcription services, accessibility, education, or civic technology, where local language coverage may be a competitive or regulatory requirement. Because the models are released under the permissive Apache 2.0 license, businesses can fine-tune, deploy, or integrate them into proprietary systems without restrictive terms.

It also represents a shift in the ASR landscape – from centralized, cloud-gated offerings to community-extensible infrastructure. By making multilingual speech recognition more accessible, customizable, and cost-effective, Omnilingual ASR opens the door to a new generation of enterprise speech applications built around linguistic inclusion rather than linguistic limitation.


