Cohere cracks lossless quantization and native citations with first full Apache 2.0 licensed open model Command A+

Canadian AI lab Cohere recently made a stir by announcing a merger with German AI startup Aleph Alpha, but now it has even more in store for enterprise builders around the world: Today, the firm co-founded by former Googler and "Attention Is All You Need" co-author Aidan Gomez unveiled Command A+, a highly optimized, 218-billion-parameter language model specifically engineered for complex logic, multimodal document processing, and agentic workflows.

The most important aspect of the release isn’t just the model’s capabilities; it is its reach.

By releasing the model weights for free on the popular AI code-sharing repository Hugging Face under the highly permissive Apache 2.0 open-source license (a first for the company, according to a post), Cohere is making a bet on "sovereign AI": the thesis that enterprises, governments, and developers must be able to run, control, and customize frontier-grade AI entirely within their own secure environments without compromising performance.

Sparse architecture with aggressive quantization

At an architectural level, Command A+ represents a major evolution from Cohere’s previous dense models. It is a decoder-only sparse mixture-of-experts (MoE) transformer.

While the model has a relatively modest 218 billion total parameters, even fewer, only 25 billion, are active during any given generation step. That gives it a much lighter footprint and far lower compute requirements for inference (serving the model to end users or agents in a production environment) than proprietary US giants like OpenAI’s GPT-5.5 and Anthropic’s Claude Opus 4.7, which third-party observers estimate to be in the trillions of parameters.

This is the key to the efficiency of sparse architectures. In simple terms, an MoE model routes incoming queries only to the specific "expert" neural networks best suited to handle them, leaving the rest of the model idle.

This is a familiar design, adopted by most leading LLMs these days. It lets a model retain the vast knowledge base and subtle reasoning capabilities of a giant while running at the speed, and with the compute and energy requirements, of a much smaller model, because only a fraction of the parameters are active at any time.
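The routing idea described above can be sketched in a few lines. This is a toy illustration of generic top-k expert routing, not Cohere's implementation: the expert functions, the router scores, the choice of `k=2`, and the names `route_top_k` and `moe_layer` are all assumptions made for the example.

```python
import math

def softmax(scores):
    """Convert raw router scores into probabilities."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_scores, k):
    """Return [(expert_index, weight), ...] for the k highest-scoring experts."""
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)  # renormalize over the selected experts
    return [(i, probs[i] / norm) for i in top]

def moe_layer(x, experts, router_scores, k=2):
    """Run only the selected experts and blend their outputs; the rest stay idle."""
    out = 0.0
    for idx, weight in route_top_k(router_scores, k):
        out += weight * experts[idx](x)
    return out

# Toy setup: 8 hypothetical "experts", each a simple scalar function.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
router_scores = [0.1, 2.0, -1.0, 0.3, 1.5, 0.0, -0.5, 0.2]

selected = route_top_k(router_scores, k=2)
y = moe_layer(3.0, experts, router_scores, k=2)
print("experts used:", [i for i, _ in selected], "output:", round(y, 3))
```

Only two of the eight experts execute for this input, which is where the compute savings come from: total parameters set the memory footprint, while active parameters set the per-token compute.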

Where Cohere has gone a step further with Command A+ is its heavy focus on hardware efficiency through quantization, a process that shrinks a model’s memory footprint by reducing the numerical precision of its parameters.

Command A+ is available in 16-bit (BF16), 8-bit (FP8), and highly compressed 4-bit (W4A4) formats.

W4A4 quantization is the technical centerpiece of this release. Generally, reasoning models are hit hard by quantization, where compressing the model leads to visible regressions in complex problem-solving.

Cohere mitigated this by applying 4-bit quantization only to the MoE experts while keeping the critical attention pathways at full precision, complemented by a technique called Quantization-Aware Distillation.
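To make the precision trade-off concrete, here is a minimal sketch of generic symmetric 4-bit weight quantization. It is not Cohere's W4A4 format or its Quantization-Aware Distillation procedure (W4A4 conventionally means 4-bit weights and 4-bit activations; this sketch covers weights only), and the sample weights and function names are hypothetical.

```python
def quantize_int4(weights):
    """Symmetric quantization of a group of weights to signed 4-bit integers in [-8, 7]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0  # one shared scale per group
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map the 4-bit integers back to approximate floating-point weights."""
    return [v * scale for v in q]

# Hypothetical group of weights, for illustration only.
weights = [0.12, -0.40, 0.33, 0.07, -0.21, 0.70, -0.05, 0.0]

q, scale = quantize_int4(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("4-bit codes:", q)
print("max round-trip error:", max_error)
```

Each weight is stored as one of only 16 levels, so the round-trip error is bounded by half the scale. Keeping sensitive layers at higher precision, as the article describes for attention, avoids injecting this error where it hurts most.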

The result is near-lossless compression, which allows this massive model to run on a single NVIDIA Blackwell B200 GPU or just two NVIDIA H100 GPUs.
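A back-of-the-envelope calculation shows why the 4-bit format matters for these hardware targets. The sketch below counts weight storage only and assumes every parameter is stored at the stated width, which understates the real footprint (the article says attention stays at higher precision, and the KV cache and activations need memory too). The GPU capacities are assumptions, not figures from the release.

```python
TOTAL_PARAMS = 218e9  # total parameter count reported in the article

def weight_memory_gb(num_params, bits_per_param):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

footprints = {
    "BF16": weight_memory_gb(TOTAL_PARAMS, 16),
    "FP8": weight_memory_gb(TOTAL_PARAMS, 8),
    "W4A4": weight_memory_gb(TOTAL_PARAMS, 4),
}

# Assumed memory capacities in GB; published figures vary by configuration.
GPU_MEMORY_GB = {"1x B200": 180, "2x H100": 2 * 80}

for fmt, gb in footprints.items():
    fits = [name for name, cap in GPU_MEMORY_GB.items() if gb < cap]
    print(f"{fmt}: ~{gb:.0f} GB of weights, fits on: {fits or 'neither setup'}")
```

Under these assumptions only the 4-bit weights (about 109 GB) leave headroom on either setup, which is consistent with the deployment claim above.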

The speed gain is equally notable. According to performance data released by the company, the W4A4 quantization at low concurrency achieves 375 tokens per second (TPS) with a time-to-first-token (TTFT) latency of only 113 milliseconds, which represents a 63% increase in output speed and a 17% reduction in latency compared to the previous Command A reasoning model.
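For readers who want the implied baseline, the percentages can be inverted. This is plain arithmetic on the article's figures; the baseline numbers it produces are inferred, not published, and they assume the percentages are relative to the previous model's values.

```python
CURRENT_TPS = 375.0      # tokens per second, from the article
CURRENT_TTFT_MS = 113.0  # time to first token in ms, from the article
SPEEDUP = 0.63           # reported increase in output speed
LATENCY_CUT = 0.17       # reported reduction in TTFT

# Invert "new = old * (1 + speedup)" and "new = old * (1 - cut)".
implied_previous_tps = CURRENT_TPS / (1 + SPEEDUP)
implied_previous_ttft_ms = CURRENT_TTFT_MS / (1 - LATENCY_CUT)

print(f"implied previous throughput: {implied_previous_tps:.0f} tokens/s")
print(f"implied previous TTFT: {implied_previous_ttft_ms:.0f} ms")
```

Read this way, the previous model would have delivered roughly 230 tokens per second with a TTFT of roughly 136 ms under comparable conditions.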

Additionally, Cohere has overhauled the model’s tokenizer. Tokenizers break text into the pieces that AI models process. The new tokenizer is highly optimized for global enterprise use, including native support for 48 languages.

More importantly, it dramatically improves tokenization efficiency for non-European languages: the number of tokens required to generate responses has been reduced by 20% in Arabic, 18% in Japanese, and 16% in Korean. Because inference costs are calculated per token, this translates directly into lower operating costs for global, multilingual or non-English deployments.
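The cost effect of those token reductions is easy to estimate. In the sketch below, the reduction percentages come from the article, while the monthly token volume and the per-million-token price are hypothetical placeholders to be replaced with real deployment figures.

```python
# Token reductions reported in the article.
TOKEN_REDUCTION = {"Arabic": 0.20, "Japanese": 0.18, "Korean": 0.16}

# Hypothetical workload and price, for illustration only.
BASELINE_TOKENS_PER_MONTH = 500_000_000
PRICE_PER_MILLION_TOKENS = 2.00  # in dollars

def monthly_cost(tokens, price_per_million):
    """Cost of generating `tokens` output tokens at a per-million-token price."""
    return tokens / 1_000_000 * price_per_million

savings = {}
for language, cut in TOKEN_REDUCTION.items():
    before = monthly_cost(BASELINE_TOKENS_PER_MONTH, PRICE_PER_MILLION_TOKENS)
    after = monthly_cost(BASELINE_TOKENS_PER_MONTH * (1 - cut), PRICE_PER_MILLION_TOKENS)
    savings[language] = before - after
    print(f"{language}: ${before:,.2f} -> ${after:,.2f} per month")
```

Because cost scales linearly with token count, the percentage saved equals the percentage of tokens removed, whatever the actual price.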

Agentic workflows and strong benchmarks in math and specialized domains

While raw speed and size dictate deployment, a model’s usefulness is defined by its capabilities. Command A+ was built specifically for "agentic" tasks: workflows where AI operates autonomously or semi-autonomously, uses external tools, queries databases, and synthesizes information across multiple steps.

The benchmark gains over the previous generation are stark.

On 𝜏²-Bench, which tests complex logic, the model’s score rose from 37% to 85%. On Terminal-Bench Hard, which measures agentic coding performance, it rose from 3% to 25%. In complex math, it scored 90% on AIME 25, up from 57%.

Command A+ punches above its weight class (25B active parameters) in pure logic and math, competing directly with much larger models like DeepSeek V4 Pro on math benchmarks. However, for deep agentic coding and broad general-intelligence indices, it currently lags behind the latest generations of Chinese open-source rivals such as DeepSeek, Z.AI (GLM), and MiniMax.

That said, such comparisons ignore Cohere’s core value proposition: hardware efficiency.

Beyond benchmarks, Command A+ offers deep integration for enterprise trust and verification. The model supports conversational tool use through standard chat templates, allowing developers to seamlessly connect it to internal APIs, search engines, or SQL databases.
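On the application side, tool use usually comes down to two pieces: a machine-readable description of each tool, and code that executes whatever call the model requests. The sketch below is a generic pattern and not Cohere's template or API: the JSON-schema-style tool definition, the `get_daily_sales` function, its data, and the shape of the model's tool-call output are all hypothetical.

```python
import json

# Hypothetical tool description in a JSON-schema style commonly used with chat templates.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_daily_sales",
        "description": "Return total sales for a given date.",
        "parameters": {
            "type": "object",
            "properties": {"date": {"type": "string", "description": "YYYY-MM-DD"}},
            "required": ["date"],
        },
    },
}]

def get_daily_sales(date):
    """Stand-in for a real SQL query; the figures are made up for the example."""
    fake_db = {"2026-05-19": 48250.75}
    return {"date": date, "total_sales": fake_db.get(date)}

REGISTRY = {"get_daily_sales": get_daily_sales}

def dispatch(tool_call_json):
    """Parse a tool call emitted by the model and run the matching function."""
    call = json.loads(tool_call_json)
    if call["name"] not in REGISTRY:
        raise ValueError(f"unknown tool: {call['name']}")
    return REGISTRY[call["name"]](**call["arguments"])

# Assumed shape of a model-emitted tool call.
model_output = '{"name": "get_daily_sales", "arguments": {"date": "2026-05-19"}}'
result = dispatch(model_output)
print(result)
```

In a real deployment the tool result would be passed back to the model as a new message so it can compose the final answer.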

Importantly, Command A+ has a native citation generation feature. When Command A+ retrieves information from an external tool, it does not simply synthesize the answer; it produces explicit "grounding spans." Using special tags embedded in the output, the model links each of its factual claims directly to the specific source document or database row from which it pulled the information.

For enterprises in highly regulated industries like finance, healthcare, or legal, this traceability is the difference between an interesting prototype and a production-ready application. If a user asks for a daily sales report, the model will output the total sales amount and clearly cite the database query result that provided that number, reducing the risk of undetected hallucinations.
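Downstream code can use such grounding spans to verify answers automatically. The article does not specify the tag syntax, so the sketch below invents one (`<cite doc="...">...</cite>`) purely for illustration; the source IDs and the sample response are likewise hypothetical.

```python
import re

# Hypothetical tag format; the model's real grounding-span syntax may differ.
CITE_PATTERN = re.compile(r'<cite doc="([^"]+)">(.*?)</cite>', re.DOTALL)

def extract_citations(text):
    """Return (claim_text, source_id) pairs for every tagged span."""
    return [(m.group(2), m.group(1)) for m in CITE_PATTERN.finditer(text)]

def strip_tags(text):
    """Produce the clean answer shown to the user."""
    return CITE_PATTERN.sub(lambda m: m.group(2), text)

def unknown_sources(text, known_source_ids):
    """List cited source IDs that were never actually provided to the model."""
    return [src for _, src in extract_citations(text) if src not in known_source_ids]

response = 'Total sales for the day were <cite doc="sql_result_0">$48,250.75</cite>.'
known = {"sql_result_0"}

print(strip_tags(response))
print(extract_citations(response))
print("unverifiable citations:", unknown_sources(response, known))
```

A check like `unknown_sources` lets an application flag any claim whose citation does not match a document or query result it supplied, before the answer reaches a user.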

Additionally, Command A+ is fully multimodal, able to natively process both text and images within its massive 128K input context window, making it highly effective for complex document processing such as analyzing scanned invoices, charts or technical manuals.

The first fully Apache 2.0 licensed Cohere AI model

In the current AI landscape, "open source" has become a fraught term. Many leading AI companies release their model weights under restrictive commercial licenses or acceptable use policies that explicitly prevent large enterprises from using the models for commercial purposes, or from using the models to train competing AI systems.

In fact, Cohere’s previous models, including Command R and Command R+, were released under the CC-BY-NC 4.0 (Creative Commons Non-Commercial) license. While their model weights were open for researchers and developers to download, tinker with, and evaluate, they were strictly prohibited from being used for commercial purposes without purchasing a separate enterprise license from Cohere or going through its application programming interface (API), similar to the arrangement many enterprises use to access AI models from OpenAI, Anthropic, Google, and other leading labs.

Cohere has changed its approach by releasing Command A+ under the Apache 2.0 license. This is an important distinction for the developer community. Apache 2.0 is a true, OSI-approved open-source license. This allows anyone from independent developers to Fortune 500 corporations to use, modify, distribute, and commercialize the model without paying licensing fees or adhering to restrictive non-compete clauses.

As Gomez wrote: "The best model we have presented so far."

For the enterprise, this license means complete vendor independence. A company can download the Command A+ weights, fine-tune them on highly classified internal data, and deploy them on its own private servers or air-gapped networks. It is not tied to Cohere’s infrastructure, pricing changes or API uptime. This is the ultimate realization of sovereign AI.

The release made an immediate impact in the AI developer ecosystem, driven by its day-one integration with major open-source inference frameworks such as Hugging Face and vLLM.

What will happen next?

The release of Command A+ marks the maturing of the open-source AI ecosystem. By combining frontier-level reasoning, robust agentic tool usage, and multimodal capabilities with an architecture specifically designed for hardware efficiency, Cohere is changing the calculus for enterprise AI adoption.

The need for large-scale, centralized compute clusters has long been a stumbling block for companies prioritizing data privacy and cost control. By democratizing access to a model of this caliber under a true open-source license, Cohere has provided the enterprise market with exactly what it has been asking for: the power of the cloud, able to run safely in the server room down the hall.


