Alibaba Qwen Wins “NeurIPS 2025 Best Paper Award” for Breakthrough in Attention Mechanisms


The Alibaba Qwen team has received the prestigious “NeurIPS 2025 Best Paper Award” at the Conference on Neural Information Processing Systems (NeurIPS), one of the world’s foremost conferences in machine learning and artificial intelligence. The award recognizes the team’s pioneering research on attention mechanisms in large language models (LLMs).

The winning paper, titled “Gated Attention for Large Language Models: Non-Linearity, Sparsity, and Attention-Sink-Free”, is the first in the industry to systematically investigate how attention gating affects the performance and training of large models.


Gating, a mechanism that controls the flow of information through the network, is one of the most widely used techniques in LLM architectures. Functioning like “intelligent noise-canceling headphones” for a model, it helps filter out irrelevant information and increases overall effectiveness.

To rigorously evaluate the role of gating, the Qwen team conducted a comprehensive study comparing more than 30 variants of a 15B mixture-of-experts (MoE) model and a 1.7B dense model, trained on a 3.5-trillion-token dataset. The results show that a simple architectural modification – applying a head-specific sigmoid gate to the output of scaled dot-product attention (SDPA) – consistently improves model performance. This modification increases training stability, allows larger learning rates, and improves scaling properties.
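To illustrate the idea, here is a minimal PyTorch sketch of a multi-head attention layer with a head-specific sigmoid gate applied after the SDPA output. The class and parameter names are illustrative assumptions for this article, not the Qwen team’s released implementation; consult their code for the actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedMultiHeadAttention(nn.Module):
    """Sketch of gated attention: sigmoid-gate the SDPA output per head.

    Hypothetical example layer; names and gate placement details are
    assumptions based on the article's description, not the official code.
    """

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        # Gate projection: produces per-head, per-dimension gate values
        # computed from the same input as the attention.
        self.gate_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, D = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)

        # Reshape to (batch, heads, seq, head_dim) for SDPA.
        def split_heads(t):
            return t.view(B, T, self.n_heads, self.d_head).transpose(1, 2)

        q, k, v = split_heads(q), split_heads(k), split_heads(v)

        # Standard causal scaled dot-product attention.
        attn_out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        attn_out = attn_out.transpose(1, 2).reshape(B, T, D)

        # Head-specific sigmoid gate applied AFTER the attention output:
        # the gate can zero out heads, adding non-linearity and sparsity.
        gate = torch.sigmoid(self.gate_proj(x))
        return self.out_proj(gate * attn_out)
```

Because the gate is a sigmoid, each head’s output can be smoothly scaled toward zero, which is one way such a mechanism can suppress irrelevant information.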

These findings have already been incorporated into the Qwen3-Next model, released in September 2025, which replaced standard attention with a combination of Gated DeltaNet and Gated Attention. This design improves in-context learning while increasing computational efficiency.

To aid further research and community adoption, the Qwen team has released the related code and models on GitHub and Hugging Face.

The NeurIPS selection committee commented, “The main recommendation of the paper is easily implemented, and given the extensive evidence provided in the paper for this modification to the LLM architecture, we expect this idea to be widely adopted.”

“This paper represents a substantial amount of work that is only possible with access to industrial-scale computing resources, and the authors’ sharing of the results of their work, which will advance the community’s understanding of attention in large language models, is highly appreciated, especially in an environment where open sharing of scientific results around LLMs has been shunned,” the selection committee added.


