Nvidia's new open-weights Nemotron 3 Super combines three architectures to beat gpt-oss and Qwen in throughput

Multi-agent systems designed to handle long-horizon tasks like software engineering or cybersecurity triage can generate up to 15 times the token volume of standard chat – jeopardizing their cost-effectiveness for enterprise work.

Today, Nvidia offered its answer to this problem with the release of Nemotron 3 Super, a 120-billion-parameter hybrid model with weights posted on Hugging Face.

By merging disparate architectural philosophies (state-space models, transformers, and a novel latent mixture-of-experts design), Nvidia aims to provide the specialized depth needed for agentic workflows without the typical bloat of dense reasoning models, all while remaining available for commercial use under a mostly open license.

Triple Hybrid Architecture

At the core of Nemotron 3 Super is an architectural triad that balances memory efficiency with precise recall. The model uses a hybrid Mamba-Transformer backbone, which combines Mamba-2 layers with strategically placed transformer attention layers.

To understand the implications for enterprise production, consider the "needle in a haystack" problem. The Mamba-2 layers act like a "fast travel" highway, handling the vast majority of sequence processing with linear-time complexity. This allows the model to maintain a 1-million-token context window without exploding the memory footprint of the KV cache. However, pure state-space models often struggle with associative recall.

To fix this, Nvidia strategically inserts transformer attention layers as "global anchors," ensuring the model can accurately retrieve specific facts buried deep within a codebase or a stack of financial reports.
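The memory argument can be made concrete with back-of-envelope arithmetic. The sketch below compares KV-cache growth for a pure-attention stack against a hybrid stack where most layers are Mamba-2 and keep a fixed-size state. All layer counts and dimensions here are hypothetical, chosen only to illustrate the scaling, not Nemotron's actual configuration.

```python
# Back-of-envelope memory comparison: pure-transformer vs. hybrid stack.
# All dimensions below are illustrative assumptions, not Nemotron's real config.

def kv_cache_bytes(n_attn_layers, seq_len, n_kv_heads, head_dim, bytes_per_elem=2):
    """K and V caches grow linearly with sequence length per attention layer."""
    return 2 * n_attn_layers * seq_len * n_kv_heads * head_dim * bytes_per_elem

def mamba_state_bytes(n_mamba_layers, d_state, d_inner, bytes_per_elem=2):
    """Mamba-2 keeps a fixed-size recurrent state, independent of sequence length."""
    return n_mamba_layers * d_state * d_inner * bytes_per_elem

SEQ = 1_000_000  # the 1M-token context window

# Pure transformer: 48 attention layers, all caching K/V over the full sequence.
pure = kv_cache_bytes(48, SEQ, n_kv_heads=8, head_dim=128)

# Hybrid: 8 attention "anchor" layers + 40 Mamba-2 layers (illustrative split).
hybrid = kv_cache_bytes(8, SEQ, n_kv_heads=8, head_dim=128) \
       + mamba_state_bytes(40, d_state=128, d_inner=8192)

print(f"pure transformer KV cache: {pure / 1e9:.1f} GB")
print(f"hybrid stack:              {hybrid / 1e9:.1f} GB")
```

The Mamba state term is negligible next to the attention caches, which is why the hybrid keeps only a small fraction of the pure transformer's memory footprint at long context.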

Beyond the backbone, the model introduces a latent mixture of experts (LatentMoE). Traditional mixture-of-experts (MoE) designs route tokens to experts at the full model dimension, which becomes a computational bottleneck as the model scales. LatentMoE addresses this by projecting tokens into a compressed latent space before sending them to the experts.

This "expert compression" allows the model to consult four times more experts at the same computational cost. That granularity matters for agents that must switch between Python syntax, SQL logic, and conversational reasoning in a single step.
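A minimal sketch of the idea, assuming only the core mechanism described above: tokens are projected into a smaller latent space, routed, processed by experts at the compressed width, then projected back. All sizes, the softmax router, and the top-k scheme are illustrative assumptions, not Nvidia's actual design.

```python
import numpy as np

# Sketch of a "latent" MoE layer: experts operate on a compressed
# representation, so consulting more of them costs proportionally less.
rng = np.random.default_rng(0)
d_model, d_latent, n_experts, top_k = 1024, 256, 16, 4

W_down = rng.standard_normal((d_model, d_latent)) * 0.02   # compress tokens
W_up   = rng.standard_normal((d_latent, d_model)) * 0.02   # decompress output
W_gate = rng.standard_normal((d_model, n_experts)) * 0.02  # router
experts = [rng.standard_normal((d_latent, d_latent)) * 0.02
           for _ in range(n_experts)]

def latent_moe(x):
    z = x @ W_down                               # (tokens, d_latent)
    logits = x @ W_gate
    top = np.argsort(logits, axis=-1)[:, -top_k:]  # top-k experts per token
    # Softmax over all experts for simplicity; real routers typically
    # renormalize over the selected top-k only.
    weights = np.exp(logits - logits.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    out = np.zeros_like(z)
    for t in range(x.shape[0]):
        for e in top[t]:
            out[t] += weights[t, e] * (z[t] @ experts[e])
    return out @ W_up                            # back to model dimension

tokens = rng.standard_normal((8, d_model))
y = latent_moe(tokens)
print(y.shape)  # (8, 1024)
```

Because each expert matmul is d_latent x d_latent instead of d_model x d_model, the per-expert cost drops by (d_model/d_latent)^2, which is what allows routing to more experts for the same FLOP budget.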

Multi-token prediction (MTP) accelerates the model further. While standard models predict one next token at a time, MTP predicts multiple future tokens simultaneously. It acts as a "built-in draft model," enabling native speculative decoding that can deliver up to a 3x wall-clock speedup on structured generation tasks such as code or tool calls.
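The speedup mechanism can be shown with a toy example: a draft head proposes several tokens at once, the main model verifies them in a single parallel pass, and the agreeing prefix is accepted "for free." The greedy accept-longest-prefix rule below is a simplification of real rejection-sampling schemes, and the token strings are made up for the demo.

```python
def speculative_step(draft, target):
    """One speculative-decoding step.

    draft:  k tokens proposed by the draft head.
    target: k+1 tokens the main model would emit at each position
            (the extra one is the 'bonus' token if everything matches).
    Returns all tokens accepted this step.
    """
    out = []
    for i, d in enumerate(draft):
        if d == target[i]:
            out.append(d)            # draft guess confirmed by verifier
        else:
            out.append(target[i])    # verifier's correction; stop here
            return out
    out.append(target[len(draft)])   # full match: take the bonus token too
    return out

# Draft head guesses "def add(a," but the verifier wants "def add(x, y"
draft  = ["def", " add", "(", "a", ","]
target = ["def", " add", "(", "x", ",", " y"]
print(speculative_step(draft, target))  # ['def', ' add', '(', 'x']
```

Here one verifier pass yields four tokens instead of one, which is where the wall-clock gain comes from; the win is largest on predictable, structured output like code, where the draft head agrees with the verifier most of the time.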

Blackwell Benefits

For enterprises, the most significant technological leap in Nemotron 3 Super is its optimization for Nvidia's Blackwell GPU platform. By pre-training natively in NVFP4 (4-bit floating point), Nvidia claims a breakthrough in production efficiency.

On Blackwell, the model runs inference up to 4 times faster than 8-bit models on the previous Hopper architecture, with no loss in accuracy, according to Nvidia.
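To see why 4-bit floating point needs careful handling, note that FP4 (E2M1) can represent only eight magnitudes: 0, 0.5, 1, 1.5, 2, 3, 4, and 6. NVFP4-style formats recover dynamic range by giving each small block of weights its own scale factor. The sketch below uses the 16-element block size associated with NVFP4 and simple round-to-nearest; it is an illustration of block quantization, not Nvidia's actual training recipe.

```python
import numpy as np

# FP4 (E2M1) representable magnitudes, plus their negatives.
E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
GRID = np.concatenate([-E2M1[::-1], E2M1])

def quantize_blocks(w, block=16):
    """Quantize-dequantize w with one scale per block of 16 values.

    Each block's max magnitude is mapped onto the largest FP4 value (6),
    then every element is rounded to the nearest representable point.
    """
    w = w.reshape(-1, block)
    scale = np.abs(w).max(axis=1, keepdims=True) / E2M1[-1]
    scale[scale == 0] = 1.0                       # avoid divide-by-zero
    idx = np.abs(w / scale - GRID[:, None, None]).argmin(axis=0)
    return GRID[idx] * scale                      # dequantized approximation

rng = np.random.default_rng(0)
w = rng.standard_normal(64).astype(np.float32)
wq = quantize_blocks(w)
print("max abs error:", np.abs(w.reshape(-1, 16) - wq).max())
```

Because the widest gap in the FP4 grid is between 4 and 6, the worst-case rounding error per block is bounded by one sixth of that block's largest magnitude, which is why fine-grained per-block scaling (rather than one scale per tensor) is central to keeping 4-bit training accurate.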

In practical demonstrations, Nemotron 3 Super proves to be purpose-built for agentic reasoning.

It is currently ranked No. 1 on the Deep Research Benchmark, a benchmark measuring AI’s ability to perform thorough, multi-step research across large document sets.

| Benchmark | Nemotron 3 Super | Qwen3.5-122B-A10B | GPT-OSS-120B |
|---|---|---|---|
| **General knowledge** | | | |
| MMLU-Pro | 83.73 | 86.70 | 81.00 |
| **Reasoning** | | | |
| AIME25 (no tools) | 90.21 | 90.36 | 92.50 |
| HMMT Feb 25 (no tools) | 93.67 | 91.40 | 90.00 |
| HMMT Feb 25 (with tools) | 94.73 | 89.55 | — |
| GPQA (no tools) | 79.23 | 86.60 | 80.10 |
| GPQA (with tools) | 82.70 | 80.09 | — |
| LiveCodeBench (v5, 2024-07 to 2024-12) | 81.19 | 78.93 | 88.00 |
| SciCode (subproblem) | 42.05 | 42.00 | 39.00 |
| HLE (no tools) | 18.26 | 25.30 | 14.90 |
| HLE (with tools) | 22.82 | 19.0 | — |
| **Agentic** | | | |
| Terminal-Bench (hard subset) | 25.78 | 26.80 | 24.00 |
| Terminal-Bench Core 2.0 | 31.00 | 37.50 | 18.70 |
| SWE-Bench (OpenHands) | 60.47 | 66.40 | 41.9 |
| SWE-Bench (OpenCode) | 59.20 | 67.40 | — |
| SWE-Bench (Codex) | 53.73 | 61.20 | — |
| SWE-Bench Multilingual (OpenHands) | 45.78 | 30.80 | — |
| Taubench V2 (airline) | 56.25 | 66.0 | 49.2 |
| Taubench V2 (retail) | 62.83 | 62.6 | 67.80 |
| Taubench V2 (telecom) | 64.36 | 95.00 | 66.00 |
| Taubench V2 (average) | 61.15 | 74.53 | 61.0 |
| BrowseComp (with search) | 31.28 | 33.89 | — |
| BIRD-Bench | 41.80 | 38.25 | — |
| **Chat and instruction following** | | | |
| IFBench (prompt) | 72.56 | 73.77 | 68.32 |
| Scale AI MultiChallenge | 55.23 | 61.50 | 58.29 |
| Arena-Hard-v2 | 73.88 | 75.15 | 90.26 |
| **Long context** | | | |
| AA-LCR | 58.31 | 66.90 | 51.00 |
| RULER @ 256k | 96.30 | 96.74 | 52.30 |
| RULER @ 512k | 95.67 | 95.95 | 46.70 |
| RULER @ 1M | 91.75 | 91.33 | 22.30 |
| **Multilingual** | | | |
| MMLU-ProX (avg. over languages) | 79.36 | 85.06 | 76.59 |
| WMT24++ (en→xx) | 86.67 | 87.84 | 88.89 |

It also demonstrates significant throughput gains, achieving up to 2.2x more throughput than gpt-oss-120B and 7.5x more than Qwen3.5-122B in high-volume settings.

Custom ‘open’ license – commercial use, but with important caveats

The release of Nemotron 3 Super under the Nvidia Open Model License Agreement (updated October 2025) provides a permissive framework for enterprise adoption, though it contains "protective" clauses that distinguish it from pure open-source licenses like MIT or Apache 2.0.

Key provisions for enterprise users:

  • Commercial use: The license clearly states that the models are "commercially usable" and grants a perpetual, worldwide, royalty-free license to sell and distribute products built on the model.

  • Ownership of output: Nvidia makes no claims on the output generated by the model; responsibility for, and ownership of, those outputs rests entirely with the user.

  • Derivative works: Enterprises are free to create and own "derivative models" (fine-tuned versions), provided they include the required attribution notice: "Licensed by Nvidia Corporation under the Nvidia Open Model License."

The "red lines":

The license includes two important termination triggers that production teams should monitor:

  1. Guardrails: The license automatically terminates if a user bypasses or disables the model's "guardrails" (technical limitations or safety hyperparameters) without implementing a "substantially similar" replacement appropriate for the use case.

  2. Litigation Trigger: If a user files a copyright or patent lawsuit against Nvidia alleging that the model infringes their IP, their license to use the model is immediately terminated.

This structure lets Nvidia foster a commercial ecosystem while protecting itself from "IP trolling" and ensuring the model's safety features are not stripped away for malicious use.

‘The team really cooked’

The release has generated significant discussion in the developer community. Chris Alexiuk, a senior product research engineer at Nvidia, introduced the launch on X under his handle @llm_wizard as a "super day," emphasizing the model's speed and openness. "The model is: Fast. The model is: Smart. The model is: The most open model ever," he wrote, highlighting that Nvidia released not only the weights but also 10 trillion tokens of training data and the training recipes.

Industry adoption reflects this enthusiasm:

  • Cloud and hardware: The model is being positioned as an Nvidia NIM microservice, allowing it to run on-premises via Dell AI Factory or HPE, as well as on Google Cloud, Oracle, and, soon, AWS and Azure.

  • Production agents: Companies like CodeRabbit (software development) and Greptile are integrating the model to handle large-scale codebase analysis, while industry leaders such as Siemens and Palantir are deploying it to automate complex workflows in manufacturing and cybersecurity.

As Kari Briski, Nvidia's VP of AI software, said: "As companies move beyond chatbots to multi-agent applications, they face a context explosion."

Nemotron 3 Super is Nvidia's answer to that explosion – a model that offers the reasoning power of a 120B-parameter system with the operational efficiency of a much smaller specialist. For enterprises, the message is clear: the cost of thinking is finally coming down.


