Nvidia’s Vera Rubin is months away — Blackwell is getting faster right now

The big news this week from Nvidia, which made headlines across the tech press, was the company’s announcement of its Vera Rubin GPUs.

This week, Nvidia CEO Jensen Huang used his CES keynote to highlight the performance metrics of the new chip. According to Huang, the Rubin GPU is capable of 50 PFLOPs of NVFP4 inference and 35 PFLOPs of NVFP4 training performance, representing 5x and 3.5x the performance of Blackwell.

But it won’t be available until the second half of 2026. So what should enterprises do now?

Blackwell keeps getting better

The current, shipping Nvidia GPU architecture is Blackwell, which was announced as Hopper’s successor in 2024. With that release, Nvidia emphasized that its product engineering path also included squeezing as much performance as possible from the prior Grace Hopper architecture.

It’s a direction that also holds for Blackwell, with Vera Rubin coming out later this year.

"We continue to optimize our inference and training stack for the Blackwell architecture," Dave Salvator, director of accelerated computing products at Nvidia, told VentureBeat.

In the same week that Nvidia’s CEO touted Vera Rubin as its most powerful GPU ever, the company published new research showing that Blackwell’s performance is still improving.

Blackwell’s performance has improved by 2.8 times

Nvidia was able to increase Blackwell GPU performance by up to 2.8x per GPU in a period of just three months.

The gains in performance come from a series of innovations that have been added to the Nvidia TensorRT-LLM inference engine. These optimizations are applied to existing hardware, allowing current Blackwell deployments to achieve higher throughput without hardware changes.

Performance gains are measured on DeepSeek-R1, a 671 billion parameter mixture-of-experts (MoE) model that activates 37 billion parameters per token.

Among the performance-boosting technological innovations:

  • Programmatic Dependent Launch (PDL): The extended implementation reduces kernel launch delay, increasing throughput.

  • All-to-all communication: The new implementation of communication primitives eliminates an intermediate buffer, thereby reducing memory overhead.

  • Multi-Token Prediction (MTP): Generates multiple tokens per forward pass instead of one at a time, increasing throughput at different sequence lengths.

  • NVFP4 Format: Blackwell features a 4-bit floating point format with hardware acceleration that reduces memory bandwidth requirements while preserving model accuracy.
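To make the last bullet concrete, here is a toy sketch of block-scaled 4-bit quantization in the spirit of NVFP4. The E2M1 magnitude set is the real 4-bit value set, but the block size, scale handling, and rounding below are simplified illustrative assumptions, not Nvidia’s actual hardware implementation.

```python
# Illustrative sketch of block-scaled 4-bit float quantization in the spirit
# of NVFP4 (E2M1 values plus a sign bit, with one shared scale per block).
# Block size and scale encoding here are simplified assumptions.

# Representable magnitudes of a 4-bit E2M1 float.
E2M1_LEVELS = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block(values, block_size=16):
    """Quantize floats block-by-block to the nearest E2M1 level."""
    out = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        # One shared scale per block so the largest magnitude maps to 6.0.
        amax = max(abs(v) for v in block) or 1.0
        scale = amax / 6.0
        for v in block:
            mag = min(E2M1_LEVELS, key=lambda lvl: abs(abs(v) / scale - lvl))
            out.append(mag * scale if v >= 0 else -mag * scale)
    return out

weights = [0.8, -0.3, 0.05, 1.2, -1.2, 0.0, 0.6, -0.9]
print(quantize_block(weights, block_size=8))
```

The point of the sketch is the bandwidth argument: each weight shrinks to 4 bits plus a small per-block scale, which is why the format cuts memory traffic while keeping values near their originals.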

Optimization reduces the cost per million tokens and allows existing infrastructure to serve higher request volumes at lower latency. Cloud providers and enterprises can expand their AI services without immediate hardware upgrades.
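As a rough illustration of that economics point, cost per million tokens scales inversely with throughput. A minimal sketch, where the hourly GPU rate and baseline throughput are placeholder assumptions rather than real pricing or benchmark numbers:

```python
# Back-of-the-envelope sketch: a software-only throughput gain lowers
# cost per million tokens proportionally. The $4/hour GPU rate and the
# 1,000 tokens/s baseline are assumed placeholders, not real figures.
def cost_per_million_tokens(gpu_hourly_cost, tokens_per_second):
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hourly_cost / tokens_per_hour * 1_000_000

baseline = cost_per_million_tokens(gpu_hourly_cost=4.0, tokens_per_second=1_000)
optimized = cost_per_million_tokens(gpu_hourly_cost=4.0, tokens_per_second=2_800)
print(f"${baseline:.2f} -> ${optimized:.2f} per million tokens")
```

Under these assumptions, the reported 2.8x throughput gain translates directly into a 2.8x reduction in serving cost on the same hardware.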

Blackwell has also made gains in training performance

Blackwell is also widely used as a fundamental hardware component for training large language models.

In that regard, Nvidia has also reported significant gains for Blackwell when used for AI training.

Since its initial launch, the GB200 NVL72 system has delivered 1.4x higher training performance on the same hardware – a 40% increase achieved in just five months without any hardware upgrades.

The training boost came from a series of updates, including:

  • Customized training recipes. Nvidia engineers developed sophisticated training recipes that take full advantage of NVFP4 precision. Initial Blackwell submissions used FP8 precision, but the transition to NVFP4-optimized recipes unlocked substantial additional performance from the existing silicon.

  • Algorithm refinement. Continuous software stack enhancements and algorithm improvements enabled the platform to extract greater performance from similar hardware, demonstrating ongoing innovation beyond initial deployment.

Double-down on Blackwell or wait for Vera Rubin?

Salvator said the high-end Blackwell Ultra is a market-leading platform purpose-built to run cutting-edge AI models and applications.

He added that the Nvidia Rubin platform will extend the company’s market leadership and enable the next generation of MoE models, powering a new class of applications that drive AI innovation even further.

Salvator explained that Vera Rubin was created to address the growing compute demand generated by the continued increase in model size and the reasoning-token output of leading MoE models.

"Blackwell and Rubin may run the same model, but the difference is performance, efficiency and token cost," he said.

According to Nvidia’s preliminary test results, compared to Blackwell, Rubin can train large MoE models with a quarter of the GPUs, generate tokens with 10x more throughput per watt, and run inference at one-tenth the cost per token.

"Improved token throughput performance and efficiency means new models can be built with greater reasoning capabilities and faster agent-to-agent interactions, creating better intelligence at lower costs," Salvator said.

What this means for enterprise AI builders

For enterprises deploying AI infrastructure today, the existing investment in Blackwell remains strong despite the arrival of Vera Rubin later this year.

Organizations with existing Blackwell deployments can immediately achieve a 2.8x inference improvement and a 1.4x training boost by updating to the latest TensorRT-LLM versions – providing real cost savings without the capital expense. For those planning new deployments in the first half of 2026, it makes sense to move forward with Blackwell. Waiting six months means delaying AI initiatives and potentially falling behind competitors already deployed today.

However, enterprises planning large-scale infrastructure construction for the end of 2026 and beyond should include Vera Rubin in their roadmap. A 10x improvement in throughput per watt and 1/10th the cost per token represents transformative economics for large-scale AI operations.
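For capacity planning, the throughput-per-watt claim can be translated into tokens per unit of energy. A minimal sketch, where the baseline tokens-per-second-per-watt figure is an invented placeholder and only the 10x multiplier comes from Nvidia’s preliminary claim:

```python
# Convert a tokens/s/W figure into tokens per kWh (1 kWh = 3,600,000 J).
# The 0.5 tokens/s/W baseline is an assumed placeholder; the 10x multiplier
# is Nvidia's preliminary Rubin-vs-Blackwell efficiency claim.
def tokens_per_kwh(tokens_per_sec_per_watt):
    return tokens_per_sec_per_watt * 3_600_000

blackwell = tokens_per_kwh(0.5)
rubin = tokens_per_kwh(0.5 * 10)
print(f"{blackwell:,.0f} vs {rubin:,.0f} tokens per kWh")
```

Whatever the real baseline turns out to be, a 10x efficiency multiplier means each kWh of data-center power serves ten times as many tokens, which is the economics that matters at 2026-scale buildouts.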

The smart approach is phased deployment: leverage Blackwell for immediate needs while architecting systems that can incorporate Vera Rubin when available. Nvidia’s continuous optimization model means it’s not a binary choice; enterprises can maximize value from current deployments without sacrificing long-term competitiveness.


