Every six months, Nvidia’s automotive chief, Xinzhou Wu, invites CEO Jensen Huang for a ride in a vehicle equipped with the company’s hands-free autonomous driving system. But only if Wu has “good confidence” in the system’s driving capabilities.
Recently, the two went for a drive from Woodside, California, to downtown San Francisco in a Mercedes CLA sedan with MB.Drive Assist Pro, a hands-free driver-assist system partially designed by Nvidia that is similar to Tesla’s Full Self-Driving. The mood was light, even though the traffic was heavy.
“Let me know when you’re in autonomous mode,” Huang told Wu, according to a video of the ride provided to The Verge. “Then I can be less worried about my safety.”
Over the course of the 22-minute video, the Mercedes leads Huang and Wu through a series of everyday obstacles, such as construction sites, double-parked cars, and narrow lanes winding through rows of orange cones. Nvidia’s system seems quite capable, although the video is edited and does not play out in real time. (Nvidia spokeswoman Jessica Soares later said there were no interventions during the ride.)
Still, it wasn’t dissimilar to my own experience riding shotgun with Nvidia executives last year in a Mercedes with the hands-free driving system activated. I was impressed by the system’s ability to handle traffic signals, four-way stops, double-parked cars, unprotected left turns, and all the pedestrians, cyclists, and scooter riders we encountered coming into San Francisco. If Tesla can do it with a little silicon and a ton of cameras, it stands to reason that the world’s most valuable company can figure it out, too.
‘ChatGPT moment for physical AI’
After years of working behind the scenes, Nvidia is attempting to gain a more prominent leadership position on autonomous driving. Not only is it supplying chips to companies like Tesla, but it is also providing its AI-powered driving features to partners like Mercedes, Jaguar Land Rover, and Lucid. At CES earlier this year, Huang unveiled Alpamayo, a portfolio of AI models, simulation blueprints, and datasets that could give vehicles Level 4 autonomy, allowing them to drive themselves completely under specific conditions. Huang described the announcement as a “ChatGPT moment for physical AI.”
In the car with Wu, Huang is less bombastic and more introspective – but no less optimistic about the future of technology. “I think the challenge, undoubtedly, with Alpamayo is, as incredibly smart as it is – and it can reason about the situation – we don’t know what it can’t do,” he said. “And that’s the challenge, and that’s why our classical stack is so incredibly important.”
Huang claims that Nvidia’s approach to autonomous driving is “unique” because it combines end-to-end AI models with a traditional, human-engineered “classical” stack. He believes that pure end-to-end models are difficult to verify for safety. The classical stack, by contrast, follows well-established engineering protocols and procedures that make it possible to verify that certain behaviors are safe. By combining the two, Nvidia’s system can deliver human-like driving while staying within a safety framework grounded in traditional rules of the road.
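To make that division of labor concrete, here is a minimal sketch of how such an arbitration layer could work. Everything in it – the names, the thresholds, the fallback logic – is a hypothetical illustration of the hybrid idea, not Nvidia’s implementation.

```python
from dataclasses import dataclass

@dataclass
class Trajectory:
    """A candidate plan: (x, y) waypoints in meters ahead of the car, plus a speed."""
    waypoints: list[tuple[float, float]]
    target_speed_mps: float

# Hard constraints owned by the "classical" stack: simple, auditable, verifiable.
SPEED_LIMIT_MPS = 13.4   # ~30 mph, illustrative
MIN_CLEARANCE_M = 1.0    # required clearance to any known obstacle

def classical_check(plan: Trajectory, obstacles: list[tuple[float, float]]) -> bool:
    """Rule-based safety gate: rejects plans that break explicit rules of the road."""
    if plan.target_speed_mps > SPEED_LIMIT_MPS:
        return False
    return all(
        abs(wx - ox) >= MIN_CLEARANCE_M or abs(wy - oy) >= MIN_CLEARANCE_M
        for wx, wy in plan.waypoints
        for ox, oy in obstacles
    )

def end_to_end_policy(sensor_frame: bytes) -> Trajectory:
    """Stand-in for the learned end-to-end model mapping raw sensors to a trajectory."""
    return Trajectory(waypoints=[(0.0, 5.0), (0.0, 10.0)], target_speed_mps=10.0)

def drive_step(sensor_frame: bytes, obstacles, fallback: Trajectory) -> Trajectory:
    plan = end_to_end_policy(sensor_frame)
    # The learned plan executes only if the rule-based layer signs off;
    # otherwise the system falls back to a conservative, pre-verified plan.
    return plan if classical_check(plan, obstacles) else fallback
```

The design choice worth noticing is that the learned model proposes and the rule-based layer disposes: the part that is hard to verify never gets the last word on safety.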
Huang’s claim of a unique perspective on the industry does not entirely ring true; other AV operators also pair end-to-end neural networks with explicit safety rules that govern how the vehicle should respond. But it is certainly true that end-to-end learning, which produces driving that feels more human and less robotic, is becoming more prevalent. Waymo relies on a hybrid system, while Tesla relies exclusively on end-to-end neural networks.
In an interview, Wu said that end-to-end models respond better to things like speed bumps or lane changes without feeling mechanical or overly robotic. “That’s why this is truly a ChatGPT moment,” he said. “It’s like when your car actually drives with confidence… then basically customers will feel more inclined to use it.”
Tesla and the high cost of self-driving
I asked Wu what he thought of Nvidia’s approach compared to Tesla’s Full Self-Driving, which has driven more than 8.5 billion miles but has been mired in several troubling safety incidents, including 23 injuries and at least two deaths. Last December, an Nvidia executive told me that the company had tested the two systems against each other. The number of driver interventions was comparable, he said, sometimes favoring one system, sometimes the other.
Wu declined to comment directly on Tesla’s safety record, but pointed out that Nvidia differentiates itself through the use of multiple sensors, including cameras, radar, ultrasonic sensors and, in higher-end configurations, lidar. Nvidia believes that redundancy and diversity in sensing technologies are critical to handling difficult edge cases and achieving high levels of safety, Wu said.
Extra sensors mean extra cost. The inclusion of lidar in particular suggests that Nvidia’s safest systems will only be accessible to wealthy Mercedes owners. But Wu believes Nvidia’s vertically integrated approach allows it to deliver the required safety performance at the lowest possible cost.
Nvidia’s DRIVE Hyperion platform is designed with multiple configurations in mind. The base version uses a simpler, more cost-effective sensor setup, relying primarily on cameras and radar. These sensors have become dramatically cheaper over the past decade thanks to mass production; ultrasonic sensors are already extremely cheap. For higher levels of autonomy, the platform can add lidar, and given lidar’s declining cost, Wu said he believes vehicles priced at $40,000 to $50,000 could realistically carry the full sensor stack needed for advanced autonomy.
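As a rough illustration of how such tiering might look in software, here is a hypothetical configuration sketch. The sensor counts and unit costs are invented for the example and do not reflect DRIVE Hyperion’s actual spec or pricing.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SensorSuite:
    cameras: int
    radars: int
    ultrasonics: int
    lidars: int = 0  # the base tier ships without lidar

# Illustrative per-unit costs in USD; real figures vary widely by supplier.
UNIT_COST = {"cameras": 50, "radars": 100, "ultrasonics": 5, "lidars": 500}

def bill_of_materials(suite: SensorSuite) -> int:
    """Rolls a suite up into a rough sensor-hardware cost."""
    return sum(getattr(suite, name) * cost for name, cost in UNIT_COST.items())

BASE    = SensorSuite(cameras=10, radars=5, ultrasonics=12)            # camera + radar
PREMIUM = SensorSuite(cameras=10, radars=5, ultrasonics=12, lidars=1)  # adds lidar

print(bill_of_materials(BASE), bill_of_materials(PREMIUM))  # 1060 vs 1560 in this toy example
```

Under these made-up numbers, the lidar tier adds a few hundred dollars of hardware, which is the kind of delta that falling lidar prices could plausibly push into a $40,000 car.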
Advantages and disadvantages of data
I asked Wu about recent safety incidents involving Waymo vehicles, such as the company’s robotaxis blocking intersections during a blackout in San Francisco. He said Nvidia was already running similar edge cases through its simulators. In fact, the company relies heavily on synthetic driving data to make up for its shortfall in real-world testing. Tesla has billions of real-world driving miles thanks to its massive fleet of customer cars. Waymo has driven nearly 200 million fully autonomous miles on public roads. How can Nvidia ever hope to catch up?
“The big infrastructure game is really simulation,” Wu said. Nvidia is taking two approaches. One is neural reconstruction, or NuRec, in which the company’s engineers recreate real-world driving scenarios using sensor data collected from vehicles in the field. The second is augmentation, which modifies elements within the reconstructed scene to explore different possible outcomes. This lets engineers test how autonomous systems behave under slightly different conditions and identify rare edge cases that may not have occurred in the original dataset.
“We can have a pedestrian pop out at different locations, at a faster speed, at a slower speed,” he said. “This is what we call fuzzing the dataset.”
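A toy version of that augmentation step might look like the sketch below. The scenario format, parameter ranges, and the `fuzz` helper are all hypothetical; the point is only how one logged event can be jittered into many simulated variants.

```python
import random
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class PedestrianEvent:
    """One actor in a reconstructed scene (heavily simplified)."""
    entry_point_m: float  # distance ahead of the ego car where the pedestrian steps out
    speed_mps: float      # walking speed

def fuzz(base: PedestrianEvent, n: int, rng: random.Random) -> list[PedestrianEvent]:
    """Jitter where and how fast the pedestrian appears, producing n variants
    of a single logged event for replay in simulation."""
    return [
        replace(
            base,
            entry_point_m=max(1.0, base.entry_point_m + rng.uniform(-10.0, 10.0)),
            speed_mps=max(0.3, base.speed_mps * rng.uniform(0.5, 2.0)),
        )
        for _ in range(n)
    ]

logged = PedestrianEvent(entry_point_m=25.0, speed_mps=1.4)  # from real sensor data
variants = fuzz(logged, n=100, rng=random.Random(0))
# Each variant is replayed against the driving stack to probe edge cases
# the real-world fleet never encountered.
```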
Nvidia has acquired dashcam footage from its partners to feed its simulations. It also recreates edge cases like the Waymo blackout incidents and trains its system to respond without blocking intersections.
But the ultimate goal is to build a system that uses logic to avoid these edge-case traps – thus eliminating the need for real-world driving data in the first place. Wu’s team is working on vision language action models, which would put this theory into practice. These models combine visual perception, language understanding, and physical action in a unified architecture, drawing on large foundation models already trained on internet-scale datasets. Wu compares it to driver’s ed.
“When we teach a kid to drive, they read a manual and then practice driving for 20 hours,” Wu said. “In general, they are not bad drivers to begin with – although, obviously, it takes experience to improve. Ultimately, we want the model to work the same way: in the future, with just a rulebook and 20 hours of training data, it will learn how to drive.”
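To make the pattern concrete, here is a deliberately tiny vision-language-action sketch in PyTorch: one network consumes a camera frame plus tokenized rulebook text and emits a driving action. The architecture, dimensions, and names are invented for illustration and bear no relation to Nvidia’s actual models.

```python
import torch
import torch.nn as nn

class TinyVLA(nn.Module):
    """Toy vision-language-action model: image + rulebook tokens -> driving action."""
    def __init__(self, vocab=1000, dim=64):
        super().__init__()
        self.vision = nn.Sequential(                 # encodes a camera frame
            nn.Conv2d(3, 8, 5, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(8, dim),
        )
        self.language = nn.EmbeddingBag(vocab, dim)  # encodes the "rulebook" text
        self.action = nn.Linear(2 * dim, 2)          # -> [steering, throttle]

    def forward(self, image, rule_tokens):
        v = self.vision(image)
        l = self.language(rule_tokens)
        return self.action(torch.cat([v, l], dim=-1))

model = TinyVLA()
frame = torch.rand(1, 3, 224, 224)        # one camera frame
rules = torch.randint(0, 1000, (1, 32))   # tokenized "driver's manual"
steer, throttle = model(frame, rules)[0]  # a single (steering, throttle) action
```

In the driver’s-ed analogy, the language branch is the manual and a modest amount of action data is the 20 hours behind the wheel; the hope is that the pretrained pieces supply the rest.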