Nvidia’s Cosmos Reason 2 aims to bring reasoning VLMs into the physical world

Illustration: robots learning in a school.
Nvidia CEO Jensen Huang said last year that we are now entering the era of physical AI. While the company continues to offer LLMs for software use cases, Nvidia is increasingly positioning itself as a provider of AI models for fully AI-powered systems – including agentic AI in the physical world.

At CES 2026, Nvidia announced a slate of new models designed to push AI agents beyond chat interfaces and into physical environments.

Nvidia launched Cosmos Reason 2, the latest version of its vision-language model designed for embodied reasoning. Cosmos Reason 1, released last year, introduced a two-dimensional ontology for embodied reasoning and currently leads Hugging Face’s physical reasoning leaderboard for video.

Cosmos Reason 2 is based on the same ontology, providing greater flexibility for enterprises to customize applications and enabling physical agents to plan their next actions just as software-based agents reason through digital workflows.

Nvidia also released a new version of Cosmos Transfer, a model that lets developers generate training simulations for robots.

Other vision-language models, such as Google’s PaliGemma and Mistral’s Pixtral Large, can process visual input, but not all commercially available VLMs support reasoning.

“Robotics is at a turning point. We are moving from specialist robots limited to single tasks to generalist expert systems,” said Kari Briski, Nvidia’s vice president for generative AI software, in a briefing with reporters. She was referring to robots that combine broad foundational knowledge with deeper task-specific skills. “These new robots combine broad foundational knowledge with deep proficiency in complex tasks.”

She said Cosmos Reason 2 “enhances the reasoning capabilities that robots need to navigate in an unpredictable physical world.”

The Move to Physical Agents

Briski said Nvidia’s roadmap follows “the same pattern of assets across all of our open models.”

“Creating specialized AI agents, a digital workforce, or the physical embodiment of AI in robots and autonomous vehicles requires much more than models,” Briski said. “First, training AI requires compute resources to simulate the world around it. Data is the fuel for AI to learn and improve, and we contribute to the world’s largest collection of open and diverse datasets, which goes beyond simply releasing model weights. Open libraries and training scripts give developers the tools to purposefully tailor AI to their applications, and we publish blueprints and examples to help deploy AI as complete systems.”

The company now has open models spanning physical AI and robotics: notably Cosmos, its open vision-language-action (VLA) model GR00T, and its Nemotron models for agentic AI.

Nvidia is making the case that open models across different branches of AI create a shared enterprise ecosystem that provides data, training, and reasoning to agents in both the digital and physical worlds.

Additions to the Nemotron Family

Briski said Nvidia plans to continue expanding its open models, including the Nemotron family, beyond reasoning to include new RAG and embedding models that make information more readily available to agents. The company released Nemotron 3, the latest version of its agentic reasoning model, in December.

Nvidia announced three new additions to the Nemotron family: Nemotron Speech, Nemotron RAG, and Nemotron Safety.

In a blog post, Nvidia said Nemotron Speech provides “real-time low latency speech recognition for live caption and speech AI applications” and is up to 10 times faster than other speech models.

Nemotron RAG technically consists of two models: an embedding model and a reranking model, both of which can understand images to provide richer multimodal insights for data agents to tap into.

“The Nemotron RAG models top the MMTEB, the Massive Multilingual Text Embedding Benchmark, with strong multilingual performance while using less compute and memory, so they are suitable for systems that need to handle many requests very quickly and with low latency,” said Briski.
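The embedding-plus-reranker pairing follows the standard retrieve-then-rerank pattern: a cheap embedding similarity pass narrows the corpus to a short list, and a costlier second model rescores only that short list. The toy sketch below illustrates the pattern only; the vectors and scoring functions are invented for the example and are not Nemotron RAG outputs or APIs.

```python
import math

# Toy sketch of retrieve-then-rerank. All vectors and scores here are
# illustrative placeholders, not real model outputs.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Stage 0: an embedding model maps every document to a vector once, up front.
corpus = {
    "doc_gpu":   [0.9, 0.1, 0.0],
    "doc_robot": [0.1, 0.9, 0.2],
    "doc_cook":  [0.0, 0.2, 0.9],
}

def retrieve(query_vec, k=2):
    """Cheap first pass: rank the whole corpus by embedding similarity, keep top-k."""
    ranked = sorted(corpus, key=lambda d: cosine(query_vec, corpus[d]), reverse=True)
    return ranked[:k]

def rerank(query_vec, candidates):
    """Costlier second pass over the short list only; simulated here with a dot product."""
    return sorted(candidates,
                  key=lambda d: sum(x * y for x, y in zip(query_vec, corpus[d])),
                  reverse=True)

query = [0.1, 0.95, 0.1]                  # pretend embedding of a robotics question
hits = rerank(query, retrieve(query))     # rerank only the retrieved short list
```

The design point is the split itself: the embedding pass must be fast enough to scan everything, while the reranker can afford heavier (here, merely different) scoring because it only ever sees a handful of candidates.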

Nemotron Safety detects sensitive data so that AI agents do not accidentally expose personally identifiable information.
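Conceptually, such a guardrail sits between an agent and its output and rewrites detected spans before they leave the system. The minimal regex sketch below only illustrates that pattern; it is not the Nemotron Safety model or API, and the pattern names are invented for the example.

```python
import re

# Illustrative PII-redaction guardrail: scan outgoing text and replace
# detected spans with labeled placeholders. NOT the Nemotron Safety API.

PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def redact(text):
    """Replace each detected PII span with a [TYPE] placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label.upper()}]", text)
    return text
```

A production safety model would rely on learned classifiers rather than a handful of regexes, but the placement is the same: the filter runs on every agent response before it reaches the user or a downstream tool.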


