This AI Model Can Intuit How the Physical World Works

The original version of this story appeared in Quanta Magazine.

Here’s a test for babies: Show them a glass of water on a desk. Hide it behind a wooden board. Now move the board toward the glass. Are they surprised if the board passes through the glass as if it were not there? Many 6-month-olds are, and by one year, almost all children have an intuitive sense of an object’s permanence, apparently learned through observation. Now some artificial intelligence models can do this too.

Researchers have developed an AI system that learns about the world from videos and displays a kind of “surprise” when presented with information that contradicts what it has learned.

The model, created by Meta and called Video Joint Embedding Predictive Architecture (V-JEPA), makes no assumptions about the physics of the world shown in a video. Nonetheless, it can begin to make sense of how the world works.

“Their claims are, a priori, very plausible, and the results are extremely interesting,” says Micah Heilbronn, a cognitive scientist at the University of Amsterdam who studies how the brain and artificial systems understand the world.

Higher Abstraction

As engineers building self-driving cars know, it can be hard to get an AI system to reliably understand what it sees. Most systems are designed to “understand” videos either by classifying their content (e.g., “a person playing tennis”) or by identifying the shape of an object — say, a car ahead — in what is called “pixel space.” Such a model essentially gives equal importance to every pixel in the video.

But these pixel-space models come with limitations. Imagine trying to understand a suburban street. If the scene contains cars, traffic lights, and trees, the model may focus too much on irrelevant details, such as the movement of leaves, while missing the color of the traffic lights or the position of nearby cars. “When you go to images or videos, you don’t want to work in [pixel] space because there are a lot of details you don’t want to model,” said Randall Balestriero, a computer scientist at Brown University.
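The trade-off Balestriero describes can be illustrated with a toy sketch (this is not Meta's V-JEPA code; the `encode` function and the numbers are invented for illustration). A pixel-space loss weights every pixel equally, so flickering leaves dominate the error; a loss computed on a few coarse, abstract features largely ignores that noise.

```python
# Toy illustration (not V-JEPA): pixel-space loss vs. a loss computed
# on a small hand-made feature space. All names and values are
# hypothetical, chosen only to show the contrast.

def pixel_loss(pred, target):
    """Mean squared error over raw pixels -- every pixel counts equally."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def encode(frame):
    """Toy 'encoder': keep only coarse features (overall brightness and
    the brightness of each half of the frame), discarding pixel-level
    detail such as fluttering leaves."""
    half = len(frame) // 2
    return [
        sum(frame) / len(frame),
        sum(frame[:half]) / half,
        sum(frame[half:]) / (len(frame) - half),
    ]

def feature_loss(pred, target):
    """Same MSE, but in the coarse feature space instead of pixel space."""
    return pixel_loss(encode(pred), encode(target))

# A scene where only irrelevant detail changed: pixels flicker around
# the same average brightness (think of leaves moving in the wind).
target = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
pred   = [0.6, 0.4, 0.6, 0.4, 0.6, 0.4]

print(pixel_loss(pred, target))    # noise dominates the pixel-space error
print(feature_loss(pred, target))  # near zero: coarse structure matches
```

The pixel-space error is driven entirely by the flicker, while the feature-space error is close to zero because the coarse structure of the scene is unchanged — a minimal version of the argument for predicting in an abstract representation rather than in pixels.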

Yann LeCun, a computer scientist at New York University and director of AI research at Meta, created JEPA, a predecessor to V-JEPA that works on static images, in 2022.
Photograph: École Polytechnique University Paris-Saclay


