
Computers are extremely good at numbers, but they haven’t put many human mathematicians out of a job. Until recently, they could barely hold their own in high-school-level mathematics competitions.
But now Google’s DeepMind team has created AlphaProof, an AI system that matched the performance of silver medalists at the 2024 International Mathematical Olympiad, the world’s most prestigious mathematics competition for pre-university students, falling just one point short of a gold medal. And that is a big deal.
true understanding
The reason computers perform poorly in math competitions is that, although they far exceed humans at calculation, they are not nearly as good at the logic and reasoning that advanced mathematics requires. Put differently, they can calculate extremely fast, but they usually have no grasp of why the calculation works. Even something as seemingly simple as addition has depth: humans can write semi-formal proofs from the definition of addition, or go fully formal with Peano arithmetic, which defines operations like addition from a handful of axioms about the natural numbers.
To perform a proof, humans must understand the structure of mathematics. The way mathematicians construct proofs, how few steps they need to reach a conclusion, and how cleverly those steps are designed, is a testament to their genius, ingenuity, and sense of mathematical elegance. “You know, Bertrand Russell published a 500-page book proving that one plus one equals two,” says Thomas Hubert, a DeepMind researcher and lead author of the AlphaProof study.
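Hubert’s example can be made concrete. Below is a minimal sketch in Lean 4, the kind of proof assistant used in this line of work (the names `MyNat`, `add`, and `one_add_one` are illustrative, not taken from the study): addition is defined Peano-style by recursion on the second argument, and “one plus one equals two” then follows by unfolding that definition.

```lean
-- A Peano-style definition of the natural numbers:
-- every number is either zero or the successor of another number.
inductive MyNat where
  | zero : MyNat
  | succ : MyNat → MyNat

-- Addition defined by recursion on the second argument.
def add : MyNat → MyNat → MyNat
  | m, MyNat.zero   => m
  | m, MyNat.succ n => MyNat.succ (add m n)

-- 1 + 1 = 2, proved formally: both sides reduce to the same term
-- just by unfolding `add`, so `rfl` (reflexivity) closes the proof.
theorem one_add_one :
    add (MyNat.succ MyNat.zero) (MyNat.succ MyNat.zero)
      = MyNat.succ (MyNat.succ MyNat.zero) := rfl
```

The point is that in a formal system nothing is taken for granted: even this tiny fact must be derived, step by step, from the definitions.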
The team at DeepMind wanted to develop an AI that could understand mathematics at this level. The work started by tackling a common AI problem: a lack of training data.
translator of math problems
The large language models powering AI systems like ChatGPT learn from billions of pages of text. Because their training data includes writing about mathematics, from textbooks to the works of famous mathematicians, they show some degree of success in proving mathematical statements. But they are limited by the way they operate: they use giant neural networks to predict the next word, or token, in a sequence generated in response to a user’s prompt. Their reasoning is statistical by design, meaning they produce answers that merely seem right.
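That next-token behavior can be illustrated with a deliberately tiny sketch, nothing like a real large language model: a bigram counter that always emits the most frequent word observed to follow the previous one. The corpus and function names here are invented for illustration.

```python
# Toy sketch: a bigram "language model" that predicts the next word
# purely from frequency counts, with no understanding of arithmetic.
from collections import Counter, defaultdict

corpus = ("one plus one equals two . "
          "one plus one equals two . "
          "one plus two equals three .").split()

# Count, for each word, which words follow it and how often.
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def predict_next(word: str) -> str:
    """Return the statistically most frequent next word."""
    return following[word].most_common(1)[0][0]

print(predict_next("plus"))    # → one
print(predict_next("equals"))  # → two
```

The model answers “equals → two” only because that pairing was common in its data; it would happily keep predicting “two” for sums it has never verified, which is exactly the seems-right-rather-than-is-right failure mode described above.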