
“Each frontier model we evaluated lost money over the course of the season and many experienced wastage,” the paper’s authors concluded, with AI “systematically underperforming humans” in this scenario.
| AI model | Mean ROI | Best attempt | Worst attempt | Mean final bankroll |
|---|---|---|---|---|
| Anthropic Claude Opus 4.6 | -11.0% | -0.2% | -18.8% | £89,035 |
| OpenAI GPT-5.4 | -13.6% | -4.1% | -31.6% | £86,365 |
| Google Gemini 3.1 Pro | -43.3% | +33.7% | -100.0% | £56,715 |
| Google Gemini Flash 3.1 LP | -58.4% | +24.7% | -100.0% | £41,605 |
| Z.AI GLM-5 | -58.8% | -14.3% | -100.0% | £41,221 |
| Moonshot Kimi K2.5 | -68.3% | -27.0% | -100.0% | £7,420 |
| xAI Grok 4.20 | -100.0% | -100.0% | -100.0% | £0 |
| Arcee Trinity | -100.0% | -100.0% | -100.0% | £0 |

Each model started with a normalized bankroll of £100,000. Return on investment (ROI) and final bankroll are averaged over three attempts. Grok and Trinity did not complete every attempt.

The results offer some relief to white-collar professionals and businesses worried that AI could take their jobs, a fear that has weighed on stocks in industries ranging from finance to marketing.
Ross Taylor, one of the study’s authors and chief executive of General Reasoning, said: “There is a lot of hype about AI automation, but there has not been much measurement of how AI performs in a long-time-horizon setting.”
He said that many of the benchmarks commonly used to test AI are flawed because they are set in “very stable environments” that bear little resemblance to the chaos and complexity of the real world.
The paper from General Reasoning, which has not yet been peer-reviewed, offers a counterpoint to the growing excitement in Silicon Valley about the recent leap in AI’s ability to complete computer programming tasks with little or no human intervention.
Taylor, a former Meta AI researcher, said: “If you… try AI in some real-world tasks, it performs really poorly… Yes, software engineering is very important and economically valuable, but there are many other activities with longer time horizons that are important to focus on.”
© 2026 The Financial Times Ltd. All rights reserved. May not be redistributed, copied, or modified in any way.