I Know We’re in an AI Bubble Because Nobody Wants Me 😭 « Pete Warden’s blog

I first got involved in deep learning in 2012, when AlexNet came out. I was the CTO of Jetpac, a startup that provided information about bars, hotels, and restaurants by analyzing public photos, for example finding hipster-friendly cafes. The paper's results were so surprising that I knew AlexNet would be incredibly helpful, so I spent my Christmas holidays heating my house with a two-GPU gaming rig running the CudaConvNet software, because that was the only way to train my own version of the model.

The results were even better than I expected, but then I faced the problem of how to apply the model to the billions of photos we had collected. The only GPU instances on Amazon were designed for video streaming and were extremely expensive. The CPU support in the Caffe framework was promising, but it was focused on training models, not running them after training (aka inference). I needed software that would let me run models at large scale on low-cost hardware. This was the original reason I wrote the Jetpac framework, so I could spin up hundreds of cheap EC2 instances and process our huge backlog of images for thousands of dollars instead of millions.

It turned out that the code was small and fast enough to run on phones, and after Jetpac was acquired by Google I continued in that direction by leading mobile support for TensorFlow. Although I love edge devices, and that's what I'm known for these days, my real passion is efficiency. I learned to code in the demo scene of the 80s, started writing PC game engines professionally in the 90s, and became addicted to the dopamine rush of optimizing inner loops. There's nothing like spending the day solving puzzles with tight constraints and clear requirements, squeezing a little more speed out of a system.

If you're not a programmer, it may be hard to imagine how emotional the optimization process can be. There's no guarantee that a good answer even exists, so the process itself can be extremely frustrating. The first thrill comes when you see an opening, a possibility no one else has spotted. There's the satisfaction of working hard to pursue that opportunity, and sometimes the disappointment when it doesn't pan out. Even then I've learned something: being good at optimization means learning everything about the hardware, the operating system, and the requirements, and studying other people's code in depth. I can never guarantee I'll find a solution, but my consolation is that I always end up with a better understanding of the world than when I started. The deepest satisfaction comes when I finally find an approach that runs faster or uses fewer resources. It's also a social pleasure: it almost always contributes to a broader solution the team is working on, makes a product better, or even makes possible something that wasn't before. The best optimizations come from a full-stack team that can make tradeoffs at every level, from product managers to model architects, from hardware to operating systems to software.

Anyway, that's a lot of talk about the joy of coding; what does it have to do with the AI bubble? When I look around, I see hundreds of billions of dollars being spent on hardware: GPUs, data centers, and power stations. I don't see people waving big checks at ML infrastructure engineers like me and my team. It's been an uphill battle to raise the investment we need for Moonshine, and I don't think it's just because I'm a better coder than I am a salesman. Thankfully, we have found investors who believe in our vision, and we are on track to be cash-flow positive in the first quarter of 2026, but in general I don't think many startups will be able to raise money on the promise of improving AI efficiency.

It doesn't make any sense to me from any rational economic standpoint. If you're a tech company spending billions of dollars per month on GPUs, wouldn't it be a good bet to spend a few tens of millions of dollars per year on software optimization? We know that GPU utilization is typically less than 50%, and in my experience it is often much lower for interactive applications, where batches are small and memory-bound decoding dominates. We know that motivated engineers like Scott Gray can beat Nvidia's own libraries on Nvidia's GPUs, and from my experience at Jetpac and Google I'm convinced there are plenty of opportunities to run inference on very low-cost CPU machines. Even if you don't care about cost, the impact AI power use has on us and the planet should make efficiency a priority.
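To make the arithmetic concrete, here is a back-of-envelope sketch in Python. Every number in it is a made-up placeholder chosen for illustration, not a real figure from any company; the only input taken from the argument above is the assumption that utilization often sits below 50%.

```python
# Hypothetical inputs -- placeholders, not real figures.
gpu_spend_per_month = 1_000_000_000   # $1B/month on GPUs (assumed)
baseline_utilization = 0.40           # consistent with "typically less than 50%"
optimized_utilization = 0.60          # assumed gain from an optimization team
team_cost_per_year = 30_000_000       # "a few tens of millions" on engineers

# If utilization improves, the same workload needs proportionally
# less hardware, so required spend scales by the utilization ratio.
spend_needed_after = gpu_spend_per_month * baseline_utilization / optimized_utilization
monthly_savings = gpu_spend_per_month - spend_needed_after
annual_savings = monthly_savings * 12

print(f"Monthly savings: ${monthly_savings:,.0f}")
print(f"Annual savings:  ${annual_savings:,.0f}")
print(f"Return on the team's cost: {annual_savings / team_cost_per_year:.0f}x")
```

Under these invented numbers, a modest utilization improvement pays for the optimization team many times over; the point is the shape of the tradeoff, not the specific values.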

So why is this money being spent? As far as I can tell, it's because of the symbolic benefits to the people making the decisions. Startups like OpenAI are tempted to brag about how many GPUs they are buying, suggesting that they will be the top AI company for years to come because no one will be able to match their edge in compute. Hardware projects are also much easier to manage than software; they require far less management attention. Investors are on board because they've seen first-hand how early success translates into long-term dominance, because it's clear that AI is a world-changing technology so they need to be a part of it, and because OpenAI and others are happy to absorb billions of dollars of investment, making VCs' jobs much easier than if they had to allocate across hundreds of smaller companies. No one ever got fired for buying IBM, and no one ever got fired for investing in OpenAI.

I'm picking on OpenAI here, but across the industry you can see everyone from Oracle to Microsoft announcing huge hardware spending, and for the same reasons. It gets them a lot of positive coverage and a big bump in share price, far more than they would get by announcing they were hiring a thousand engineers to squeeze more value out of their existing hardware.

If I'm right, this spending is not sustainable. I was in the tech industry during the dot-com boom, and I saw a similar dynamic with Sun workstations. For a few years every startup needed to raise millions of dollars just to launch a website, because the only real option was to buy expensive Sun servers and proprietary software. Then Google came along and proved that lots of cheap PCs running open-source software were cheaper and more scalable. Nvidia today looks a lot like Sun did back then, so I bet that over the next few years there will be a lot of chatbot startups built on cheap PCs with open-source models running on CPUs. Of course, I made a similar prediction for 2023, and Nvidia's valuation has quadrupled since then, so don't look to me for stock tips!


