AI and Cloud Costs – by Aditya Patadia

Many companies are facing high AI costs. Uber burnt down The entire year’s AI budget is due in just 4 months and Microsoft, Salesforce, and Github are taking steps to reduce AI spending by employees.

On the other hand, AI is making many programming tasks much easier and continues to help in other domains like data interpretation, creating beautiful slides, and designing apps and websites. Currently, large AI labs have what we call frontier models and those models perform exceptionally well for a wide variety of tasks. Frontier AI labs are doing both the research and hosting on their own and hence, those models cost the most. For example, GPT 5.5 costs $5 per million input tokens and $30 per million output tokens. This is the most expensive model currently available openrouter. To give an example, fixing TypeScript types in 50 files with this model cost me $54 this afternoon.

https%3A%2F%2Fsubstack post media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4b10bb3 234b 43c3 aa85

Model performance plateaus, open weight model releases, chip and model improvements, zero switching costs, and local models are reasons why AI labs may not be able to maintain the high price they are asking for now.

We are seeing improvements with each model release these days but it is clear that the improvements are becoming fewer and fewer. Unless an entirely new breakthrough is invented, current learning and inference capabilities can only be extended so much. There is also a problem of training data. Most AI labs have probably covered everything available in digital and print media for model training. Improving the training dataset is going to prove very difficult.

This means that it will not be easy to continue increasing model prices due to better performance. We saw proof of this where Cloud Opus 4.8 is priced the same as Cloud Opus 4.7. Once models stop improving at scale and training data and methods are the same, model prices will likely fall due to competition.

When OpenAI launched ChatGPT in 2022, it had a huge lead, but that lead is gradually diminishing and we saw Anthropic taking the top spot in 2025-26. Now models like GLM-5.2, which is an open-weight model, beat GPT and Opus in coding benchmarks. That model costs 1/10th that of the GPT 5.5.

What is happening here is that leading AI labs are charging not only for inference but also for research into model architecture, training data collection and curation, model training costs (which can be tens or even millions of dollars), paying their staff, and recovering marketing costs.

On the other hand, once an open weight model is released, any estimation provider can easily host it and make some markup on the estimation cost. This proves to be much cheaper than running a Frontier AI lab.

Cerebras, Grok, Google, and many other companies have realized that AI needs its own silicon and generic GPUs aren’t cutting it. Designing specific chips is very expensive but once the architecture is ready, it becomes easy to make millions of them and the estimated cost becomes very cheap. For example, a TPU can be 30-70% cheaper than an Nvidia H100 GPU. This kind of progress will keep coming and the price per token will keep falling.

The model architecture is also evolving. We saw caching as a fundamental improvement and now MoE models and other approaches are making models faster while maintaining the same accuracy level.

Windows OS, MS Office, Adobe Suite and traditional software like SaaS like Salesforce, HubSpot and Figma had a very significant drawback which AI models do not have. Not every software created was interchangeable. You can’t swap CRMs in an afternoon; It took months.

When more AI labs enter the space and more open weight models become available, this factor will be responsible for a very rapid price drop. AI gateway providers like OpenRouter.ai are making it extremely easy to switch models. This can happen in a matter of seconds and in fact, we can program it to change providers instantly. Zero switching cost means that if a better model comes along, consumers can switch to it without any time investment.

Last but not least, and actually the most important factor, is the ability of users to run local models. By now, almost everyone is using cloud-hosted models and on-premise models are either too large to deploy or too slow to work with. With advances in chips, this will change in 4-5 years. New chips will run models locally and an almost certain drop in RAM prices will make it easier to deploy models on computers and smartphones. My guess is that most operating systems will provide a way to deploy a model and they will also provide an interface so that apps running locally can connect to the model.

When this happens, the cloud model will be used for only the most complex tasks and simpler tasks like code tab completion, proofreading, and fact checking will be performed locally. This means customers no longer need that $20 or $200 subscription.

On a personal level, this is my first blog and I have made some bold predictions here. Only time will tell what their outcome will be but one thing is certain. Price pressure will come from one or more of the reasons listed above and in the end, this is good for consumers.



<a href

Leave a Comment