
From miles away across the desert, the Great Pyramid looks like perfect, smooth geometry – a clean triangle pointing toward the stars. Stand at its base, however, and the illusion of smoothness disappears. You see huge, jagged blocks of limestone. It is not a slope; it is a staircase.
Remember this the next time you hear futurists talk about exponential growth.
Intel co-founder Gordon Moore observed in 1965 that the transistor count on a microchip would double every year – the origin of Moore’s Law. Another Intel executive, David House, later amended the prediction, saying that computing power would double every 18 months. For a while, Intel’s CPUs were the poster child for this law – until CPU performance gains flattened out like a block of limestone.
Zoom out, however, and the next limestone block was already there – the growth in computation simply shifted from the CPU to the GPU. Nvidia CEO Jensen Huang played the long game and emerged the big winner, first in gaming, then computer vision and, most recently, generative AI.
The illusion of smooth evolution
Technology development is full of booms and plateaus, and generative AI is no exception. The current wave is driven by the transformer architecture. To quote Dario Amodei, CEO and co-founder of Anthropic: “Exponential continues to happen until it doesn’t. And every year we’ve been like, ‘Well, it can’t possibly be the case that things will continue at exponential’ – and then every year it happens.”
But just as CPUs stagnated and GPUs took the lead, we are now seeing signs that the LLM growth paradigm is changing again. For example, in late 2024, DeepSeek surprised the world by training a world-class model on an extremely small budget, in part by using mixture-of-experts (MoE) techniques.
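For readers who want a concrete picture of why MoE cuts compute per token, here is a minimal, illustrative sketch in Python. The dimensions, expert count and top-k value are toy numbers of my own choosing, not DeepSeek’s actual configuration; the point is simply that each token is routed to only a few experts, so only a fraction of the expert weights are exercised per token.

```python
# Minimal sketch of mixture-of-experts (MoE) routing, using NumPy.
# All sizes below are illustrative toy values, not any real model's config.
import numpy as np

rng = np.random.default_rng(0)

d_model, d_ff = 64, 256      # toy hidden and feed-forward dimensions
n_experts, top_k = 8, 2      # route each token to 2 of 8 experts

# Each expert is a small two-layer MLP: (d_model -> d_ff -> d_model).
experts = [
    (rng.standard_normal((d_model, d_ff)) * 0.02,
     rng.standard_normal((d_ff, d_model)) * 0.02)
    for _ in range(n_experts)
]
router = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_layer(x):
    """Route a single token vector x through its top-k experts only."""
    logits = x @ router
    top = np.argsort(logits)[-top_k:]                  # chosen experts
    weights = np.exp(logits[top]) / np.exp(logits[top]).sum()
    out = np.zeros_like(x)
    for w, idx in zip(weights, top):
        w_in, w_out = experts[idx]
        out += w * (np.maximum(x @ w_in, 0) @ w_out)   # ReLU MLP
    return out

token = rng.standard_normal(d_model)
_ = moe_layer(token)

params_per_expert = d_model * d_ff + d_ff * d_model
total_params = n_experts * params_per_expert
active_params = top_k * params_per_expert
print(f"total expert params: {total_params:,}, "
      f"active per token: {active_params:,} "
      f"({active_params / total_params:.0%})")   # only 25% touched per token
```

In this toy setup only 2 of 8 experts fire per token, so roughly a quarter of the expert weights are involved in any single forward pass – the basic mechanism behind MoE’s cost savings.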
Remember where you saw that technology mentioned recently? In Nvidia’s Rubin press release, which says the platform “…incorporates the latest generation of Nvidia NVLink interconnect technology… to accelerate agentic AI, advanced reasoning, and large-scale MoE model inference at up to 10x lower cost per token.”
Jensen knows that achieving that coveted exponential growth in computation no longer comes from pure brute force. Sometimes you have to change the architecture entirely to take the next step.
The latency crisis: where Groq fits in
This long introduction brings us to Groq.
The biggest gains in AI reasoning capabilities in 2025 were driven by test-time computation – or, in layman’s terms, letting models think for longer before answering. But time is money. Consumers and businesses don’t like waiting.
This is where Groq comes in, with its lightning-fast inference. Combine the architectural efficiency of models like DeepSeek with the sheer throughput of Groq, and you get frontier-level intelligence at your fingertips. By running inference faster, a model can “out-think” competing models, offering customers a smarter system without the lag penalty.
From universal chip to inference optimization
For the past decade, the GPU has been the universal hammer for every AI nail. You use H100s to train the model; you use H100s (or a trimmed-down version) to serve the model. But as models shift toward “System 2” thinking – where the AI reasons, self-corrects and iterates before providing an answer – the computational workload shifts too.
Training requires massive parallel brute force. Inference, especially for reasoning models, requires fast sequential processing: tokens must be generated almost instantly so that complex chains of thought don’t leave the user waiting minutes for an answer. Groq’s LPU (Language Processing Unit) architecture sidesteps the memory-bandwidth bottleneck that plagues GPUs during small-batch inference, enabling lightning-fast token generation.
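A back-of-the-envelope sketch makes the bottleneck concrete. At batch size 1, generating each token requires streaming roughly all the model weights from memory once, so single-stream throughput is capped near memory bandwidth divided by model size. The figures below are illustrative assumptions (a hypothetical 70B-parameter model in 8-bit weights, round-number bandwidths), not vendor specifications.

```python
# Rough model of bandwidth-bound decoding: tokens/sec per stream is capped
# near (memory bandwidth) / (bytes of weights read per token).
# All numbers are illustrative assumptions, not measured specs.

def max_tokens_per_sec(params_billion, bytes_per_param, mem_bw_gb_s):
    model_bytes = params_billion * 1e9 * bytes_per_param
    return (mem_bw_gb_s * 1e9) / model_bytes

# Hypothetical 70B-parameter model stored in 8-bit (1 byte) weights.
for name, bw_gb_s in [
    ("HBM-class GPU, ~3 TB/s assumed", 3_000),
    ("SRAM-centric accelerator, ~80 TB/s assumed", 80_000),
]:
    tps = max_tokens_per_sec(70, 1, bw_gb_s)
    print(f"{name}: ~{tps:.0f} tokens/sec upper bound per stream")
```

Under these assumptions the GPU tops out around a few dozen tokens per second for a single stream, while an SRAM-centric design has an order-of-magnitude higher ceiling – which is why sequential, small-batch reasoning workloads reward a different architecture.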
Engine of the next wave of development
For the C-suite, this potential combination solves the “time to think” latency problem. Consider what we expect from AI agents: we want them to autonomously book flights, code entire apps and research legal precedent. To do this reliably, a model may need to generate 10,000 internal “thought tokens” to verify its own work before outputting a single word to the user. The difference in wall-clock time is stark (a rough calculation follows the list below):
- On a standard GPU: 10,000 thought tokens may take 20 to 40 seconds. The user gets bored and leaves.
- On Groq: The same chain of thought completes in less than 2 seconds.
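Here is the simple arithmetic behind those figures; the throughput numbers are placeholders chosen to match the ranges quoted above, not benchmarks of any specific hardware.

```python
# Latency arithmetic for the "10,000 thought tokens" scenario above.
# Tokens-per-second values are illustrative placeholders, not measurements.
thought_tokens = 10_000

for label, tokens_per_sec in [
    ("slower GPU decode", 300),    # ~33 s  -> the 20-40 s range above
    ("faster GPU decode", 500),    # ~20 s
    ("LPU-class decode", 5_000),   # ~2 s   -> the "under 2 seconds" figure
]:
    seconds = thought_tokens / tokens_per_sec
    print(f"{label:>18}: {seconds:5.1f} s of hidden reasoning before any output")
```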
If Nvidia integrates Groq’s technology, it solves this “waiting for the robot to think” crisis and preserves the magic of AI. Just as Nvidia moved from rendering pixels (gaming) to rendering intelligence (generative AI), it would now move to rendering reasoning in real time.
Furthermore, it would create a formidable software moat. Groq’s biggest hurdle has always been the software stack; Nvidia’s biggest asset is CUDA. If Nvidia wraps its ecosystem around Groq’s hardware, it effectively digs a moat so wide that competitors can’t cross it, offering a universal platform: the best environment to train on and the most efficient environment to run on (Groq’s LPU).
Consider what happens when you combine that raw inference power with a next-generation open-source model (like the rumored DeepSeek 4): you get an offering that rivals today’s leading models on cost, performance and speed. That opens up options for Nvidia, from powering its growing roster of inference-hungry customers to entering the inference business directly with its own cloud offerings.
Next step on the pyramid
Returning to our opening metaphor: the “exponential” evolution of AI is not a smooth line of raw FLOPs; it is a staircase of blocks, each a barrier to break through.
- Block 1: We couldn’t compute fast enough. Solution: the GPU.
- Block 2: We couldn’t train deep enough. Solution: the transformer architecture.
- Block 3: We can’t “think” fast enough. Solution: Groq’s LPU.
Jensen Huang has never been afraid to cannibalize his own product line to capture the future. By validating Groq, Nvidia wouldn’t just be buying a faster chip; it would be bringing next-generation reasoning to the masses.
Andrew Filev, Founder and CEO of Zencoder