Ilya Sutskever, Yann LeCun and the End of “Just Add GPUs” — abZ Global

When two of the most influential people in AI say that today’s big language models are reaching their limits, it is worth paying attention.

In a recent long-form interview, Ilya Sutskever – co-founder of OpenAI and now head of Safe Superintelligence Inc. – argued that the industry is moving from an “age of scaling” to an “age of research.” At the same time, Yann LeCun, VP and Chief AI Scientist at Meta, is loudly arguing that LLMs are not the future of AI at all, and that we need a completely different approach built on “world models” and architectures like JEPA.

As developers and founders, we are building products right in the middle of that change.

This article breaks down Sutskever and LeCun’s approaches and what they actually mean for people shipping software.

1. Sutskever’s Timeline: From Research → Scaling → Research Again

Sutskever divides the last decade of AI into three phases:

1.1. 2012–2020: The first era of research

This is the era of “trying everything”:

  • Convolutional networks for vision

  • Sequence models and attention

  • Early reinforcement-learning successes

  • Lots of small experiments, new architectures and strange ideas

There were large models for the time, but compute and data were still limited. Progress came from new ideas, not from big clusters.

1.2. 2020–2025: The age of scaling

Then scaling laws changed everything.

The recipe became:

More data + more compute + bigger models = better results.

You didn’t need to be extremely creative to justify a billion-dollar GPU bill. You could point to a curve: as you increase parameters and tokens, performance improves smoothly.
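That curve is typically a power law. As a rough illustration (not Sutskever’s numbers – the constants below are ballpark values of the kind reported in scaling-law papers), loss falls as a power of parameter count, so each 10× jump in size buys a smaller absolute improvement than the last:

```python
# Illustrative only: a toy power-law scaling curve.
# The constants are assumptions for demonstration, not fitted values.
def loss(params: float, n_c: float = 8.8e13, alpha: float = 0.076) -> float:
    """Toy scaling law: loss falls as a power of parameter count."""
    return (n_c / params) ** alpha

# Each 10x increase in parameters yields a smaller absolute improvement.
for p in [1e9, 1e10, 1e11, 1e12]:
    print(f"{p:.0e} params -> loss {loss(p):.3f}")
```

Running this shows the gap between successive rows shrinking: the curve keeps going down, but ever more slowly – which is exactly the “smaller, less predictable returns” problem described below.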

This recipe gave us today’s frontier LLMs.

1.3. 2025 onwards: back to the age of research (but with giant computers)

Now Sutskever is saying that scaling alone is no longer enough:

  • The industry is already operating at enormous scale.

  • The Internet is limited, so you can’t scrape high-quality, diverse text forever.

  • The returns from “just make it 10× bigger” are getting smaller and less predictable.

We are moving into a phase where:

clusters remain huge, but progress depends on new ideas, not just new GPUs.

2. Why the current LLM recipe is hitting its limits

Sutskever’s argument revolves around three main issues.

2.1. Benchmark vs. real-world usability

The models look impressive on paper, but everyday users still run into hallucinations, inconsistent answers, and brittle reasoning.

So there is a gap between benchmark performance and actual reliability when you use a model as a teammate or co-pilot.

2.2. Pre-training is powerful, but opaque

The big idea of this era was: pre-train on huge amounts of text and images, and the model will learn “everything”.

It worked incredibly well… but it also has disadvantages:

  • You are not fully in control of what the model learns

  • When it fails, it’s hard to tell whether the problem is data, architecture, or something deeper.

  • Scaling up performance often means more of the same, not better understanding

That’s why there is so much focus now on post-training tricks: RLHF, reward models, system prompts, fine-tuning, tool use, and so on. We are patching around the limitations of the pre-training recipe.

2.3. The real bottleneck: generalization

For Sutskever, the biggest unsolved problem is generalization.

Humans can:

  • Learn a new concept from a few examples

  • Transfer knowledge between domains

  • Keep learning without forgetting everything

By comparison, models still need far more data and far more examples to learn the same things.

Today’s best models generalize much worse than people. Fixing this isn’t a matter of another 10,000 GPUs; it requires new principles and new training methods.

3. Safe Superintelligence Inc.: Betting on New Recipes

Sutskever’s new company, Safe Superintelligence Inc. (SSI), is built around a simple thesis: the path to superintelligence runs through new research, with safety built in from the start.

SSI is not rushing into consumer products. Instead, it positions itself this way:

  • Focus on long-term research into superintelligence

  • Invent new training methods and architectures

  • Build safety and controllability in from day one

Instead of betting that “GPT-7 but bigger” will magically become AGI, SSI is betting that a different kind of model, trained with different objectives, will be needed.

4. Have tech companies spent too much on GPUs?

Listening to Sutskever, it’s hard not to read between the lines:

  • Huge amounts of money have been spent on GPU clusters on the assumption that scale alone would continue to deliver step-function advantages.

  • We are now finding that the marginal benefits of scaling are shrinking and progress is less predictable.

This doesn’t mean the GPU arms race was futile. Without it, we would not have today’s LLMs at all.

But this means:

  • The next big improvements will probably come from smarter algorithms, not just more expensive hardware.

  • Access to H100s is slowly becoming a commodity, while the real differentiation shifts back to ideas and data.

For founders planning multi-year product strategies, this is a big change.

5. Yann LeCun’s counterargument: LLMs are not the future at all

If Sutskever is saying “scaling is necessary but insufficient,” Yann LeCun goes further:

LLMs, as we know them, are not a path to real intelligence.

He has been very clear about this in conversations, interviews and posts.

5.1. What doesn’t LeCun like about LLMs?

LeCun’s main criticisms can be summarized in three points:

  1. Limited understanding
    LLMs are very good at manipulating text but have only a shallow understanding of the physical world.
    They don’t really “understand” objects, physics, or causality – all the things you need for real-world reasoning and planning.

  2. A product-driven dead end
    He sees LLMs as an amazing product technology (chatbots, assistants, coding helpers) but believes they are approaching their natural limits.
    Each new model is larger and more expensive, yet offers only small improvements.

  3. The simplicity of token prediction
    Under the hood, an LLM is just predicting the next token. LeCun argues that this is a very narrow, simplistic proxy for intelligence.
    For him, real reasoning cannot emerge simply from predicting the next word.

5.2. World models and JEPA

Instead of LLMs, LeCun puts forward the idea of world models – systems that:

  • Learn by watching the world (especially videos)

  • Create internal representations of objects, space and time

  • Can predict what will happen next in that world, not just which word comes next

One of the architectures he is working on is JEPA – the Joint Embedding Predictive Architecture:

  • It learns representations by predicting future embeddings rather than raw pixels or text

  • It is designed to scale to complex, high-dimensional inputs such as video

  • The goal is a model that can support persistent memory, reasoning and planning
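The core idea – score predictions in representation space rather than pixel space – can be sketched in a few lines. This is a heavily simplified toy, not Meta’s implementation: the “encoder” and “predictor” here are fixed random linear maps standing in for trained networks:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "encoder": a fixed random map from observation space to embedding space.
W_enc = rng.normal(size=(16, 64))
# Toy "predictor": maps a context embedding to a predicted target embedding.
W_pred = rng.normal(size=(16, 16)) * 0.1

def encode(x: np.ndarray) -> np.ndarray:
    """Map a raw (64,) observation to a (16,) embedding."""
    return np.tanh(W_enc @ x)

context = rng.normal(size=64)   # e.g. the current video frame
target = rng.normal(size=64)    # e.g. a future video frame

# The JEPA idea: predict the *embedding* of the target and measure error
# in representation space, instead of reconstructing raw pixels.
pred = W_pred @ encode(context)
loss = float(np.mean((pred - encode(target)) ** 2))
print(f"embedding-space prediction loss: {loss:.4f}")
```

Because the loss lives in embedding space, the model is free to ignore unpredictable pixel-level detail and focus on abstract structure – which is why the approach is pitched at messy, high-dimensional inputs like video.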

5.3. Four pillars of future AI

LeCun often describes four pillars essential to any truly intelligent system:

  1. Understanding the physical world

  2. Persistent memory

  3. Reasoning

  4. Planning

He argues that most of today’s LLM-centric systems hack around these requirements instead of solving them directly. This is why he is focusing on world-model architectures rather than ever-larger text models.

6. Sutskever vs. LeCun: Same Diagnosis, Different Treatment

Interestingly, Sutskever and LeCun agree on the problem: today’s scaling recipe is running out of road.

Where they differ is on how radical the change must be.

  • Sutskever thinks the next breakthroughs will still come from the same general family of models – large neural nets trained on massive datasets – but with better objectives, better generalization, and much stronger safety work.

  • LeCun believes we need a new paradigm: world models that learn from interaction with the environment, closer to how animals and humans learn.

For people building on today’s models, this tension is actually good news: it means there are still big open problems – and big opportunities – left.

7. What does this mean for developers and founders?

So what should you do if you’re not running an AI lab, but are building products on top of OpenAI, Anthropic, Google, Meta, and others?

7.1. Hardware is becoming less of a moat

If the next big gains won’t come from scaling alone:

  • The advantage of “we have more GPUs than you” diminishes over time

  • Your real edge comes from use cases, data, UX and integrations, not raw model size

This is great for startups and agencies: you can build on the big labs’ models and still differentiate.

7.2. Benchmarks are not your product

Both Sutskever’s and LeCun’s criticisms are a warning against obsessing over leaderboards.

Ask yourself:

  • Does this improvement meaningfully change what my users can do?

  • Does it reduce hallucinations in my users’ actual workflows?

  • Does it make the system more reliable, debuggable, and explainable?

User-centric metrics matter more than another +2% on some synthetic reasoning benchmark.

7.3. Expect more variety in model types

If LeCun’s world models, JEPA-style architectures, or other alternatives start working, we will likely see:

  • Specialized models for physical reasoning and robotics

  • LLMs acting as a language interface on top of deeper systems that actually handle planning and environment modeling

  • More hybrid stacks, where multiple models collaborate

For developers, this means learning to orchestrate multiple systems instead of calling a single chat endpoint.
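A minimal sketch of what that orchestration might look like. The backend functions here are hypothetical stubs (in practice they would be API calls to an LLM, a planner, a vision model, etc.); the point is the routing pattern, where the LLM is one backend among several:

```python
from typing import Callable

# Hypothetical stand-ins for real model clients.
def language_model(prompt: str) -> str:
    return f"[LLM answer to: {prompt}]"

def planner_model(prompt: str) -> str:
    return f"[plan for: {prompt}]"

# A minimal router: pick a backend per task type.
BACKENDS: dict[str, Callable[[str], str]] = {
    "chat": language_model,
    "plan": planner_model,
}

def handle(task_type: str, prompt: str) -> str:
    # Unknown task types fall back to the general-purpose LLM.
    backend = BACKENDS.get(task_type, language_model)
    return backend(prompt)

print(handle("plan", "restock the warehouse"))
```

In a real stack the routing decision itself might be made by a model, but the architectural shape – one dispatch layer in front of several specialized systems – stays the same.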

7.4. Data, workflow and feedback loops are where you win

No matter who is right about the distant future, one thing is clear for product builders:

  • Owning high-quality domain data

  • Designing tight feedback loops between users and models

  • Building evaluations that match your use case

…will matter more than anything else.
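A use-case evaluation can start very small. A minimal sketch, assuming you have some `model` callable (stubbed out here so the example runs): pair real user-style prompts with checks that matter to your product, and track the pass rate per release instead of leaderboard scores:

```python
# A minimal use-case eval harness. `model` is a hypothetical stub;
# in practice it would call your actual model endpoint.
def model(prompt: str) -> str:
    return "Paris" if "France" in prompt else "I'm unsure."

# Each case pairs a user-style prompt with a check your users care about.
CASES = [
    ("What is the capital of France?", lambda out: "Paris" in out),
    # Refusing to guess counts as a pass for unanswerable questions.
    ("What is the capital of Atlantis?", lambda out: "unsure" in out.lower()),
]

def run_evals() -> float:
    """Return the fraction of cases the model passes."""
    passed = sum(check(model(prompt)) for prompt, check in CASES)
    return passed / len(CASES)

print(f"pass rate: {run_evals():.0%}")
```

Even a few dozen cases like this, drawn from real user traffic, tell you more about your product than any synthetic benchmark.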

You don’t need to solve world modeling or superintelligence on your own. You need to:

  • Choose the right model for the job

  • Wrap them in workflows that are useful to your users

  • Keep improving based on real-world behavior

8. An ironic twist

In 2019-2021, the story of AI was simple: “Scale is all you need.” Bigger models, more data, more GPUs.

Now, two of the most influential figures in the field are effectively saying:

We are entering a new phase where research, theory and new architectures matter as much as infrastructure.

For builders, this doesn’t mean you should stop using LLMs or put your AI roadmap on hold. It means:

  • Pay less attention to chasing the next parameter count

  • Pay more attention to how intelligence shows up inside your product: reliability, reasoning, planning, and how it fits into real human workflows.

The GPU race gave us the tools we have today. The next decade will be defined by what we build with them – and by the new ideas that finally take us beyond “predict the next token.”


