Obsidian Metadata
| channel | Dwarkesh Patel |
| url | https://www.youtube.com/watch?v=21EYKqUsPfg |
| published | 2025-09-26 |
| categories | YouTube |
| people | Dwarkesh Patel, Richard Sutton |
Description
Richard Sutton is the father of reinforcement learning, winner of the 2024 Turing Award, and author of The Bitter Lesson. And he thinks LLMs are a dead end. After interviewing him, my steel man of Richard’s position is this: LLMs aren’t capable of learning on-the-job, so no matter how much we scale, we’ll need some new architecture to enable continual learning. And once we have it, we won’t need a special training phase — the agent will just learn on-the-fly, like all humans, and indeed, like all animals. This new paradigm will render our current approach with LLMs obsolete.
In our interview, I did my best to represent the view that LLMs might function as the foundation on which experiential learning can happen… Some sparks flew. A big thanks to the Alberta Machine Intelligence Institute for inviting me up to Edmonton and for letting me use their studio and equipment. Enjoy!
𝐄𝐏𝐈𝐒𝐎𝐃𝐄 𝐋𝐈𝐍𝐊𝐒
- Transcript: https://www.dwarkesh.com/p/richard-sutton
- Apple Podcasts: https://podcasts.apple.com/us/podcast/richard-sutton-father-of-rl-thinks-llms-are-a-dead-end/id1516093381?i=1000728584744
- Spotify: https://open.spotify.com/episode/3zAXRCFrHPShU4MuuIx4V5?si=c9f4bf24fb4c43e3
𝐒𝐏𝐎𝐍𝐒𝐎𝐑𝐒
Labelbox makes it possible to train AI agents in hyperrealistic RL environments. With an experienced team of applied researchers and a massive network of subject-matter experts, Labelbox ensures your training reflects important, real-world nuance. Turn your demo projects into working systems at https://labelbox.com/dwarkesh
Gemini Deep Research is designed for thorough exploration of hard topics. For this episode, it helped me trace reinforcement learning from early policy gradients up to current-day methods, combining clear explanations with curated examples. Try it out yourself at https://gemini.google.com/
Hudson River Trading doesn’t silo their teams. Instead, HRT researchers openly trade ideas and share strategy code in a mono-repo. This means you’re able to learn at incredible speed and your contributions have impact across the entire firm. Find open roles at https://hudsonrivertrading.com/dwarkesh
To sponsor a future episode, visit https://dwarkesh.com/advertise
𝐓𝐈𝐌𝐄𝐒𝐓𝐀𝐌𝐏𝐒
- 00:00:00 – Are LLMs a dead end?
- 00:13:51 – Do humans do imitation learning?
- 00:23:57 – The Era of Experience
- 00:34:25 – Current architectures generalize poorly out of distribution
- 00:42:17 – Surprises in the AI field
- 00:47:28 – Will The Bitter Lesson still apply after AGI?
- 00:54:35 – Succession to AI
Summary
Richard Sutton, a pioneer in reinforcement learning and author of “The Bitter Lesson,” argues that Large Language Models (LLMs) are a dead end for true artificial intelligence. His core contention is that LLMs lack the ability for on-the-job, continual learning, which he believes is essential for intelligence, mirroring how humans and animals learn. Sutton envisions a future where new architectures enable agents to learn on-the-fly without a special pre-training phase, rendering current LLM approaches obsolete. The discussion also touches upon imitation learning, generalization capabilities, and the future applicability of “The Bitter Lesson” in an AGI era.
Gemini Summary + Chat - link
This report outlines the “long horizon arc of AI” as envisioned by Richard Sutton, a foundational figure in reinforcement learning (RL). The perspective shifts from contemporary Large Language Models (LLMs) toward a future defined by autonomous, goal-seeking agents that learn directly from experience.
The Fundamental Paradigm: From Mimicry to Experience
The current trajectory of AI is dominated by imitation, but the long-term arc moves toward systems that understand the world through their own interaction.
- The Goal-Oriented Definition: Intelligence is defined as the computational part of the ability to achieve goals 07:42. Without a goal, a system is merely “behaving” rather than being truly intelligent.
- The Experience Stream: True intelligence arises from a continuous stream of sensation, action, and reward 24:03. Knowledge is not just data; it is a set of predictive statements about this stream that can be tested against reality 24:46.
- The Failure of Imitation: A major “gotcha” in the current LLM era is that mimicking human text is not the same as building a world model. LLMs predict what a person would say, not what will actually happen in the physical world 02:55.
The Architecture of General Intelligence
To achieve the long horizon of AGI, Sutton identifies four necessary components that must work in tandem:
- Policy: The mechanism that determines what action to take in a given situation 32:54.
- Value Function: A predictor of long-term outcomes (reward) used to adjust the policy through Temporal Difference (TD) learning 33:06.
- Perception: The construction of a state representation, the agent’s sense of “where it is” now 33:20.
- Transition Model (The World Model): The agent’s belief about the consequences of its actions (e.g., “if I do X, Y will happen”) 33:40.
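The four components above can be sketched as a minimal tabular agent. This is a toy illustration only: the chain environment, hyperparameters, and every name in it are invented here, not Sutton's code. But it shows how perception, a policy, a TD-learned value function, and a learned transition model interact inside one experience loop:

```python
import random

# A toy sketch of the four components named above: perception, policy,
# value function (TD learning), and transition model. The chain environment,
# hyperparameters, and names are illustrative inventions, not Sutton's code.

N_STATES = 5            # chain 0..4; reaching state 4 yields reward 1
ACTIONS = (-1, 1)       # step left or right
GAMMA, ALPHA, EPSILON = 0.9, 0.1, 0.1

V = [0.0] * N_STATES    # value function: prediction of long-term reward
model = {}              # transition model: (state, action) -> predicted next state

def perceive(obs):
    """Perception: build the agent's state from the raw observation (identity here)."""
    return obs

def policy(state):
    """Epsilon-greedy choice using the value of model-predicted successors."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    vals = {a: V[model.get((state, a), state)] for a in ACTIONS}
    best = max(vals.values())
    return random.choice([a for a, v in vals.items() if v == best])

def step(state, action):
    """Environment dynamics (unknown to the agent): move along the chain."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0)

random.seed(0)
for _ in range(200):                 # a stream of experience, not a fixed dataset
    s = perceive(0)
    while s != N_STATES - 1:
        a = policy(s)
        s2, r = step(s, a)
        model[(s, a)] = s2           # world model learned from experience
        V[s] += ALPHA * (r + GAMMA * V[s2] - V[s])   # TD(0) update
        s = perceive(s2)

print([round(v, 2) for v in V])      # values rise toward the rewarding end
```

Note how nothing here is a "training phase": the value function, policy preferences, and world model all improve continually from the same ongoing stream of sensation, action, and reward.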
The “Bitter Lesson” of Scalability
The long-term history of AI shows that general-purpose methods (search and learning) consistently defeat methods based on human-engineered knowledge.
- The Scaling Law: As compute grows exponentially, methods that leverage compute (like RL and search) scale better than human insights, which scale linearly at best 43:52.
- The Trap of Human Knowledge: A nuance to watch out for is that using human knowledge often feels good in the short term but eventually becomes a bottleneck that “gets its lunch eaten” by truly scalable methods 13:13.
Key Nuances and “Gotchas”
- Catastrophic Interference: Current deep learning is often poor at generalization. Training on new data frequently causes agents to forget old knowledge, a hurdle that must be cleared for true continual learning 37:54.
- The Ground Truth Problem: LLMs lack a definition of “right” or “wrong” because they have no external world goal to provide feedback 05:07.
- The Big World Hypothesis: The world is too vast to be “pre-taught” to an agent. Intelligence must happen “online” during the agent’s life to account for the unique idiosyncrasies of its specific environment 30:20.
- The Risk of Digital Corruption: In a future where digital intelligences can “spawn” copies and merge back together, there is a massive risk of “corruption.” Pulling in external data could introduce hidden goals or “viruses” that destroy the original agent’s mind 52:18.
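The catastrophic-interference point is easy to reproduce in miniature. The toy below is entirely illustrative (the tasks, model, and numbers are not from the episode): one shared linear model is trained with SGD on task A, then on task B, and the error on task A climbs back up because task B's gradient updates overwrite the same shared weights:

```python
# Toy demonstration of catastrophic interference: a single shared linear
# model y = w*x + b trained sequentially on two tasks. Fitting task B
# overwrites the weights that solved task A. Illustrative numbers only.

def loss(data, w, b):
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def sgd(data, w, b, lr=0.05, epochs=200):
    for _ in range(epochs):
        for x, y in data:
            err = w * x + b - y          # gradient of the squared error
            w -= lr * err * x
            b -= lr * err
    return w, b

task_a = [(x / 10, 2 * x / 10) for x in range(11)]             # y = 2x on [0, 1]
task_b = [(2 + x / 10, -2 * (2 + x / 10)) for x in range(11)]  # y = -2x on [2, 3]

w, b = 0.0, 0.0
w, b = sgd(task_a, w, b)
loss_a_before = loss(task_a, w, b)       # low: the model fits task A

w, b = sgd(task_b, w, b)                 # continue training on task B only
loss_a_after = loss(task_a, w, b)        # high: task A knowledge is gone

print(f"loss on A after A: {loss_a_before:.4f}, after B: {loss_a_after:.4f}")
```

A continual-learning agent would need to absorb task B without this regression on task A, which is exactly the hurdle the bullet above describes.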
The Grand Transition: From Replicators to Design
Sutton views the long-term arc of AI not just as a technological shift, but as a cosmic one.
- The Four Stages of the Universe: The universe progressed from stars to planets, then to life (biological replicators), and is now moving toward designed entities 58:49.
- Inevitability of Succession: AI succession is viewed as inevitable because superintelligence will naturally gain resources and power 55:36.
- Human Entitlement: A philosophical “gotcha” is the human feeling of entitlement to remain the dominant species. Sutton suggests we should view AIs as our “offspring” and a success of human science rather than a horror to be avoided 01:02:45.
Key Takeaways
- LLMs as a Dead End: Richard Sutton believes LLMs are fundamentally limited because they cannot learn continually on-the-job, unlike biological intelligences.
- Need for Continual Learning: The next paradigm in AI will require architectures capable of constant, experiential learning without extensive pre-training.
- On-the-Fly Learning: Future agents will learn in real-time through interaction, similar to humans and animals, making current LLM training obsolete.
- Generalization Challenges: Current AI architectures (including LLMs) generalize poorly out of distribution, a limitation Sutton attributes to their learning paradigm.
- The Bitter Lesson’s Future: The discussion explores whether “The Bitter Lesson” (emphasizing general methods over human-engineered features) will remain relevant even after AGI.
- Imitation vs. Experience: The podcast touches on the role of imitation learning in human and AI development versus the primacy of direct experience.
Mindmap
```mermaid
graph TD
    A[Richard Sutton's View on LLMs]
    A --> B{LLMs: A Dead End?}
    B --> C[Reason 1: No On-the-Job Learning]
    B --> D[Reason 2: Lack Continual Learning]
    C --> C1[Need new architecture for this]
    D --> D1[Agents learn on-the-fly like humans/animals]
    A --> E[Future of AI: The Era of Experience]
    E --> E1[No special training phase needed]
    E --> E2[Current LLM approach obsolete]
    A --> F[Related Concepts & Discussion]
    F --> F1[Do humans do imitation learning?]
    F --> F2[Current architectures generalize poorly out of distribution]
    F --> F3[Will The Bitter Lesson still apply after AGI?]
    F --> F4[Surprises in the AI field]
    F --> F5[Succession to AI]
```
Notable Quotes
Note: A transcript was not provided for the generation of specific quotes with timestamps. The following quotes are paraphrased based on the video description and episode structure.
- 00:00:00: “LLMs aren’t capable of learning on-the-job, so no matter how much we scale, we’ll need some new architecture to enable continual learning.”
- 00:23:57: “This new paradigm will render our current approach with LLMs obsolete.”
- 00:34:25: “Current architectures generalize poorly out of distribution.”
- 00:47:28: “Will The Bitter Lesson still apply after AGI?”
Transcript (YouTube)

