Obsidian Metadata
| channel | Welch Labs |
| url | https://www.youtube.com/watch?v=2hcsmtkSzIw |
| published | 2026-02-01 |
| categories | Youtube |
Description
Apply to work at Tufalabs: https://tufalabs.ai/join
Welch Labs Book: https://www.welchlabs.com/resources/ai-book-ezrzm-msrmc
Welch Labs eBook: https://www.welchlabs.com/resources/the-welch-labs-illustrated-guide-to-ai-digital-download
Patreon: https://www.patreon.com/welchlabs
SECTIONS
- 0:00 - Harpy
- 4:39 - The Bitter Lesson
- 5:58 - Sutton Goes on a Podcast
- 8:22 - LLMs are Not Bitter Lesson Pilled?
- 9:19 - Supervised Learning
- 10:04 - Reinforcement Learning
- 10:32 - Work for Tufalabs!
- 11:50 - How AlphaGo Surpassed Humans
- 17:49 - RLHF and RLVR
- 18:41 - The Era of Experience
- 20:27 - My Take
- 21:05 - Welch Labs Book!
TECHNICAL NOTES https://www.welchlabs.com/blog/2026/1/31/the-bitter-lesson-video-technical-notes
CODE https://github.com/stephencwelch/manim_videos
REFERENCES
- The Bitter Lesson: http://www.incompleteideas.net/IncIdeas/BitterLesson.html
- Dwarkesh Patel’s interview with Richard Sutton: https://www.youtube.com/watch?v=21EYKqUsPfg
- AlphaGo vs Lee Sedol Match 4: https://www.youtube.com/watch?v=yCALyQRN3hw
- Repurposed some board setups and heatmaps from: https://www.lesswrong.com/posts/FF8i6SLfKb4g7C4EL/inside-the-mind-of-a-superhuman-go-model-how-does-leela-zero-2
- Great HARPY video: https://www.youtube.com/watch?v=NiiDe2n-GeQ
- Sutton, Richard S., and Andrew G. Barto. Reinforcement Learning: An Introduction. Cambridge: MIT Press, 1998.
- Averbuch, Amir, et al. “An IBM PC based large-vocabulary isolated-utterance speech recognizer.” ICASSP ’86, IEEE International Conference on Acoustics, Speech, and Signal Processing. Vol. 11. IEEE, 1986.
- Radford, Alec, et al. “Language models are unsupervised multitask learners.” OpenAI Blog 1.8 (2019): 9.
- Silver, David, et al. “Mastering the game of Go with deep neural networks and tree search.” Nature 529.7587 (2016): 484-489.
- Silver, David, et al. “Mastering the game of Go without human knowledge.” Nature 550.7676 (2017): 354-359.
- Lowerre, Bruce. “The HARPY speech understanding system.” Readings in Speech Recognition. 1990. 576-586.
- Silver, David, and Richard S. Sutton. “Welcome to the era of experience.” Google AI 1 (2025).
In 2019, computer scientist Richard Sutton published “The Bitter Lesson,” which has since become one of the most influential essays in AI. While the industry often uses it to justify the massive scaling of Large Language Models (LLMs), this video reveals that Sutton himself views our current path as a potential repetition of past mistakes rather than a fulfillment of the lesson.
The Core of the Bitter Lesson
The “bitter” realization is that 70 years of AI research show that general methods leveraging computation (scaling) always eventually outperform methods that rely on human knowledge (hand-crafted rules or specialized architectures) 05:01.
- Human Knowledge is a Bottleneck: Building our discoveries into AI makes it harder for the system to discover its own, potentially superior, methods 09:10.
- The Scaling Edge: Success in AI comes from search and learning, two methods that improve exponentially as compute increases.
The Historical Precedent: The Harpy System
In the 1970s, CMU’s Harpy system achieved breakthrough speech recognition using a massive, hand-coded Knowledge Graph 00:16.
- The Trap: Experts painstakingly designed grammars and linguistic “juncture rules” to help the AI 02:08.
- The Shift: By the 1990s, these expert systems were crushed by Hidden Markov Models, which used statistics and data rather than linguistic expertise 04:02.
The Current Dilemma: Are LLMs a “Bitter” Success?
There is a profound nuance in how Sutton views LLMs compared to how the industry views them.
The Industry View
LLMs are the ultimate proof of the Bitter Lesson because they use massive compute and simple objectives (next-token prediction) to achieve intelligence.
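The “simple objective” here, next-token prediction, can be made concrete with a toy sketch. The counting-based bigram model below is an invented illustration (the corpus and smoothing constant are made up, and a real LLM is a neural network trained on vastly more data), but it is scored with cross-entropy, the same loss that LLM training minimizes at scale:

```python
import math
from collections import Counter, defaultdict

# Toy next-token predictor: a bigram model built by counting transitions.
# Not an LLM -- just the objective in miniature.
corpus = "the cat sat on the mat the cat ran".split()

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1  # count how often nxt follows prev

def prob(prev, nxt):
    """P(next token | previous token), with a tiny smoothing floor."""
    total = sum(counts[prev].values())
    return (counts[prev][nxt] + 1e-6) / (total + 1e-6 * len(corpus))

def cross_entropy(tokens):
    """Average negative log-likelihood of each next token -- the training loss."""
    nll = [-math.log(prob(p, n)) for p, n in zip(tokens, tokens[1:])]
    return sum(nll) / len(nll)

print(cross_entropy(corpus))                  # low: all transitions were seen
print(cross_entropy("the mat sat".split()))   # high: "mat -> sat" never occurred
```

Scaling this objective up (bigger context, bigger model, more data) rather than hand-coding grammar rules is exactly the move the industry reads as Bitter Lesson-compliant.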
Sutton’s “Gotcha” 06:55
Sutton argues that LLMs might actually be a negative example of the lesson:
- Reliance on Human Imitation: LLMs are trained via supervised learning to imitate human-generated text 09:25.
- Static Intelligence: They are limited to the knowledge we have already discovered. If you trained an LLM 500 years ago, it would be “stuck” in the physics of that era without the ability to discover anything new 19:00.
- The Prediction: Sutton expects that systems learning from direct experience and real-world rewards will eventually supersede LLMs, just as data-driven models superseded Harpy 08:13.
Implications for AI Development
The video suggests that the next frontier of AI will move away from pure imitation toward Reinforcement Learning (RL) and “Experience.”
- Discovery vs. Imitation: Supervised learning (LLMs) teaches the model what to say, while Reinforcement Learning (AlphaGo) allows an agent to discover how to win through interaction 13:22.
- AlphaGo as the Blueprint:
  - AlphaGo (Original): Used supervised learning on human games as a starting point 12:56.
  - AlphaGo Zero: Used zero human data, relying only on RL, and became far more powerful, developing an “alien” playing style humans never imagined 16:47.
- The Era of Experience: Sutton and David Silver argue that for AI to move past current human misconceptions (like moving from Newtonian to quantum physics), it must interact with the physical world and learn from “verifiable rewards” 18:51.
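The trial-and-error loop the video describes (run a policy, collect reward, improve, repeat) can be sketched in miniature. This is tabular Q-learning on an invented one-dimensional toy world, not AlphaGo’s actual architecture (no neural networks, no tree search, no Go), but it shows an agent discovering how to win purely from interaction:

```python
import random

# Toy world: positions 0..4; the agent starts at 0 and is rewarded
# only for reaching position 4. No human demonstrations -- just experience.
N_STATES = 5
ACTIONS = [-1, +1]  # step left or right

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        while s != N_STATES - 1:
            # explore sometimes, otherwise act greedily (trial and error)
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: q[(s, act)])
            s2 = min(max(s + a, 0), N_STATES - 1)
            r = 1.0 if s2 == N_STATES - 1 else 0.0  # reward only at the goal
            best_next = max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s2
    return q

def greedy_path(q):
    """Follow the learned policy without exploration."""
    s, path = 0, [0]
    while s != N_STATES - 1 and len(path) < 20:
        a = max(ACTIONS, key=lambda act: q[(s, act)])
        s = min(max(s + a, 0), N_STATES - 1)
        path.append(s)
    return path

q = train()
print(greedy_path(q))  # the learned policy walks straight to the goal
```

Humans provide the goal (the reward function); the system figures out how to get there, which is the contrast with supervised imitation drawn above.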
Nuances and Limitations
- The “Scaffolding” Theory: Current reasoning models (like those trained with RLVR, Reinforcement Learning with Verifiable Rewards) use human text as a starting point but allow the AI to find its own paths to solve math or code problems 18:12.
- The Real-World Gap: A major “gotcha” is that RL works best in closed systems with clear rules (games, math, code). Applying this to the “messy” real world remains a significant challenge 20:46.
Key Takeaways
- Richard Sutton’s “Bitter Lesson” posits that general methods exploiting computation consistently outperform human-engineered specific knowledge in AI development.
- Early AI systems like Harpy were heavily reliant on explicit human knowledge and rules, limiting their scalability and generality.
- Modern AI, including AlphaGo and LLMs, represents a paradigm shift towards data-driven learning, with reinforcement learning being a key driver.
- AlphaGo’s success against human Go masters showcased the power of self-play and reinforcement learning, demonstrating AI’s ability to learn complex strategies without direct human instruction.
- Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning with Verifiable Rewards (RLVR) are current techniques that guide AI learning within a scalable framework: RLHF uses human preference judgments, while RLVR uses automatically checkable rewards (e.g., math and code correctness).
- The “Era of Experience” emphasizes that future AI progress will come from systems that learn continuously from their own interaction with the world, with humans focused on designing the learning mechanisms.
- Humans contribute to making AI better by creating more effective, scalable learning architectures, providing the environment, goals, and feedback, allowing the AI to autonomously acquire complex capabilities through experience.
Mindmap
```mermaid
graph TD
    A[Can humans make AI any better?] --> B(The Bitter Lesson)
    B --> B1(Richard Sutton's Core Idea)
    B1 --> B1a(General methods + Computation)
    B1a --> B1b(Outperforms human-specific knowledge)
    A --> C(Evolution of AI Approaches)
    C --> C1(Early AI: Harpy - Rule-based, human-coded)
    C --> C2(Modern AI: Data-driven Learning)
    C2 --> C2a(Supervised Learning)
    C2a --> C2b(Reinforcement Learning)
    C2b --> C2b1(AlphaGo: Self-play, experience-driven)
    C2b1 --> C2b1a(Surpassed human masters)
    C2b --> C2b2(RLHF & RLVR: Learning from feedback)
    C2 --> C2c(LLMs: Unsupervised, emergent abilities)
    A --> D(The Era of Experience)
    D --> D1(AI learns from vast interaction & data)
    D1 --> D1a(Stop programming intelligence, start teaching it)
    A --> E(Human's Role in Improving AI)
    E --> E1(Design better learning architectures)
    E --> E2(Define goals & environments)
    E --> E3(Provide feedback for alignment)
    E --> E4(AI autonomously acquires complexity)
```
Notable Quotes
- 4:39: The Bitter Lesson is probably the most controversial yet impactful idea in all of AI.
- 5:15: But the actual lesson is that general methods that leverage computation are ultimately the most effective and efficient path to better AI.
- 10:19: Humans provide the goal, but the system figures out how to get there.
- 12:56: AlphaGo learned the game of Go not by studying human games, but by playing against itself millions of times.
- 16:47: The way AlphaGo, and really all modern AI, learns is by taking an initial policy, running it many times, and improving it through trial and error, getting better from experience.
- 19:10: Welcome to the era of experience. The lesson is simple. Stop programming intelligence, start teaching it.
- 20:38: We make AI better by building better architectures that allow the AI itself to learn better from experience.