Obsidian Metadata
| channel | Welch Labs |
| url | https://www.youtube.com/watch?v=2hcsmtkSzIw |
| published | 2026-02-01 |
| categories | Youtube |
Description
Apply to work at Tufalabs: https://tufalabs.ai/join
Welch Labs Book: https://www.welchlabs.com/resources/ai-book-ezrzm-msrmc
Welch Labs eBook: https://www.welchlabs.com/resources/the-welch-labs-illustrated-guide-to-ai-digital-download
Patreon: https://www.patreon.com/welchlabs
SECTIONS
- 0:00 - Harpy
- 4:39 - The Bitter Lesson
- 5:58 - Sutton Goes on a Podcast
- 8:22 - LLMs are Not Bitter Lesson Pilled?
- 9:19 - Supervised Learning
- 10:04 - Reinforcement Learning
- 10:32 - Work for Tufalabs!
- 11:50 - How AlphaGo Surpassed Humans
- 17:49 - RLHF and RLVR
- 18:41 - The Era of Experience
- 20:27 - My Take
- 21:05 - Welch Labs Book!
TECHNICAL NOTES https://www.welchlabs.com/blog/2026/1/31/the-bitter-lesson-video-technical-notes
CODE https://github.com/stephencwelch/manim_videos
REFERENCES
- The Bitter Lesson: http://www.incompleteideas.net/IncIdeas/BitterLesson.html
- Dwarkesh Patel’s interview with Richard Sutton: https://www.youtube.com/watch?v=21EYKqUsPfg
- AlphaGo vs Lee Sedol Match 4: https://www.youtube.com/watch?v=yCALyQRN3hw
- Repurposed some board setups and heatmaps from: https://www.lesswrong.com/posts/FF8i6SLfKb4g7C4EL/inside-the-mind-of-a-superhuman-go-model-how-does-leela-zero-2
- Great HARPY video: https://www.youtube.com/watch?v=NiiDe2n-GeQ
- Sutton, Richard S., and Andrew G. Barto. Reinforcement Learning: An Introduction. Cambridge: MIT Press, 1998.
- Averbuch, Amir, et al. “An IBM PC based large-vocabulary isolated-utterance speech recognizer.” ICASSP ’86, IEEE International Conference on Acoustics, Speech, and Signal Processing. Vol. 11. IEEE, 1986.
- Radford, Alec, et al. “Language models are unsupervised multitask learners.” OpenAI Blog 1.8 (2019): 9.
- Silver, David, et al. “Mastering the game of Go with deep neural networks and tree search.” Nature 529.7587 (2016): 484-489.
- Silver, David, et al. “Mastering the game of Go without human knowledge.” Nature 550.7676 (2017): 354-359.
- Lowerre, Bruce. “The HARPY speech understanding system.” Readings in Speech Recognition. 1990. 576-586.
- Silver, David, and Richard S. Sutton. “Welcome to the era of experience.” Google AI 1 (2025).
In 2019, computer scientist Richard Sutton published “The Bitter Lesson,” which has since become one of the most influential essays in AI. While the industry often uses it to justify the massive scaling of Large Language Models (LLMs), this video reveals that Sutton himself views our current path as a potential repetition of past mistakes rather than a fulfillment of the lesson.
The Core of the Bitter Lesson
The “bitter” realization is that 70 years of AI research show that general methods leveraging computation (scaling) always eventually outperform methods that rely on human knowledge (hand-crafted rules or specialized architectures) 05:01.
- Human Knowledge is a Bottleneck: Building our discoveries into AI makes it harder for the system to discover its own, potentially superior, methods 09:10.
- The Scaling Edge: Success in AI comes from search and learning, two methods that improve exponentially as compute increases.
The Historical Precedent: The Harpy System
In the 1970s, CMU’s Harpy system achieved breakthrough speech recognition using a massive, hand-coded Knowledge Graph 00:16.
- The Trap: Experts painstakingly designed grammars and linguistic “juncture rules” to help the AI 02:08.
- The Shift: By the 1990s, these expert systems were crushed by Hidden Markov Models, which used statistics and data rather than linguistic expertise 04:02.
The Current Dilemma: Are LLMs a “Bitter” Success?
There is a profound nuance in how Sutton views LLMs compared to how the industry views them.
The Industry View
LLMs are the ultimate proof of the Bitter Lesson because they use massive compute and simple objectives (next-token prediction) to achieve intelligence.
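The “simple objective” here, next-token prediction, can be made concrete with a toy sketch. The counting-based bigram model below is an invented illustration (the corpus and smoothing constant are made up, and a real LLM is a neural network trained on vastly more data), but it is scored with cross-entropy, the same loss that LLM training minimizes at scale:

```python
import math
from collections import Counter, defaultdict

# Toy next-token predictor: a bigram model built by counting transitions.
# Not an LLM -- just the objective in miniature.
corpus = "the cat sat on the mat the cat ran".split()

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1  # count how often nxt follows prev

def prob(prev, nxt):
    """P(next token | previous token), with a tiny smoothing floor."""
    total = sum(counts[prev].values())
    return (counts[prev][nxt] + 1e-6) / (total + 1e-6 * len(corpus))

def cross_entropy(tokens):
    """Average negative log-likelihood of each next token -- the training loss."""
    nll = [-math.log(prob(p, n)) for p, n in zip(tokens, tokens[1:])]
    return sum(nll) / len(nll)

print(cross_entropy(corpus))                  # low: all transitions were seen
print(cross_entropy("the mat sat".split()))   # high: "mat -> sat" never occurred
```

Scaling this objective up (bigger context, bigger model, more data) rather than hand-coding grammar rules is exactly the move the industry reads as Bitter Lesson-compliant.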
Sutton’s “Gotcha” 06:55
Sutton argues that LLMs might actually be a negative example of the lesson:
- Reliance on Human Imitation: LLMs are trained via supervised learning to imitate human-generated text 09:25.
- Static Intelligence: They are limited to the knowledge we have already discovered. If you trained an LLM 500 years ago, it would be “stuck” in the physics of that era without the ability to discover anything new 19:00.
- The Prediction: Sutton expects that systems learning from direct experience and real-world rewards will eventually supersede LLMs, just as data-driven models superseded Harpy 08:13.
Implications for AI Development
The video suggests that the next frontier of AI will move away from pure imitation toward Reinforcement Learning (RL) and “Experience.”
- Discovery vs. Imitation: Supervised learning (LLMs) teaches the model what to say, while Reinforcement Learning (AlphaGo) allows an agent to discover how to win through interaction 13:22.
- AlphaGo as the Blueprint:
  - AlphaGo (Original): Used supervised learning on human games as a starting point 12:56.
  - AlphaGo Zero: Used zero human data, relying only on RL, and became far more powerful, developing an “alien” playing style humans never imagined 16:47.
- The Era of Experience: Sutton and David Silver argue that for AI to move past current human misconceptions (like moving from Newtonian to quantum physics), it must interact with the physical world and learn from “verifiable rewards” 18:51.
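The trial-and-error loop the video describes (run a policy, collect reward, improve, repeat) can be sketched in miniature. This is tabular Q-learning on an invented one-dimensional toy world, not AlphaGo’s actual architecture (no neural networks, no tree search, no Go), but it shows an agent discovering how to win purely from interaction:

```python
import random

# Toy world: positions 0..4; the agent starts at 0 and is rewarded
# only for reaching position 4. No human demonstrations -- just experience.
N_STATES = 5
ACTIONS = [-1, +1]  # step left or right

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        while s != N_STATES - 1:
            # explore sometimes, otherwise act greedily (trial and error)
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: q[(s, act)])
            s2 = min(max(s + a, 0), N_STATES - 1)
            r = 1.0 if s2 == N_STATES - 1 else 0.0  # reward only at the goal
            best_next = max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s2
    return q

def greedy_path(q):
    """Follow the learned policy without exploration."""
    s, path = 0, [0]
    while s != N_STATES - 1 and len(path) < 20:
        a = max(ACTIONS, key=lambda act: q[(s, act)])
        s = min(max(s + a, 0), N_STATES - 1)
        path.append(s)
    return path

q = train()
print(greedy_path(q))  # the learned policy walks straight to the goal
```

Humans provide the goal (the reward function); the system figures out how to get there, which is the contrast with supervised imitation drawn above.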
Nuances and Limitations
- The “Scaffolding” Theory: Current reasoning models (like those trained with RLVR, Reinforcement Learning with Verifiable Rewards) use human text as a starting point but allow the AI to find its own paths to solve math or code problems 18:12.
- The Real-World Gap: A major “gotcha” is that RL works best in closed systems with clear rules (games, math, code). Applying this to the “messy” real world remains a significant challenge 20:46.
Key Takeaways
- Richard Sutton’s “Bitter Lesson” posits that general methods exploiting computation consistently outperform human-engineered specific knowledge in AI development.
- Early AI systems like Harpy were heavily reliant on explicit human knowledge and rules, limiting their scalability and generality.
- Modern AI, including AlphaGo and LLMs, represents a paradigm shift towards data-driven learning, with reinforcement learning being a key driver.
- AlphaGo’s success against human Go masters showcased the power of self-play and reinforcement learning, demonstrating AI’s ability to learn complex strategies without direct human instruction.
- Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning with Verifiable Rewards (RLVR) are current techniques that guide AI learning within a scalable framework: RLHF uses human preference judgments, while RLVR uses automatically checkable rewards (e.g., math and code correctness).
- The “Era of Experience” emphasizes that future AI progress will come from systems that learn continuously from their own interaction with the world, with humans focused on designing the learning mechanisms.
- Humans contribute to making AI better by creating more effective, scalable learning architectures, providing the environment, goals, and feedback, allowing the AI to autonomously acquire complex capabilities through experience.
Mindmap
```mermaid
graph TD
    A[Can humans make AI any better?] --> B(The Bitter Lesson)
    B --> B1(Richard Sutton's Core Idea)
    B1 --> B1a(General methods + Computation)
    B1a --> B1b(Outperforms human-specific knowledge)
    A --> C(Evolution of AI Approaches)
    C --> C1(Early AI: Harpy - Rule-based, human-coded)
    C --> C2(Modern AI: Data-driven Learning)
    C2 --> C2a(Supervised Learning)
    C2a --> C2b(Reinforcement Learning)
    C2b --> C2b1(AlphaGo: Self-play, experience-driven)
    C2b1 --> C2b1a(Surpassed human masters)
    C2b --> C2b2(RLHF & RLVR: Learning from feedback)
    C2 --> C2c(LLMs: Unsupervised, emergent abilities)
    A --> D(The Era of Experience)
    D --> D1(AI learns from vast interaction & data)
    D1 --> D1a(Stop programming intelligence, start teaching it)
    A --> E(Human's Role in Improving AI)
    E --> E1(Design better learning architectures)
    E --> E2(Define goals & environments)
    E --> E3(Provide feedback for alignment)
    E --> E4(AI autonomously acquires complexity)
```
Notable Quotes
- 4:39: The Bitter Lesson is probably the most controversial yet impactful idea in all of AI.
- 5:15: But the actual lesson is that general methods that leverage computation are ultimately the most effective and efficient path to better AI.
- 10:19: Humans provide the goal, but the system figures out how to get there.
- 12:56: AlphaGo learned the game of Go not by studying human games, but by playing against itself millions of times.
- 16:47: The way AlphaGo, and really all modern AI, learns is by taking an initial policy, running it many times, and improving it through trial and error, getting better from experience.
- 19:10: Welcome to the era of experience. The lesson is simple. Stop programming intelligence, start teaching it.
- 20:38: We make AI better by building better architectures that allow the AI itself to learn better from experience.