Scott Young - Ultralearning, The MIT Challenge

Dwarkesh Podcast 1h38 10 min #9

Summary

  • Scott Young is the author of Ultralearning and famous for the MIT Challenge, where he completed MIT’s four-year computer science curriculum in one year by using freely available online materials. The conversation covers the nature of genius and “miracle years,” the sociology of innovation, the mechanics of learning and transfer, focus, ambition, and advice for young people.

Miracle Years and the Nature of Genius

  • Einstein’s annus mirabilis (1905) produced breakthroughs in Brownian motion, the photoelectric effect, and special relativity. These were not random: they were all problems suited to Einstein’s particular cognitive style—spatial, visual, and intuitive thinking through thought experiments, rather than purely mathematical derivation.
  • Einstein was an outlier even among geniuses: he made multiple huge breakthroughs, struggled for years with the math of general relativity, and still couldn’t get an academic job for years after special relativity. His path was atypical, not a model to emulate.
  • Newton’s miracle year similarly came from working on the right problems at the right time, not from some general method. Most people never have a year that productive.

The Narrow Path of Success

  • Success in high-ambition fields follows surprisingly narrow, structured paths that most people ignore because they romanticize exceptions like Einstein (the patent clerk who revolutionized physics).
  • Data from academia (e.g., Jason Brennan’s Good Work If You Can Get It) shows the filters are rigid: most people who enter academia never get academic jobs, and those who succeed follow the expected path.
  • In non-fiction publishing, the established path is: get an agent first, then write a proposal (part book plan, part business case), pitch for a deal, then write the book. Most amateurs write the book first and then look for a publisher—a strategy that “screams amateur” and rarely works.
  • The lesson: research how success actually works in your field before embarking on a project. Even if you choose a different path, you should know what the status quo is and what you’re opting out of.

Einstein vs. Ultralearning Principles

  • Einstein did enormous amounts of deep thinking on hard problems even when it wasn’t required of him (as a patent clerk). He wasn’t undisciplined—his approach was just more spontaneous and intuitive than a structured ultralearning project.
  • Writing a self-improvement book like Ultralearning means taking spontaneous, intuitive approaches that worked for successful people and turning them into a structured method that someone who wouldn’t do it naturally can follow.
  • Einstein actually followed many of the principles in Ultralearning (deep work, retrieval, focus on understanding) even if he didn’t do so in a structured way—they came naturally to him.

Age and Learning

  • The core principles of ultralearning (retrieval over review, importance of feedback) don’t change with age. A 79-year-old benefits from the same techniques as a 17-year-old.
  • What does change: fluid intelligence and working memory decline from the early 20s. The frontal areas of the brain deteriorate faster, making it harder to control attention, switch between rules, and override habits.
  • Older learners may need to be more deliberate about environment design (removing distractions) and about making connections between pieces of information more explicit (e.g., through carefully designed flashcards that force manipulation of how parts connect).
  • Chunking—assembling pieces of information into meaningful patterns—becomes harder with age, which impacts both learning and intuition. This is also why older people may recognize a face but struggle to recall a name: the binding between different types of information weakens.

Transfer: Why It Fails and When It Works

  • Transfer—the ability to apply knowledge from one domain to another—is much harder than people assume. The brain learns things quite specifically; this specificity is a feature, not a bug, because fine-grained discrimination is what makes us smart.
  • Novices focus on superficial features of problems (e.g., “this involves a pulley”), while experts recognize deeper abstract structures (e.g., “this is a conservation of energy problem”). This recognition requires extensive chunking within a domain.
  • Transfer works when you understand both domains deeply enough to see the same abstract pattern in each. If you haven’t chunked the first domain to that level of abstraction, there’s nothing to transfer.
  • Teaching more theory is the usual proposed fix for transfer failure, but Scott is skeptical: this is largely what universities already do, and it doesn’t work well.
  • The directness principle: if you have a concrete goal, train the specific sub-skills you need for actual performance. Language learning example: Duolingo emphasizes recognition (multiple choice), but speaking requires recall, pronunciation, and the ability to work around unknown words—skills Duolingo doesn’t train.
  • Scott’s own career (CS → cognitive science) is an example of successful transfer, but it works because he reached a deep, abstract understanding of computation that maps onto cognitive science concepts (data structures, circuit motifs, information encoding). Most people don’t reach that level of understanding in either domain.

Compounding and Diminishing Returns

  • True compounding (unending exponential growth) is vanishingly rare. Most things have regions of exponential growth followed by diminishing returns.
  • Tyler Cowen said he learned more between ages 15–25 than in the last 10 years. This is consistent with diminishing returns: the first exposure to a field gives the biggest gains; later work is on increasingly esoteric problems with less practical utility.
  • There is a compounding confidence curve for ultralearning: if you’ve never done aggressive self-directed learning, the first successful project dramatically changes your self-efficacy and strategy. People like Tristan de Montebello (who reached the finals of the World Championship of Public Speaking) report feeling like they could tackle any project after their first success.
  • But there are still S-curves: initial rapid gains give way to incremental improvements. Most people are before the steep part of the curve—they’ve never taken learning seriously—so the potential gains are large.
  • Unending compound growth is impossible in practice (it would imply one person becomes smarter than everyone else on earth). Organizations that try to compound wealth over centuries fail because of corruption, external pressure, and the impossibility of maintaining a mission across generations.
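
The contrast between unending compounding and an S-curve can be made concrete with a small sketch. This is only an illustration, not something from the conversation: the growth rate, the starting level, and the ceiling of 100 are arbitrary assumptions, and the logistic function is just one common way to model "exponential at first, then diminishing returns."

```python
import math

def exponential(t, rate=0.5, start=1.0):
    """Unbounded compounding: multiplies by the same factor every period."""
    return start * math.exp(rate * t)

def logistic(t, rate=0.5, start=1.0, ceiling=100.0):
    """S-curve: looks exponential early on, then flattens near the ceiling.

    The ceiling is a hypothetical limit on achievable skill, chosen only
    for illustration.
    """
    return ceiling / (1.0 + (ceiling / start - 1.0) * math.exp(-rate * t))

def gain(curve, t):
    """Improvement earned during one period starting at time t."""
    return curve(t + 1) - curve(t)

if __name__ == "__main__":
    for t in (0, 9, 20):
        print(f"t={t:2d}  exponential gain={gain(exponential, t):10.2f}  "
              f"S-curve gain={gain(logistic, t):6.2f}")
```

Under these assumptions the per-period gain on the S-curve is small at the start, largest in the steep middle, and small again near the ceiling, while the exponential gain keeps growing without limit. That matches the claim that someone who has not yet reached the steep part of the curve has the most to gain.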

Depth vs. Context in Learning

  • There’s no universal answer to whether you should build a broad map first or dive deep into each piece. The right approach depends entirely on your goal.
  • For language learning, Scott found that going deep into etymology quickly became a waste of time compared to simply memorizing more words. For physics, most people don’t go deep enough—they see the pulleys, not the conservation of energy.
  • If you have multiple goals (pass exams AND do research AND discuss intelligently), more robust strategies exist that serve all of them. If you have a narrow, scripted goal (sound good for 10 minutes of prepared audio), the optimal strategy may not involve real learning at all.
  • For groundbreaking research: specialization is key. You need extremely high fluency in a narrow area to be at the cutting edge. But there’s also a strategic element—doing something nobody else does (like the MIT Challenge) can give you a unique claim to fame even if it’s objectively suboptimal.

The MIT Challenge and the Failed Simulation Effect

  • The MIT Challenge worked as a learning project: Scott completed final exams and programming assignments for MIT’s CS curriculum in one year using free online materials. He chose MIT because their materials were freely available, not because of the brand—though the brand is why people found it impressive.
  • The failed simulation effect (from Cal Newport): we judge impressiveness not by how much work something requires but by how hard it is to imagine doing it ourselves. Being president of 14 clubs is a lot of work but easy to imagine; publishing a book in high school is less work but harder to imagine, so it seems more impressive.
  • MIT’s CS program is theory- and math-heavy, which made it intellectually harder but practically more efficient—programming assignments were tight and proof-based rather than long and tedious. This allowed Scott to add them to the challenge with only 1–2 extra weeks.
  • The conceptual/theoretical focus of MIT’s program was actually more valuable for Scott’s interests (cognitive science, understanding information) than heavy programming assignments would have been.
  • The MIT Challenge also benefited from strategic uniqueness: almost nobody was doing it, so it became a career differentiator. If thousands of people did it, the value would disappear.

Transfer in the MIT Challenge

  • Scott didn’t go into the MIT Challenge expecting transfer effects—he was genuinely interested in learning CS and had been considering going back to school for a second degree.
  • His prediction that many people would follow his model didn’t come true. Very few people have done anything similar at that scale.
  • The transfer value came from deep understanding: learning CS gave him abstract patterns (data structures, algorithms, computational models) that map onto cognitive science. Reading a paper about computational models of chunking is much easier if you’ve taken algorithms classes.
  • His general approach has always been to prioritize understanding over memorization, which tends to produce better transfer. Memorizing solution types might work for a specific exam, but it fails when later courses assume deep understanding (e.g., the Fourier transform).

Focus: Capacity vs. Habits

  • Scott is agnostic about whether focus can be trained as a general cognitive capacity. The muscle metaphor (Cal Newport’s view) is probably wrong: brain training games improve performance on brain training games but don’t transfer to other tasks.
  • What can be improved: the habits, routines, and motivational structures that influence focus. Most people can’t focus not because their brains are incapable but because their phones are buzzing, they find the material boring, or they have more enticing alternatives available.
  • Twitter and similar platforms create variable ratio reinforcement schedules (like slot machines) that make it harder to choose harder, less immediately rewarding activities. If Twitter is a default habitual response, resisting it takes enormous energy.
  • Removing distracting options from your behavioral repertoire makes it easier to sustain effort on hard activities—not by increasing cognitive capacity but by reducing the motivational gradient pulling you away.
  • Scott is skeptical that meditation directly enhances focus capacity (the research is poor and tied to religious traditions), but agrees that reducing enticing distractions can improve your ability to persist on hard tasks.

The Sociology of Innovation: Speedrunning and the Enlightenment

  • Scott wrote an article about why people keep getting better at Tetris (and other domains), inspired by the speedrunning community. He regrets including the “four-minute mile” example (the claim that breaking the barrier inspired others appears to be false based on performance trend data).
  • Speedrunning is a fascinating case study in how innovation happens sociologically. The key innovation: requiring live-stream video proof of record attempts. This allowed everyone to watch, study, and build on each other’s techniques—similar to how patents require disclosure of how something works.
  • Before live streaming, speedrun records were published in magazines with no way to learn the methods. After live streaming, the entire domain exploded in performance because knowledge could spread and be iterated on.
  • This is a model for how innovation works sociologically: network structures, proof requirements, and knowledge-sharing mechanisms create explosions of progress.
  • Most fields don’t require this kind of transparent proof. Why don’t the best programmers at Google live-stream their work? Part of the answer is that speedruns are short (minutes) and easy to watch, while programming projects are hundreds of hours. Additional innovations may be needed to make this broadly applicable.

Progress Studies and Why Innovation May Have Stalled

  • There’s a thesis that innovation has slowed: the first half of the 20th century saw transformative advances across every aspect of life; the second half saw most innovation concentrated in computing and communications.
  • Possible explanations:
    • Low-hanging fruit: Early physics breakthroughs could be made by a patent clerk with back-of-the-envelope calculations; now confirming the Higgs boson requires billions of dollars and the largest machine on earth.
    • Cultural barriers: Increased regulation, risk aversion, and institutional inertia.
    • Startup monoculture: Most “innovation” is applying the internet business model to existing industries (Uber, Airbnb) rather than creating genuinely new things.
  • The Lunar Society (Darwin’s grandfather, James Watt, etc.) and Renaissance Florence are examples of productive cultural congregations. It’s unclear why the internet hasn’t produced more such concentrations of talent—though nascent communities exist (e.g., George Hotz live-streaming protein folding and LeetCode problems).
  • Scott thinks we don’t yet understand innovation well enough at a sociological level, and is “bullish” that better theories could lead to experiments in group structures, networks, incentives, and cultures that prime more innovation.

Early Work, Ambition, and Advice for Young People

  • Paul Graham’s essay “Early Work” argues that your early projects won’t look impressive, and you need to push through that phase. Scott agrees and has written about this before.
  • The opportunity cost problem: once you’re good at something and it’s rewarding, it’s hard to climb down and do unrewarding exploratory work. But the exploration phase is essential.
  • Kids are objectively terrible at everything, but we praise them for trying. Adults impose a threshold: you need to be at least somewhat good to justify doing something. This prevents exploration.
  • There’s probably a life cycle: 20s and 30s for exploration and training, 40s and early 50s for making your mark. Scott feels himself transitioning from explore to exploit mode.
  • People are too unambitious—not in the sense of wanting to be better than others, but in not thinking of big projects or working on them. Much ambition boils down to social status rather than genuine curiosity.
  • Cultivating a desire to do ambitious, original, interesting things—without knowing where they’ll lead—is rare and valuable for both individuals and society.

Advice for a 20-Year-Old

  • Scott wouldn’t change much about his younger self. He was “ballsy” in ways he might not be now, and worrying about uncertain outcomes may have been productive (it motivated him to make things work).
  • The main tendency he’d warn against: prematurely trying to monetize or prove success. Young people with ability and ambition often chase immediate money or status at the cost of better long-term options (e.g., a pre-med student taking a waitressing job with good tips instead of studying).
  • If you’re reasonably clever and hardworking, pour yourself into interesting, innovative projects rather than optimizing for immediate returns. You’re increasing the quality of problems you can work on in your 30s and 40s.
  • There’s a difference between paying the bills and chasing immediate money/status opportunities at the expense of long-term ambitions.
  • Reference groups matter: surrounding yourself with ambitious people (as at MIT) makes big dreams feel natural. Small towns and middling universities can create pressure to prove yourself immediately and conform to lower ambitions.

Raising a Genius Baby?

  • The last chapter of Ultralearning discusses the Polgár experiment: Judit Polgár’s father, László, deliberately raised his three daughters to be chess prodigies. Susan and Judit became grandmasters, and Sofia became an international master. It’s a provocative counterweight to the behavioral genetics view that most traits are fixed.
  • Scott doesn’t plan to engineer his son toward a specific outcome. He wants to give him access to diverse experiences so that the person he becomes is influenced by a wide range of possibilities.
  • He values agency and autonomy centrally: he wants his son to feel like the author of his own life, not someone living his father’s plan. He’ll set an example, provide resources, and show how things are done—but let his son make his own mistakes and choose his own path.