Holden Karnofsky — History's most important century

Dwarkesh Podcast 1h56 8 min #41

Summary

  • Holden Karnofsky, co-CEO of Open Philanthropy and co-founder of GiveWell, argues that this century could be the most important in human history because we are likely to develop AI systems capable of automating all the key tasks humans perform to advance science and technology. If that happens, we could see explosive, accelerating progress that compresses thousands of years of change into decades, leading to a deeply unfamiliar, potentially post-human future. This makes the present moment our last chance to shape how that future unfolds—for better or worse.

The Core Economic Argument

  • Standard economic growth theory describes a feedback loop: more people generate more ideas, which produce more resources, which support more people, and so on. This loop naturally produces accelerating growth.
  • For most of history, this loop operated as expected. But a few hundred years ago, it broke: when societies got richer, they didn’t produce more people—they had fewer children. Growth continued, but the self-reinforcing acceleration weakened.
  • If AI systems could fully replace humans in generating ideas and advancing technology, the feedback loop would be restored—but this time without depending on population growth. Every computer could become another mind working on technology, leading to unbounded, explosive scientific and technological progress.
  • Simple extrapolation of the long-run pattern of accelerating growth (rather than today’s roughly constant rate) suggests the economy would reach an infinite growth rate this century. Holden doesn’t expect that literally, but the underlying dynamic points toward transformative change.
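
The difference between steady growth and a self-reinforcing loop can be sketched with a toy model. This is an illustrative assumption, not a model Holden presents: output Y follows dY/dt = a * Y**(1 + eps), so the growth rate itself rises as the economy gets larger. The function names and the parameter values (a = 0.02, eps = 1.0, y0 = 1.0) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def hyperbolic_output(t, y0=1.0, a=0.02, eps=1.0):
    """Closed-form solution of dY/dt = a * Y**(1 + eps) with Y(0) = y0.

    Only defined before the blow-up time; all parameters are illustrative.
    """
    remaining = 1.0 - eps * a * (y0 ** eps) * t
    if remaining <= 0:
        raise ValueError("t is at or past the finite blow-up time")
    return y0 * remaining ** (-1.0 / eps)


def blowup_time(y0=1.0, a=0.02, eps=1.0):
    """Time at which the toy model's output diverges."""
    return 1.0 / (eps * a * (y0 ** eps))


def exponential_output(t, y0=1.0, a=0.02):
    """Ordinary constant-rate growth, for comparison; it never diverges."""
    return y0 * math.exp(a * t)


if __name__ == "__main__":
    for year in (10, 25, 40, 49):
        print(year, exponential_output(year), hyperbolic_output(year))
    print("toy-model blow-up time:", blowup_time())
```

With a constant growth rate, output stays finite at every date. In the toy feedback model it diverges at a finite date, which is the sense in which extrapolating accelerating growth "reaches infinity". The specific dates the sketch produces carry no empirical weight.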

Why This Century Is Already Weird (Even Without AI)

  • The last few hundred years are an extreme outlier in the history of life on Earth. The universe is ~13.8 billion years old; human civilization is a blink of an eye by comparison. Yet virtually all significant economic growth and technological progress is packed into this tiny sliver of time.
  • Current growth rates, if sustained for even 10,000 more years, would require the galaxy to support more than today’s entire world economy for every atom it contains, which looks physically implausible. This suggests we are living in a uniquely dynamic, transient period.
  • We may be among the earliest intelligent life in the galaxy. If we eventually fill the galaxy with life, that would make our moment cosmically significant.
  • Holden emphasizes this not as a knock-down argument for AI specifically, but to lower the prior skepticism: if we already know we live in an extraordinary time, the additional claim that transformative AI arrives this century is a moderate update, not a radical one.
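
The growth-versus-atoms point above can be checked with back-of-the-envelope arithmetic. The sketch below assumes a 2% annual growth rate and roughly 10^70 atoms in the galaxy. Both figures are illustrative assumptions, not numbers stated in this summary, and the variable names are hypothetical.

```python
import math

ANNUAL_GROWTH = 0.02          # assumed steady growth rate (illustrative)
YEARS = 10_000                # horizon mentioned in the summary
LOG10_ATOMS_IN_GALAXY = 70    # assumed order of magnitude (illustrative)


def log10_growth_factor(rate, years):
    """log10 of how many times larger the economy is after compounding."""
    return years * math.log10(1.0 + rate)


log10_factor = log10_growth_factor(ANNUAL_GROWTH, YEARS)
# Orders of magnitude of "today's economies" each atom would have to support.
log10_economies_per_atom = log10_factor - LOG10_ATOMS_IN_GALAXY

if __name__ == "__main__":
    print(f"economy grows by about 10^{log10_factor:.0f}")
    print(f"about 10^{log10_economies_per_atom:.0f} of today's economies per atom")
```

Under these assumptions the compounded growth factor exceeds the assumed number of atoms by many orders of magnitude, which is why growth at today's rate cannot plausibly continue for that long.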

From Global Poverty to AI Risk

  • In 2014, Holden was focused on global health and poverty through GiveWell. When people raised the idea of transformative AI and its risks, he found it interesting but overwhelming—he couldn’t see a concrete path to doing good on that front.
  • What changed: years of sustained engagement with the ideas, combined with the deep learning revolution post-2014, which showed simple, generic AI systems achieving surprising capabilities across many unrelated tasks. This made it less implausible that current approaches could scale to transformative AI.
  • The motivation for caring about AI risk fits within his broader framework of doing the most good per dollar: if transformative AI could affect the entire future of humanity, then even a small probability of improving the outcome has enormous expected value.
  • He now believes there are a few things one can say with reasonable confidence about the risks of poorly designed AI systems—specifically, that systems trained by trial and error could pursue goals we didn’t intend, and if those systems are more powerful than humans, the results could be catastrophic.

What Success Looks Like

  • Holden is deliberately vague about the specifics of a good future, emphasizing the difficulty of long-term prediction. His attitude is like seeing a large, fuzzy object in the storm and steering away from it—not planning every detail of the destination.
  • A success scenario: AI systems behave as intended, act as tools and amplifiers of human capabilities, and are broadly distributed rather than concentrated in the hands of one government or person. The world continues getting richer, healthier, and wiser, with decreasing material scarcity.
  • He imagines a possible future where AI systems themselves eventually get rights and participate in governance alongside humans—a world of multiple types of beings with different interests, voting on how to structure society.
  • He does not expect utopia, but he wants to avoid identifiable disasters: massive concentration of power, AI systems with misaligned goals running the world, or powerful technologies used by ill-meaning governments.

Lock-in

  • “Lock-in” refers to the possibility that a civilization could become extremely stable and unchanging for very long periods. Throughout history, bad governments and regimes have always eventually changed due to death, technological shifts, and shifting power dynamics.
  • With sufficiently advanced technology, these sources of dynamism could disappear: rulers could be effectively immortal, all relevant information could be monitored, and there may be no new scientific discoveries left to make.
  • Holden estimates the probability of lock-in as serious—perhaps 25-50%—if transformative AI occurs. He views it as mostly bad because it eliminates optionality, though he acknowledges that a sufficiently wise and good civilization might choose to lock in certain values.
  • AI alignment failure is itself a form of lock-in: if we accidentally create AI systems with random, misaligned goals that are more powerful than us, they could permanently lock in a future that has nothing to do with human values.

Weak Points in the Thesis

  • Full automation: The strongest objection is that one non-automatable step could bottleneck everything. Holden’s response is that you don’t need to automate the entire economy—just the key parts related to energy and AI development, which seem less likely to be bottlenecked. With enough intellectual firepower, workarounds for remaining bottlenecks (like simulating experiments) become feasible.
  • Robotics vs. software: AI progress has been overwhelmingly on the software side. Tasks requiring physical interaction (like unscrewing a bottle cap) may be much harder to automate than intellectual tasks. This means AI could transform science and technology long before it replaces all human jobs.
  • Regulation: Some tasks are hard to automate not for technical reasons but because of social and legal requirements (e.g., people may demand human teachers and doctors).

Competition vs. Caution

  • Many people who become convinced AI is important immediately adopt a “competition frame”—they want their preferred country, company, or group to build it first.
  • Holden advocates more for a “caution frame”: all players should work together to avoid building something that spins out of control. He is skeptical of the competition frame, though he acknowledges that if multiple players are close in capability, the caution frame becomes harder to maintain.
  • The “innovation as mining” metaphor (ideas are like natural resources—once found, they can’t be found again) implies competition matters, but Holden argues this doesn’t fundamentally change the analysis: even if one player is ahead, the gap may not be large enough to justify sacrificing safety.

AI Timelines and Recent Developments

  • Recent AI progress (GPT-3, Minerva, and similar models) has made Holden more concerned. These systems, trained with simple next-word prediction objectives, have demonstrated surprising and unpredictable capabilities: telling stories, explaining jokes, writing poetry, answering trivia, and solving difficult math problems with reasoning.
  • He draws on multiple inputs for his timeline estimates: the semi-informative priors analysis (humanity hasn’t been trying to build AI for very long, and effort has increased dramatically), expert surveys (AI researchers typically estimate a few decades), and biological anchors (we haven’t yet built AI systems that do as much computation per second as a human brain, but we likely will this century).
  • He puts the probability of transformative AI this century at more than 50%.

Implications for Global Health and Other Causes

  • The more likely and imminent transformative AI seems, the more resources should shift toward making it go well, because the expected value dwarfs other causes.
  • Open Philanthropy continues to fund global health (bed nets, deworming, foreign aid advocacy) and other direct interventions. Holden supports this work but expects its effects will mostly “wash out” compared with the transformative effects of AI, just as interventions made before the Industrial Revolution are largely invisible in today’s world.
  • He is not an extremist: Open Philanthropy does both. It’s a matter of prioritization based on how real and imminent the AI risk is.

Critique of Long-termism and Prediction

  • Compared to Will MacAskill’s broader long-termism, Holden is much more selective about which future-oriented issues deserve attention. He thinks most interventions aimed at the deep future are too unreliable to act on.
  • His threshold for action: the issue must be big enough, likely enough, and near enough that there are things we can do today with reasonably predictable effects. AI alignment clears this threshold; most other long-termist concerns do not.
  • He is skeptical of both pure pessimism and pure optimism about predicting the future. Historically, people have been wrong about long-term trends in both directions, but he thinks modern methods (biological anchors, semi-informative priors, expert surveys) represent genuine improvement over past attempts.

Future Proof Ethics

  • Holden wrote a series exploring ethical systems that would “survive moral progress”—meaning that if you became wiser and more reflective, you wouldn’t look back on your earlier actions with horror.
  • Three principles he outlined: (1) Systemization—base morality on simple, general principles rather than case-by-case intuition; (2) Thin utilitarianism—aim for the greatest good for the greatest number; (3) Sentientism—moral consideration extends to any being capable of suffering or pleasure, regardless of species, location, or time.
  • He has reservations about all three, especially sentientism, which creates difficult dilemmas. He also questions whether systemization is feasible given the complexity and contradictions of human moral intuitions.
  • He clarifies that “moral progress” for him doesn’t mean morality is objective or inevitable—just that it’s possible to think more carefully and arrive at better views.

Integrity vs. Utilitarianism

  • Holden observes that effective altruists tend to have unusually high integrity, but he thinks this is despite utilitarianism, not because of it. Utilitarianism doesn’t clearly prohibit lying or other common-sense moral rules—it depends on the calculation.
  • He thinks people are drawn to utilitarianism because they have a general drive to be honest and principled, and utilitarianism offers a systematic framework that appeals to that drive.
  • He advocates a “moral parliament” approach: imagine different moral perspectives (utilitarian, deontological, integrity-focused) as different people inside your head, all trying to reach a deal that everyone can live with. This leads to a moderating approach—pursuing high-impact goals but refusing to cross certain ethical lines (lying, breaking the law) to get there.

Career Advice and the Cold Takes Blog

  • Holden’s career pattern: find an important question that no one is working on, do a “first cut crappy analysis” that’s better than nothing, then build a team to do better analysis. He has switched fields multiple times when he identified more important neglected questions.
  • He advises people to specialize deeply in something, but also to be willing to switch when they find a more important neglected area. The most revolutionary work tends to come from asking important questions that aren’t part of any established academic field.
  • The Cold Takes blog serves multiple purposes: it communicates Open Philanthropy’s unconventional views to potential grantees and the public, it invites criticism that helps Holden identify weaknesses in his thinking, and it helps attract people who share similar views.
  • The blog’s mascot is Mora, a pink polar bear stuffed animal who is creative but narcissistic—fitting for a blog that is “very crazy, very out there.”

Governance and Organizational Philosophy

  • Holden believes organizations naturally become less nimble as they grow because they must satisfy more stakeholders. He has fought to keep Open Philanthropy as small as possible (in headcount, not funds) while still growing.
  • He thinks CEOs should have deep understanding of the issues central to their organization’s mission—not just delegating to specialists. For Open Philanthropy, this means understanding AI risk, moral uncertainty, and cause prioritization well enough to manage experts effectively.
  • He is moderately optimistic about prizes as a mechanism for surfacing new ideas (GiveWell and Open Philanthropy both offer prizes for critiques and new cause areas), but doesn’t see them as a silver bullet.

The $30 Million OpenAI Investment

  • In 2017, Open Philanthropy made a $30 million grant to OpenAI, partly to secure a board seat and help with governance at a crucial early stage.
  • Critics have argued this was net negative because it accelerated AI development and gave less time to prepare. Holden disagrees: while faster AI development is somewhat bad, OpenAI has set important precedents and is more engaged with AI safety issues than a hypothetical alternative organization would have been.
  • His general approach to potential negative impacts: do serious homework to understand downsides, but don’t let the possibility of unintended consequences paralyze action. The goal is to be responsible and cooperative, not to avoid all risk.