Terence Tao – How the world’s top mathematician uses AI

Dwarkesh Podcast 1h23 8 min #114

Summary

  • Terence Tao and the host discuss how AI is reshaping mathematics and science, using the history of Kepler’s discovery of planetary motion as a framework for understanding the evolving role of hypothesis generation, data, and verification in the age of large language models.

Kepler as a proto-data scientist and the analogy to LLMs

  • Kepler’s path to his three laws of planetary motion illustrates the interplay between creative hypothesis generation and rigorous empirical verification.
    • He began with an elegant but wrong theory—that the six known planets’ orbits were separated by the five Platonic solids—motivated by a belief in geometric perfection.
    • He needed Tycho Brahe’s dataset, the most precise astronomical observations ever collected (decades of naked-eye measurements), to test his ideas. He eventually obtained the data after Tycho’s death, amid a dispute over access to it.
    • After years of failed attempts to make his geometric models fit, he discovered through painstaking data analysis that planetary orbits are ellipses (first law), that equal areas are swept in equal times (second law), and, a decade later, that the square of a planet’s orbital period is proportional to the cube of its distance from the Sun (third law).
    • Kepler had no theoretical explanation for why these laws held; Newton supplied one decades later, in the Principia (1687).
  • The host proposes that Kepler functioned like a “high-temperature LLM”—cycling through many random, often nonsensical hypotheses (including astrological harmonies) against a verifiable dataset, with one eventually producing a real empirical regularity.
    • Kepler’s third law was essentially a regression fit to just six data points (one per known planet), so its success was partly luck. A later attempt in the same spirit (Bode’s law) made successful predictions for Uranus and Ceres but failed for Neptune, revealing it as a numerical fluke.
    • The key insight: as long as there is a reliable verification mechanism, even noisy, random hypothesis generation can produce genuine scientific progress.
  • Terence agrees that Kepler was an early data scientist, but notes he started with preconceived theories before getting the data—unlike modern machine learning, which can start with data and extract patterns without prior hypotheses.
    • The host argues the modern paradigm has reversed the classical scientific method: now we collect big data first, then extract hypotheses from it.
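
The "regression fit to six data points" mentioned above can be sketched in a few lines. This is only an illustration: the figures are modern approximate values for semi-major axis (in astronomical units) and orbital period (in years), not Tycho Brahe's measurements, and the helper name `fit_power_law` is hypothetical. Fitting a straight line to log(period) against log(distance) should give a slope close to 3/2, which is the third law in the form T^2 ∝ a^3.

```python
import math

# Modern approximate values (an assumption for illustration, NOT Tycho's data):
# (semi-major axis in AU, orbital period in years)
PLANETS = {
    "Mercury": (0.387, 0.241),
    "Venus": (0.723, 0.615),
    "Earth": (1.000, 1.000),
    "Mars": (1.524, 1.881),
    "Jupiter": (5.203, 11.862),
    "Saturn": (9.537, 29.457),
}


def fit_power_law(points):
    """Ordinary least-squares fit of log(T) = slope * log(a) + intercept."""
    xs = [math.log(a) for a, _ in points]
    ys = [math.log(t) for _, t in points]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_power_law(list(PLANETS.values()))
print(f"fitted exponent: {slope:.3f} (Kepler's third law predicts 1.5)")
```

With only six points, a good fit is weak evidence on its own, which is the point of the Bode's law comparison: the same procedure can also produce a formula that later fails.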

The bottleneck is shifting from idea generation to verification

  • AI has driven the cost of idea generation close to zero, analogous to how the internet drove the cost of communication to zero.
    • The bottleneck in science is no longer generating hypotheses—it is verifying, evaluating, and identifying which ideas constitute real progress.
    • Human peer review systems, designed to filter signal from noise, are already being overwhelmed by AI-generated submissions flooding journals.
  • A critical challenge: among millions of AI-generated papers, how do you identify the next unifying concept (like Shannon’s “bit” at Bell Labs) that has implications across many fields?
    • Many great ideas were not recognized as such initially (e.g., deep learning was niche and controversial for years before bearing fruit).
    • Assessing whether an idea is fruitful depends on future developments, cultural adoption, and context—it cannot be objectively scored in isolation and may never be amenable to reinforcement learning.
  • Often, the ultimately correct theory initially looks worse than the established but wrong one.
    • Copernicus’s heliocentric model was simpler but less accurate than Ptolemy’s geocentric model; it took Kepler’s ellipses for a heliocentric model to surpass Ptolemy’s accuracy.
    • Newton’s theory left major mysteries (action at a distance, equivalence of inertial and gravitational mass) unresolved until Einstein.
    • Progress sometimes requires deleting assumptions (e.g., the Aristotelian idea that objects naturally rest) rather than adding new theories.
  • The host draws a parallel to Darwin: the evidence for natural selection is cumulative and retrospective, making it harder to verify in a tight loop than Newton’s equations, which could be immediately checked against orbital data. This may explain why some fields advance faster than others.

The role of communication and narrative in science

  • Scientific progress depends not just on having the right theory but on communicating it persuasively.
    • Darwin wrote in plain English, synthesized disparate facts into a compelling narrative, and made testable predictions about what would be found later (transitional forms, mechanisms of inheritance).
    • Newton wrote in Latin, invented new mathematics to explain his work, and was secretive and competitive, holding back insights from rivals.
    • It took decades for other scientists to explain Newton’s work in simpler terms before it became widespread.
  • Persuasion and narrative construction are difficult to formalize or optimize via reinforcement learning—this may remain a permanently human aspect of science.

AI’s current impact on mathematics: impressive but limited

  • AI has solved roughly 50 of the ~1,100 Erdős problems, but progress has plateaued as the “low-hanging fruit” has been picked.
    • Most solved problems had essentially no existing literature; the AI combined an obscure known technique with another result in a way no one had written down.
    • On any given problem, AI tools have roughly a 1–2% success rate; the impressive headline numbers come from running them at massive scale across many problems and selecting winners.
    • There is significant selection bias in reported AI successes—companies and researchers publicize wins but not failures.
  • Terence uses a mountain-climbing analogy: AI tools are like jumping machines that can leap higher than any human, reaching the tops of the lowest walls, but they cannot do the incremental hill-climbing, marker-setting, and partial progress that characterize human mathematical work.
    • They succeed or fail in one shot; they are bad at identifying intermediate stages or creating partial progress.
  • AI excels at breadth; humans excel at depth. The two are complementary.
    • Once AI reaches a given capability level, it can be replicated millions of times and applied across vast problem sets—something impossible for human experts.
    • The future of science involves using broadly competent AI to map out entire fields, identify easy results and islands of difficulty, and then deploy human experts on the hard problems.
    • Mathematics needs to be redesigned to take advantage of this breadth capability, which is entirely new.
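
The arithmetic behind "low per-problem success rate, impressive headline numbers" can be made concrete with a rough sketch. It assumes each attempt succeeds independently with the same probability, which is a simplification not claimed in the episode, and the function names are hypothetical. It is not meant to reproduce the reported count of solved problems.

```python
def expected_successes(num_problems, p):
    """Expected number of solved problems if each is solved with probability p."""
    return num_problems * p


def prob_at_least_one(p, attempts):
    """Chance that at least one of `attempts` independent tries succeeds."""
    return 1.0 - (1.0 - p) ** attempts


for p in (0.01, 0.02):
    solved = expected_successes(1100, p)
    any_hit = prob_at_least_one(p, 100)
    print(f"p={p:.2f}: ~{solved:.0f} expected solves over 1,100 problems; "
          f"P(at least one success in 100 tries) = {any_hit:.2f}")
```

Under these assumptions, a tool that fails almost every time still produces a steady stream of reportable wins when run at scale, which is why unreported failures matter when reading headline results.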

AI makes papers richer and broader, but not deeper

  • Terence estimates AI makes him roughly 2–5× more productive, but this is hard to measure because the nature of his work is changing.
    • AI handles auxiliary tasks: generating plots, reformatting LaTeX, conducting deeper literature searches, supplying numerical data—tasks that previously took hours and that he often would have skipped entirely.
    • The core creative work—solving the hardest parts of a math problem—still happens with pen and paper and has not been significantly accelerated.
    • His papers are now richer and broader (more code, more figures, more numerics) but not necessarily deeper.
  • AI is currently better at applying existing techniques to new problems than at inventing genuinely new techniques.
    • It can try all standard approaches on a problem, often with fewer errors than humans, but struggles when none of the standard methods work and a fundamentally new idea is needed.
    • Top math journal papers typically involve existing methods solving ~80% of a problem, with a new technique required for the remaining resistant 20%—AI has not yet demonstrated the ability to take that last step.

Artificial cleverness vs. artificial intelligence

  • Terence distinguishes between artificial cleverness (what current AIs do) and genuine intelligence.
    • Real intelligence involves adaptive, cumulative collaboration: testing an idea, finding it partially works, modifying it, building on handholds, and systematically mapping what does and doesn’t work through an evolving discussion.
    • Current AIs operate through brute-force trial and error—jumping and failing repeatedly without building cumulatively from partial progress.
    • Each AI session starts fresh; it has no memory of what it just learned and cannot build skills across related problems. Any learning is absorbed only indirectly, if at all, in future training runs.

Could AI solve the Riemann hypothesis without human understanding?

  • Some problems (like the four color theorem) have been solved by brute-force case analysis with no conceptually elegant proof found—and perhaps none exists.
    • The Riemann hypothesis is widely believed to require genuinely new mathematics or a new connection between distant areas of math; it does not feel like a problem solvable by exhaustive case checking.
    • An unlikely alternative: the hypothesis could be false, and a massive computer calculation could find a zero off the critical line—this would be disappointing but would settle the question.
  • Terence is not overly concerned about an incomprehensible AI-generated proof.
    • Formal proof systems like Lean allow every step of a proof to be studied atomically—individual lemmas can be isolated and assessed for whether they represent genuinely new ideas or just boilerplate.
    • Post-processing tools (AI summarization, human rewriting, ablation studies, reinforcement learning for elegance) can deconstruct and interpret even a massive, messy proof.
    • The Erdős problem website already demonstrates this pattern: an AI generates a 3,000-line Lean proof, then other AIs and humans summarize, refactor, and rewrite it.
  • The process of writing papers is changing: writing used to be the most time-consuming part, so results were only written up once fully verified. Now, once one version exists, hundreds of variants can be generated easily.
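
As a toy illustration of what studying a proof "atomically" can look like, here is a minimal Lean 4 sketch (core library only, no Mathlib assumed). The lemma names `step_one`, `step_two`, and `combined` are hypothetical and the statements are deliberately trivial. Each lemma is checked on its own, so a reader or a tool can inspect it in isolation and judge whether it carries a real idea or is boilerplate.

```lean
-- Each lemma is an independently checkable unit.
theorem step_one (n : Nat) : n + 0 = n := Nat.add_zero n

theorem step_two (n : Nat) : 0 + n = n := Nat.zero_add n

-- The final statement only cites the two lemmas above,
-- so its dependencies are explicit and can be audited one by one.
theorem combined (n : Nat) : (0 + n) + 0 = n := by
  rw [step_one (0 + n), step_two n]
```

A real 3,000-line proof has the same shape at a much larger scale, which is what makes summarizing and refactoring it tractable.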

The need for a semi-formal language for mathematical strategies

  • Lean and similar systems formalize deductive proofs, but there is no formal or semi-formal framework for mathematical strategies—the heuristic, narrative, plausibility-driven reasoning scientists actually use.
    • Example: Gauss computed the first ~100,000 primes and observed a statistical pattern (the density of primes near x falls off like 1/ln x, the reciprocal of the natural logarithm), which led him to conjecture the prime number theorem: a revolutionary, data-driven, statistical conjecture that took over a century to prove.
    • This led to the “random model of the primes,” a heuristic, non-rigorous but extremely accurate conceptual framework that underpins modern number theory and the belief in the Riemann hypothesis.
    • Such plausibility reasoning—combining data, narrative, and argument—cannot currently be formalized in a way that AI can be trained on without exploitable backdoors.
  • Terence suggests that creating “mini-universes” or simulations where AIs solve simple problems and develop their own strategies could help formalize how scientific progress and strategy selection work.
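
The kind of data-driven observation attributed to Gauss above can be reproduced with a short sketch: count the primes up to x and compare the count with x / ln x, the leading-order estimate from the prime number theorem. The function names `sieve` and `prime_count` are hypothetical, and this is a modern reconstruction of the idea, not Gauss's actual procedure.

```python
import math


def sieve(limit):
    """Sieve of Eratosthenes: flags[i] is 1 if i is prime, else 0."""
    flags = bytearray([1]) * (limit + 1)
    flags[0:2] = b"\x00\x00"  # 0 and 1 are not prime
    for i in range(2, math.isqrt(limit) + 1):
        if flags[i]:
            # Cross out multiples of i starting at i*i.
            flags[i * i::i] = bytearray(len(range(i * i, limit + 1, i)))
    return flags


FLAGS = sieve(100_000)


def prime_count(x):
    """Number of primes <= x (valid for x up to 100,000)."""
    return sum(FLAGS[: x + 1])


for x in (1_000, 10_000, 100_000):
    estimate = x / math.log(x)
    print(f"x={x:>7}: pi(x)={prime_count(x):>5}, x/ln x={estimate:9.1f}, "
          f"ratio={prime_count(x) / estimate:.3f}")
```

The ratio drifts slowly toward 1 as x grows. The table suggests the pattern but proves nothing, which is exactly the gap between plausibility reasoning and deductive proof that this section describes.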

How Terence learns new fields and uses his time

  • Terence describes himself as a “fox” (broad generalist) rather than a “hedgehog” (narrow specialist), driven by an obsessive completionist streak—he is bothered when he sees someone else solve a problem using methods he doesn’t understand.
    • He learns through collaboration (working with experts in other fields who teach him their tricks) and through writing blog posts, which serve as a personal record to prevent him from losing insights over time.
    • Blog posts take anywhere from 30 minutes to several hours and are written voluntarily, often as a break from administrative tasks.
  • On optimal use of his time: he believes strongly in serendipity and deliberately leaves portions of his day unscheduled.
    • Over-optimization (e.g., the shift to fully scheduled remote meetings during COVID) eliminates casual, unplanned interactions—knocking on a hallway door, browsing a physical library journal—that often lead to the most valuable discoveries.
    • He finds that too much focused, distraction-free time (e.g., at the Institute for Advanced Study) eventually leads to boredom and diminished inspiration; a certain level of distraction and randomness is necessary.

When will AI dominate mathematics?

  • Terence believes human-AI hybrids will dominate mathematics for a long time, not pure AI replacement.
    • Current AIs are very good at some tasks and terrible at others; they are complementary to humans, not replacements.
    • Within a decade, much of what math students currently do and much of the content in today’s papers will be automatable by AI—but that work may turn out not to have been the most important part.
    • Historical analogy: 19th-century mathematicians spent careers solving differential equations by hand; now a computer does it in minutes. The field moved on to different problems. Similarly, genome sequencing went from a PhD’s entire career to a $1,000 commercial service, but genetics did not die—it scaled up.
  • He cannot name a date by which he is 95% confident an autonomous AI will have solved a Millennium Prize Problem, but expects it will happen eventually.
    • The world is currently very unpredictable; AI may accelerate progress in some ways while potentially destroying serendipity in others.

Advice for aspiring mathematicians

  • Embrace change: the way mathematics is done will transform significantly, and some currently studied topics may become obsolete while new opportunities emerge.
    • Non-traditional paths into mathematical research are opening up—high school students can now contribute to the frontier using AI tools and Lean.
    • Traditional education and credentials remain important for now, but one should also be open to entirely new ways of doing science that don’t yet exist.
    • Maintain an adaptable mindset, pursue curiosity-driven exploration, and be prepared for a scary but exciting era.