Adam Marblestone – AI is missing something fundamental about the brain

Dwarkesh Podcast 1h49 7 min #109

Summary

  • The central question: Why are human brains so much more capable than today’s AI despite receiving far less training data? Adam Marblestone argues the answer lies not in neural architecture or learning algorithms, but in the brain’s reward functions — the specific cost functions and learning curricula that evolution has encoded over millions of years. This reframing has implications for how we build AGI, how we understand neuroscience, and where we should invest billions of dollars in research infrastructure.

The brain’s secret sauce is in the reward functions, not the architecture

  • The standard ML framework breaks intelligence into architecture, learning algorithm, initialization, and loss/cost functions. Marblestone’s core hypothesis is that the field has neglected the loss functions: evolution may have built enormous complexity into them, with many different loss functions for different brain areas, each switched on at a different developmental stage. In effect, this is a curriculum discovered by evolution and encoded in the genome.
  • Cortex as omnidirectional inference engine: Rather than predicting only the next token (like an LLM), any area of cortex may natively predict any subset of its inputs from any other subset (“omnidirectional inference”). This is closer to energy-based models (a framework Yann LeCun champions), in which you can clamp any subset of variables and sample the rest.
  • The Steering Subsystem vs. Learning Subsystem (Steve Byrnes’ framework):
    • The Steering Subsystem consists of subcortical areas (hypothalamus, brainstem, superior colliculus) with innate responses, reward functions, and even its own primitive sensory systems (e.g., face detection, threat detection).
    • The Learning Subsystem (cortex, parts of amygdala) learns to model and predict the Steering Subsystem’s responses. This is how evolution wires abstract learned concepts (like “Yann LeCun is upset with me”) to primitive reward signals (shame, embarrassment), even though the genome could never have anticipated those specific concepts.
    • Thought Assessors: For every important Steering Subsystem variable (am I about to flinch? am I talking to a friend?), there is a predictor trained in the cortex. The neurons that matter for social status are the ones that predict the innate heuristics for social status. This lets the reward function generalize: the word “spider” can trigger the spider reflex even without a real spider.
  • Evidence from cell types: Single-cell atlases show the Steering Subsystem has far more diverse, bespoke cell types than the relatively uniform cortex. This suggests the genome invests heavily in specifying reward circuitry (like Python code for each innate behavior) rather than in the learning architecture itself.
  • Why so little genomic information builds so much intelligence: If evolution’s main contribution is compact reward functions (each just a few lines of “code”) plus a generic learning architecture and algorithm, then the genome’s roughly 3 billion base pairs (under a gigabyte of information at 2 bits per base) are enough. The heavy lifting is done by lifetime learning guided by those reward functions.
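
A Thought Assessor can be sketched as ordinary supervised learning: a cortical predictor is trained to match a signal that the Steering Subsystem computes through its own pathway. This toy is a sketch under stated assumptions, not Marblestone’s or Byrnes’ model: it assumes a hypothetical innate “flinch” driven by real spiders, two hypothetical cortical features (a partial visual spider detector and hearing the word “spider”), made-up probabilities, and a logistic-regression predictor standing in for cortex. Because the word co-occurs with real spiders during training, the trained predictor fires for the word alone.

```python
import math
import random


def sample_episode(rng):
    """Draw one toy episode. All probabilities are hypothetical."""
    spider = rng.random() < 0.3
    # Cortical features: a partial visual detector and hearing the word "spider".
    visual = rng.random() < (0.6 if spider else 0.02)
    word = rng.random() < (0.5 if spider else 0.02)
    # Steering Subsystem: the innate flinch comes from the real spider via its
    # own primitive pathway, not from the cortical features.
    flinch = 1.0 if spider else 0.0
    return [float(visual), float(word)], flinch


def predict(w, b, x):
    """Thought Assessor output: predicted probability of an innate flinch."""
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


def train_assessor(n_samples=5000, epochs=20, lr=0.1, seed=0):
    """Fit the predictor by SGD on logistic loss, with the flinch as the label."""
    rng = random.Random(seed)
    data = [sample_episode(rng) for _ in range(n_samples)]
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            err = y - predict(w, b, x)  # supervision comes from the Steering Subsystem
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b


w, b = train_assessor()
print("word only:  ", round(predict(w, b, [0.0, 1.0]), 2))
print("no features:", round(predict(w, b, [0.0, 0.0]), 2))
```

The predictor never sees the flinch circuitry itself, only its output as a training label, which is why a learned concept can inherit an innate response.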

What the genome actually encodes

  • Amortized inference: Neural networks amortize the computationally intractable Bayesian inference problem (what cause best explains this observation?) into a fast forward pass. Test-time compute (chain of thought) can be seen as re-doing some of that sampling. Over time, capabilities that required test-time compute get distilled into the base model.
  • Evolution’s tradeoff: Because each human must be built from scratch via the genome, evolution amortized less into the Learning Subsystem’s initial weights and more into innate bootstrapping cost functions and behaviors. Digital minds that can be copied have different tradeoffs — they can amortize more.
  • Scaling of cortex vs. reward functions: The hominid brain expanded rapidly not because the cortex fundamentally changed, but because social learning increased the returns to having a bigger cortex. A relatively small number of genes could scale up the cortex; the bigger evolutionary innovation was wiring social instincts (eye contact, language, status-seeking) into the Steering Subsystem to make use of that cortex.
  • Language may require few genetic changes: Wiring Broca’s and Wernicke’s areas to the hippocampus and prefrontal cortex might take only a small number of macro-wiring genes. The cortex’s potential for language-like processing may have already existed; what changed was the incentive and wiring to use it.

What kind of RL is the brain doing?

  • Multiple RL systems coexist:
    • Basal ganglia/striatum: Model-free RL with a finite action space (motor actions, gating signals between cortical areas, hippocampal memory release). This is conceptually similar to simple temporal difference learning.
    • Dopamine as reward prediction error: Consistent with TD learning — dopamine signals prediction error, not just raw reward, which is evidence for value function learning in the brain.
    • Cortex: Model-based RL. The cortical world model includes predictions about when rewards will occur, what plans lead to reward, and can even do “RL as inference” — clamping high reward and sampling plans that could lead to it.
  • Civilizational-level RL: Cultural transmission of complex knowledge (how to process poisonous beans, how to hunt seals) resembles model-free RL at a civilizational level — trial and error across generations, with culture storing some of the model.
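
The dopamine-as-prediction-error point can be sketched with tabular TD(0). The trial structure is a hypothetical toy: an unpredictable cue, three time steps, then a reward of 1. On the first trial the prediction error (delta) appears at reward time; after training it has moved to cue onset and vanishes at the now-predicted reward, the pattern reported in classic dopamine recordings.

```python
def run_trial(values, alpha=0.1, reward=1.0, gamma=1.0, learn=True):
    """Run one trial of tabular TD(0) and return the prediction errors
    (delta) at cue onset and at reward delivery.

    values[t] is the learned value of the t-th time step after the cue.
    """
    # The cue arrives unpredictably, so the pre-cue state has value 0 and
    # the error at cue onset is just the value of the first post-cue state.
    cue_delta = gamma * values[0] - 0.0
    reward_delta = 0.0
    for t in range(len(values)):
        last = t == len(values) - 1
        r = reward if last else 0.0
        next_value = 0.0 if last else values[t + 1]
        delta = r + gamma * next_value - values[t]  # TD error ("dopamine")
        if learn:
            values[t] += alpha * delta
        if last:
            reward_delta = delta
    return cue_delta, reward_delta


values = [0.0, 0.0, 0.0]
first = run_trial(values)
for _ in range(500):
    run_trial(values)
late = run_trial(values, learn=False)
print("first trial: cue delta %.2f, reward delta %.2f" % first)
print("after 500:   cue delta %.2f, reward delta %.2f" % late)
```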

Is biological hardware a limitation or an advantage?

  • Key disadvantage: The brain cannot be copied or randomly accessed — no external read-write to every synapse. This is a fundamental constraint that digital minds don’t face.
  • Key advantages: Extreme energy efficiency (about 20 watts, with neurons firing at only ~200 Hz), co-located memory and compute, natural stochasticity for sampling-based inference, and potentially greater “cognitive dexterity” through unstructured sparsity.
  • Co-design principle: The brain’s algorithm is co-designed with its hardware constraints (slow, low-voltage switches). Future AI hardware should similarly co-locate memory and compute, use lower voltages, and embrace stochasticity.
  • Cellular complexity: Much of what happens inside neurons (beyond synaptic connections) may be implementation machinery for algorithms that are simple to specify in code but require molecular machines to execute in cells. However, some cellular mechanisms (e.g., cerebellar timing cells storing time delays) may play genuine computational roles.
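
Sampling-based inference with noisy units can be sketched with a tiny Boltzmann machine, one example of the energy-based models mentioned earlier. The biases and couplings are hypothetical. Each binary unit switches on with a probability set by its input, so the noise itself does the sampling; any subset of units can be clamped and the rest sampled, and the Gibbs estimates are checked against exact enumeration.

```python
import itertools
import math
import random

N = 3
BIAS = [0.2, -0.5, 0.1]                              # hypothetical biases
COUPLING = {(0, 1): 1.5, (0, 2): -1.0, (1, 2): 2.0}  # hypothetical symmetric weights


def w(i, j):
    return COUPLING.get((min(i, j), max(i, j)), 0.0)


def energy(x):
    e = -sum(BIAS[i] * x[i] for i in range(N))
    return e - sum(v * x[i] * x[j] for (i, j), v in COUPLING.items())


def gibbs(clamped, steps, rng):
    """Estimate P(unit = 1) for every unit, holding clamped units fixed."""
    x = [clamped.get(i, rng.randint(0, 1)) for i in range(N)]
    free = [i for i in range(N) if i not in clamped]
    totals = [0] * N
    for _ in range(steps):
        for i in free:
            field = BIAS[i] + sum(w(i, j) * x[j] for j in range(N) if j != i)
            # A noisy switch: on with probability sigmoid(field).
            x[i] = 1 if rng.random() < 1 / (1 + math.exp(-field)) else 0
        for i in range(N):
            totals[i] += x[i]
    return [t / steps for t in totals]


def exact(clamped):
    """Same marginals by brute-force enumeration of consistent states."""
    states = [s for s in itertools.product([0, 1], repeat=N)
              if all(s[i] == v for i, v in clamped.items())]
    weights = [math.exp(-energy(s)) for s in states]
    z = sum(weights)
    return [sum(p * s[i] for p, s in zip(weights, states)) / z for i in range(N)]


rng = random.Random(0)
for clamped in ({2: 1}, {0: 0, 1: 1}):  # clamp different subsets
    print(clamped, [round(p, 3) for p in gibbs(clamped, 20_000, rng)],
          [round(p, 3) for p in exact(clamped)])
```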

The paperclip maximizer and alignment implications

  • Minimal Steering Subsystem for capabilities vs. alignment: The minimum set of reward functions needed to get a system to learn effectively (curiosity, social interest) may be far smaller than the set needed for human-like ethics and social instincts. This means a superintelligent paperclip maximizer is plausible — it could have the drives needed to learn physics and build spaceships without having human-like moral instincts.
  • LLMs already show you can learn language without human instincts: This suggests that the reward functions needed for capabilities and those needed for alignment are at least partially separable.

Do we have the right conceptual vocabulary?

  • The history of neuroscience→AI transfer: Most major AI ideas with neuroscience connections (neurons, backprop, CNNs, TD learning, actor-critic) were developed in AI first and only later found to have brain correlates. This pattern may continue.
  • The reverse view (György Buzsáki): Our AI-inspired vocabulary (backprop, value functions) may be entirely wrong for describing the brain. We may need to start from the brain’s own primitives (oscillations, dynamics) and build new vocabulary.
  • Marblestone’s approach: Pursue both — bottom-up biophysical simulation of simple organisms (worm, zebrafish) to discover new principles, AND reverse-engineering using AI-inspired vocabulary as hypotheses to test against connectome data.

Why we need to connectome-map the brain

  • What a connectome provides: Not the ability to read out thoughts like “Golden Gate Bridge,” but constraints on the architecture, learning rules, and initialization — the same kind of description we use for LLMs (architecture + loss function + training data + initialization).
  • Molecularly annotated connectomes: New optical microscopy approaches (E11 Bio) can identify not just who connects to whom, but what molecules are present at each synapse, enabling cell-type classification and wiring rule inference.
  • Cost trajectory: A mouse brain connectome could go from billions to tens of millions of dollars with new technology. A human brain (1,000× larger) would still cost billions, but mapping the human Steering Subsystem specifically is more feasible.
  • Timeline: Marblestone estimates transformative AGI is more than 5 years away, probably ~10 years. In that world, having connectomes and understanding Steering Subsystem architecture across species would meaningfully inform AI development and alignment.
  • Funding: Hundreds of millions to low billions of dollars in concerted philanthropic and government funding could achieve this. Compared to trillions in GPU spending, this is a rational investment.
  • The Human Genome Project analogy: The $3 billion first genome enabled the technology that now sequences genomes for hundreds of dollars. Similarly, funding the first connectome properly will drive down costs for all subsequent ones.
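
A back-of-the-envelope version of the cost claim, assuming (as a simplification not stated in the source) that mapping cost grows linearly with brain size and using the ~1,000× size ratio above. The $10 million and $90 million inputs are illustrative endpoints of “tens of millions”, not quoted estimates. Under these assumptions a whole human brain lands in the tens of billions, which is why targeting the smaller Steering Subsystem first is attractive.

```python
def scaled_cost_usd(mouse_cost_usd, size_ratio=1_000):
    """Naive estimate: whole-brain mapping cost grows linearly with brain size.

    Linear scaling is an illustrative assumption, not a sourced cost model.
    """
    return mouse_cost_usd * size_ratio


for mouse_cost in (10_000_000, 90_000_000):  # illustrative "tens of millions"
    human = scaled_cost_usd(mouse_cost)
    print(f"mouse ${mouse_cost / 1e6:.0f}M -> human ${human / 1e9:.0f}B")
```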

Brain-data-augmented AI training

  • Behavior cloning on brain data: Beyond training AI on labels (cat/dog), add an auxiliary loss that trains the model to predict the brain activity patterns associated with those labels. This could sculpt AI representations to be more brain-like, potentially improving generalization and robustness (e.g., to adversarial examples).
  • The brain already does this: The Learning Subsystem already predicts the Steering Subsystem as an auxiliary task — this is essentially what the Thought Assessors are doing.
  • The bottleneck is technology, not theory: If every iPhone were a brain scanner, we could train AI on brain signals. We got GPUs before portable brain scanners.
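
The auxiliary-loss idea amounts to adding a second term to the training objective. In this plain-Python sketch (all function names are hypothetical), the task term is softmax cross-entropy on the class label and the auxiliary term is the mean squared error between a brain-activity prediction head and recorded activity; `aux_weight` is an assumed hyperparameter, not a value from the source.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one example (numerically stabilized)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[label]


def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def brain_augmented_loss(logits, label, brain_pred, brain_recorded, aux_weight=0.5):
    """Task loss plus a weighted auxiliary loss for predicting brain activity.

    brain_pred would come from an extra head on the network's shared
    representation; brain_recorded is measured activity for the same stimulus.
    """
    return cross_entropy(logits, label) + aux_weight * mse(brain_pred, brain_recorded)


# Example: 2-way cat/dog logits and a hypothetical 2-channel brain recording.
loss = brain_augmented_loss([0.0, 0.0], 0, [1.0, 2.0], [1.0, 0.0])
print(round(loss, 4))
```

Because the auxiliary gradient flows into the shared layers, it shapes the same representation the classifier uses; that shared pathway is what would make the representations more brain-like.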

Automating mathematics with formal verification

  • Lean and formal math: Lean is a programming language where mathematical proofs can be expressed and mechanically verified. This creates a perfect RLVR (reinforcement learning from verifiable rewards) task — the proof either checks or it doesn’t.
  • What this automates: Finding proofs, verifying lemmas, checking that theorem statements across papers are equivalent. This will accelerate math significantly and enable provably secure software and hardware.
  • What this doesn’t automate (yet): Conjecturing new interesting theorems, conceptual reorganization of math, high-level strategy for proofs. Whether there is a loss function for “good explanations” (compact statements with many implications, perhaps measured by something like the Kolmogorov complexity of mathematical knowledge) is an open research question.
  • The spec problem: For software verification, the bottleneck is writing formal specifications of what properties you want. Engineers know what code should do but not how to formally specify security properties. AI-assisted spec generation could unlock this.
  • Accessibility: Formal verification could let outsiders contribute to advanced math (the way Steve Byrnes synthesizes neuroscience as an outsider), potentially accelerating fields like string theory.
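
A minimal Lean 4 illustration of the verifiable-reward setup: the checker either accepts a proof or rejects it, so the reward is binary. The theorem names and the `double` function are hypothetical toy examples using only core Lean 4 (no Mathlib); the second theorem also shows what a small formal specification of a function looks like.

```lean
-- Accepted by the checker: a proof using a core library lemma.
theorem my_add_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A tiny specification: `double n` is always even.
def double (n : Nat) : Nat := n + n

theorem double_is_even (n : Nat) : ∃ k, double n = 2 * k :=
  ⟨n, by unfold double; omega⟩

-- Rejected by the checker: uncommenting this line produces an error.
-- example : 2 + 2 = 5 := rfl
```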

The Gap Map and scientific infrastructure

  • The Gap Map: A catalog of hundreds of “mini Hubble Space Telescope” projects across science — infrastructure gaps where an organized engineering team could unblock an entire field. Total cost across all gaps is in the low billions.
  • Surprising finding: Even pure math, which seems to need only whiteboards, actually needs infrastructure (Lean, formal verification tools). Nearly every scientific domain is missing scalable infrastructure.
  • FROs (Focused Research Organizations): Nonprofit, startup-like moonshot teams funded by philanthropy and government to build these infrastructure pieces — E11 Bio for connectomics, Lean for formal math, and others.