The 300-Year-Old Physics Mistake No One Noticed — Theories of Everything

This is a wide-ranging conversation with John Norton, philosopher of physics at the University of Pittsburgh, who has spent decades challenging assumptions physicists treat as bedrock. The thread running through his work is epistemic humility: don’t smuggle in a priori convictions (determinism, causation, universal rules of induction) ahead of what the science actually shows. The episode moves through his famous “dome” example, his rejection of causal metaphysics, his deflationary view of thought experiments, his attack on Landauer’s principle, and finally his reading of Einstein’s quantum-theory contributions and what made Einstein’s mind work.

Norton’s Dome and indeterminism in Newtonian physics

The dome arose almost by accident around the late 1980s, while Norton and colleague John Earman taught a seminar on causation and determinism (Earman had written A Primer on Determinism).
- Norton was about to claim that Newtonian systems with finitely many degrees of freedom are always deterministic, then checked for counterexamples before saying it to sharp graduate students.
- He took a standard example of an equation that violates the Lipschitz condition (the condition that guarantees unique solutions to differential equations) and realized it has a physical model within Newtonian theory: a specially shaped frictionless dome with a mass at rest at its apex.
The mathematics is only a few lines. Because the dome’s shape makes the relevant equation violate the Lipschitz condition at the apex, the mass can spontaneously begin moving at an arbitrary later time with no cause — a genuinely indeterministic solution.
- The core point: whether a given Newtonian system is deterministic must be discovered, not stipulated. Assuming Newtonian physics is deterministic quietly inserts the very conclusion you claim to derive.
The reaction surprised him. People sent friendly but insistent emails “correcting” him, which showed how deeply entrenched the belief in Newtonian determinism is: any deviation gets treated as an error that must be found.
Norton sees the dome as ordinary Newtonian physics, comparable to standard idealizations like a particle sliding off a sharp table edge (which has an even worse singularity — in the tangent rather than just the curvature). He’s puzzled that some find it trivial and others deeply troubling.
The dome can’t be built in the real world (exact placement, exact rest, and an ideal surface would all violate quantum mechanics), but that’s true of any Newtonian idealization.
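The few lines of mathematics can be checked numerically. The sketch below assumes the standard textbook form of the dome, in units where the radial equation of motion along the surface reduces to r'' = sqrt(r); the excitation time T, the sample times, and the function names are illustrative choices, not taken from the conversation. It checks that the rest solution r(t) = 0 and the spontaneous-motion solution r(t) = (t - T)^4 / 144 for t > T both satisfy the same equation, starting from the same state (at rest at the apex).

```python
import math

def r_rest(t):
    """Solution 1: the mass sits at the apex forever."""
    return 0.0

def r_spontaneous(t, T):
    """Solution 2: at rest until time T, then sliding off the apex."""
    return 0.0 if t <= T else (t - T) ** 4 / 144.0

def acceleration(r, t, h=1e-3):
    """Central finite-difference estimate of the second time derivative."""
    return (r(t + h) - 2.0 * r(t) + r(t - h)) / h ** 2

def residual(r, t):
    """How far r(t) is from satisfying r'' = sqrt(r) at time t."""
    return acceleration(r, t) - math.sqrt(r(t))

T = 2.0  # arbitrary excitation time (an assumption for this demo)

def r_moving(t):
    """The spontaneous-motion solution with the demo's choice of T."""
    return r_spontaneous(t, T)

# Sample times before and after T (kept away from T itself).
times = [0.5, 1.5, 3.0, 5.0, 8.0]
max_residual_rest = max(abs(residual(r_rest, t)) for t in times)
max_residual_moving = max(abs(residual(r_moving, t)) for t in times)

print(max_residual_rest, max_residual_moving)
```

Both residuals come out at or near zero, and any other value of T works equally well, which is the sense in which the equation alone does not fix when the mass moves.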

The danger of infinite limits

The more serious case of indeterminism appears with infinitely many interacting masses, where Newtonian dynamics becomes generically indeterministic.
This matters for the thermodynamic limit. Modeling a crystal as mass points joined by springs, one expects thermodynamic behavior (e.g. the Boltzmann distribution) to emerge as the lattice grows.
- If “taking the infinite limit” means considering arbitrarily large but always finite lattices, the sequence stabilizes into well-behaved thermodynamics.
- If it means literally analyzing an actually infinite lattice, the dynamics become indeterministic. Norton’s lesson (from his 2011 paper “Approximation and Idealization”): be very careful how you take infinite limits.
He notes that the broader moral applies across physics. When quantum mechanics appeared in the mid-1920s and proved indeterministic, the cry that “causality is lost” was really an artifact of 19th-century thinking that had identified causation with determinism. Nothing fundamental was lost; physicists had simply learned something new.

Why causal metaphysics fails

Norton distinguishes two things: he’s comfortable with ordinary causal talk in science (voltages drive currents, free-energy gradients drive thermodynamic effects), but rejects causal metaphysics.
Causal metaphysicians try to settle, prior to and independent of empirical science, what causation “really is,” then assign scientists the cleanup job of showing how that principle is instantiated.
- This enterprise has a record of failure stretching back thousands of years: no proposed principle of causation has had both empirical content and empirical success. It amounts to doing physics a priori, which never works because the world is more creative than our imaginations.
Causal language in science is, on Norton’s view, a kind of useful labeling, a “veiled definition.” When Einstein says stimulated emission is “caused” by the radiation field, he’s declaring how he intends to use a word, not discovering a deep metaphysical truth.
Such language is genuinely useful: it carries practical/predictive content. Jim Woodward’s interventionist account (a causal relation holds when intervening on one variable changes another) is, for Norton, just a valuable definition — if X causes Y, you know intervening on X affects Y.
He surveyed where “causation” appears in physics and found it nearly always means one of two things: the light-cone structure of spacetime, or the confinement of physical propagations to within the light cone. There’s no hidden loss in abandoning causal metaphysics — you never had the metaphysical thing to begin with.

Thought experiments are just arguments

The decades-long debate (since the 1980s) has two poles: a deflationary view (thought experiments are ordinary argumentation, just picturesque) and an inflationary view that they tap some special power.
- The sharpest inflationary version is James Robert Brown’s Platonism: a great thought experiment opens a window onto Plato’s heaven, letting us literally perceive laws of nature — supported by the “aha” moment of sudden understanding.
Norton holds the deflationary view: every cogent thought experiment is at bottom an argument. The vivid mental picture makes it compelling and easy to run through, but pictures alone prove nothing.
- Example: imagining a perpetual motion machine in vivid detail proves nothing; only an underlying argument gives a thought experiment force.
- When someone walks him through a thought experiment (e.g. Galileo’s falling bodies — a heavy bag of marbles vs. a single marble, where treating them as connected forces the conclusion that all fall at the same rate), they are simply running an argument.
Against Brown, Norton notes that even the “instant” perception of a result (Brown likes summing 1+2+…+n by looking at stacked blocks) dissolves into an ordinary argument the moment Brown explains how it works. The “moment of understanding” doesn’t surpass argumentation.
He allows that thought-experiment arguments can be inductive, not just deductive — and that this is where they can be powerful but also risky (see Einstein’s equivalence principle below).
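Brown’s stacked-blocks example shows how quickly the picture turns into an argument. Two copies of the staircase 1, 2, …, n fit together into a rectangle n blocks wide and n + 1 blocks tall. A sketch of that argument in symbols (the notation S_n is introduced here for convenience):

```latex
\begin{aligned}
S_n &= 1 + 2 + \cdots + n \\
2S_n &= (1 + n) + \bigl(2 + (n-1)\bigr) + \cdots + (n + 1) = n(n+1) \\
S_n &= \frac{n(n+1)}{2}
\end{aligned}
```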

The case against Landauer’s principle

Landauer’s practical question: how far can heat generation in computing be reduced? Heat is degraded work and a real cost (Norton recalls Cray computers cooled in Freon).
Landauer’s principle (as developed by Charles Bennett) ties minimum heat generation to the logic of the operation: logically reversible operations (like a bit flip) can in principle be done with negligible heat, while logically irreversible operations (erasure) must create a minimum entropy, famously k log 2 per erased bit (natural logarithm, about 0.69k).
Norton’s objection rests on a basic fact of molecular-scale thermodynamics: nothing at molecular scales can be done without creating entropy.
- Even a bit flip requires a driving force to push the system from one state to another, working against the system’s own thermal agitation (it’s bouncing around with its own kT energy).
- There’s a tradeoff captured by Boltzmann’s S = k log W: confine a charge weakly and you create little entropy but have a high probability the process fails (the charge jumps back); confine it strongly for a high success probability and you create a lot of entropy.
The correct determinant of minimum heat is therefore not the logic but two physical quantities: how many steps the process has, and the probability of completion you demand for each step. Computing devices chain many steps, every one dissipative.
- Concretely, demanding even a modest success probability for a single step already costs more than k log 2 (he recalls about 3k of entropy for roughly 95% success), and that’s just one step.
Practical upshot: to minimize heat in computers, focus on the number of steps and required completion probabilities and the physical implementation, not on minimizing logical irreversibility.
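To put rough numbers on the comparison, the sketch below evaluates the Landauer figure k log 2 next to the entropy cost of steps completed with a demanded probability. It assumes a simple two-outcome model in which the entropy created by one step is k log(p / (1 - p)), where p is the probability that the step ends in the intended state. That formula is an assumption adopted here because it reproduces the roughly 3k for 95% quoted above; the function names and the 300 K temperature are likewise illustrative.

```python
import math

BOLTZMANN_K = 1.380649e-23  # J/K, exact SI value

def landauer_entropy_in_k():
    """Landauer's minimum entropy per erased bit, in units of k."""
    return math.log(2)

def step_entropy_in_k(p_success):
    """Entropy (in units of k) created by one step that completes with
    probability p_success, under the ASSUMED two-outcome model
    S = k * ln(p / (1 - p))."""
    if not 0.5 < p_success < 1.0:
        raise ValueError("p_success must lie strictly between 0.5 and 1")
    return math.log(p_success / (1.0 - p_success))

def chain_entropy_in_k(n_steps, p_success):
    """Total entropy for n_steps chained steps, each demanded at p_success."""
    return n_steps * step_entropy_in_k(p_success)

temperature = 300.0  # kelvin; an illustrative room temperature
landauer_heat_joules = BOLTZMANN_K * temperature * landauer_entropy_in_k()

print(f"Landauer bound:   {landauer_entropy_in_k():.2f} k per bit")
print(f"One step at 95%:  {step_entropy_in_k(0.95):.2f} k")
print(f"Ten steps at 95%: {chain_entropy_in_k(10, 0.95):.1f} k")
print(f"kT ln 2 at 300 K: {landauer_heat_joules:.2e} J")
```

Under this model the cost depends only on the number of steps and the demanded completion probability; whether the operation is logically reversible never enters the calculation.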
He argues that Maxwell’s demon can also be cleared up more simply than this whole literature manages. Sitting on a bus, he realized Liouville’s theorem straightforwardly prohibits a classical Maxwell demon, and a quantum analogue of the theorem prohibits the quantum case (he later wrote a paper laying the two side by side). The information/computation framing distracted decades of research from this cleaner result, which also explains why no nanoscale Maxwell demon has ever been built.

Szilard engines, fluctuations, and selective idealization

The deeper error traces to Szilard’s 1929 single-molecule engine (a Maxwell demon variant): a one-molecule gas, partitioned and isothermally expanded, seems to convert ambient heat into work.
The real subject is thermal fluctuations and whether they can reverse the second law — the same tradition as Einstein, Brownian motion, and Poincaré’s remark that under a microscope we see a Maxwell demon in action.
- Smoluchowski’s answer (early 20th century, e.g. the trapdoor): you can find exploitable fluctuations, but any mechanism exploiting them has its own fluctuations that reverse the gains.
Norton’s charge: analyses of the Szilard engine, from Szilard onward, account for the fluctuations they want (the gas molecule) while ignoring others they must suppress (the partition itself, which carries ½kT of thermal energy and requires either suppression if light or friction/damping if heavy — both entropy-generating).
- This is an inconsistent use of “in principle” idealization: idealizing away half the fluctuations while keeping the other half, then claiming a result. Treated consistently, the apparent violations vanish.
A recurring fallacy: whenever a P log P expression appears, people assume it must be thermodynamic entropy. But a P log P only corresponds to thermodynamic (Clausius) entropy when the probabilities arise in a specific physical way. Not knowing which way a coin in your pocket faces does not give it k log 2 of thermodynamic entropy.
On the Nature experiment claiming to validate Landauer’s principle: it compressed a colloidal Brownian particle (effectively a one-molecule gas) reversibly and measured ~kT log 2 of heat passed to the environment. Norton says this is just standard ideal-gas thermodynamics — known since well before Landauer and implicit in Einstein’s Brownian-motion work. It had to come out that way; if it hadn’t, statistical physics would be in crisis. The experiment ignores all the entropy created in the rest of the apparatus, so it doesn’t establish a logic-based lower bound at all.
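The ideal-gas calculation behind that figure is short. As a sketch, assume a one-molecule gas obeying PV = kT that is compressed reversibly and isothermally from volume V to V/2. The symbols W (work done on the gas) and Q (heat passed to the environment) are introduced here, and log is the natural logarithm:

```latex
\begin{aligned}
W &= -\int_{V}^{V/2} P \, dV' = -\int_{V}^{V/2} \frac{kT}{V'} \, dV' = kT \log 2 \\
Q &= W = kT \log 2 \quad \text{(the internal energy is unchanged at constant } T\text{)} \\
\Delta S_{\text{gas}} &= -\frac{Q}{T} = -k \log 2
\end{aligned}
```

Nothing in these lines refers to logic or information; the kT log 2 follows from the gas law alone.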
He connects this to a paper he’s writing on the loose use of “in principle”: people rarely specify which modality (epistemic, nomological, metaphysical, logical) they mean, and Landauer reasoning abuses the idealizing sense.

What entropy actually measures

Thermodynamic entropy (Clausius, ~1865) tells you which processes spontaneously move forward.
- Boltzmann’s S = k log W fits this directly: a process advances when the end state has higher probability than the start, hence higher entropy.
- In the Gibbs formalism, the Gibbs (P log P) entropy connects to thermodynamic entropy via analysis of reversible processes (given by both Gibbs and Einstein in early work).
Shannon entropy is a different thing: a parameter of a probability distribution measuring how uniform it is (maximal for a uniform distribution, lower as the distribution peaks). There are connections, but Norton declines to nest one cleanly inside another.
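A small numerical sketch of the Shannon quantity described above: it is a function of a probability distribution alone, largest for the uniform case and smaller as the distribution peaks. The function name and the example distributions are illustrative.

```python
import math

def shannon_entropy(probabilities):
    """Shannon entropy, the sum of -p * ln(p), in nats.
    Outcomes with p == 0 contribute nothing."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(-p * math.log(p) for p in probabilities if p > 0)

uniform = [0.25, 0.25, 0.25, 0.25]
peaked = [0.85, 0.05, 0.05, 0.05]
certain = [1.0, 0.0, 0.0, 0.0]
fair_coin = [0.5, 0.5]

for name, dist in [("uniform", uniform), ("peaked", peaked),
                   ("certain", certain), ("fair coin", fair_coin)]:
    print(f"{name:10s} H = {shannon_entropy(dist):.4f} nats")
```

The fair coin gives log 2 nats. Multiplying that by k yields an expression of the form k log 2, but the number describes the distribution, not any heat exchanged by the coin.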
He treats the Clausius, Boltzmann, and Gibbs notions as fitting together nicely, but defers on the quantum von Neumann entropy: interpreting the density operators in quantum statistical mechanics walks straight into the measurement problem, and he doesn’t think anyone handles it other than pragmatically.

Einstein’s quantum contributions and his break with the new theory

Einstein’s well-known objections to the new quantum mechanics: he rejected its indeterminism and sought a hidden-variable theory — but not Bohm-style. He hoped his unified field theory would recover the hidden variables. The EPR argument (with its criterion of reality) was meant to show the quantum description is incomplete.
A historical digression on Einstein’s style: he disliked the geometric approach to general relativity, preferring an algebraic/analytic one. As a result he (and figures like Hilbert, Felix Klein, Hermann Weyl) regarded the Schwarzschild radius as a genuine singularity until about 1950 — even though Lemaître had shown it could be transformed away. Einstein dismissed geometric pictures (e.g. embedding a 3-sphere in 4D Euclidean space to get his 1917 cosmology) as a mere “donkey bridge” — a learning aid not to be taken literally (akin to Dennett’s “intuition pumps”).
Einstein’s contributions to the old quantum theory were massive, the central one being the 1905 light quantum.
- Norton frames Einstein’s “miracle year” as the completion of 19th-century physics: Brownian motion completed Maxwell–Boltzmann statistical physics and established the reality of atoms; special relativity excavated the kinematics of space and time already implicit in Maxwell–Lorentz electrodynamics (the Lorentz group, sharpened by Poincaré).
- Einstein’s distinctive genius was reading significance in empirical results others couldn’t see, using only pen, paper, and journals.
The light quantum’s logic lived in Einstein’s thermodynamics work: molecular constitution leaves a signature on macroscopic thermodynamic properties.
- The ideal gas law (PV = NkT) is the macroscopic fingerprint of independently moving localized particles — which is also why dilute salt solutions exert osmotic pressure obeying the same law.
- Examining heat radiation in the Wien (high-frequency) regime, Einstein found its entropy varies with the logarithm of volume — the same fingerprint. Using S = k log W (which he called Boltzmann’s principle but largely derived himself), he inferred radiation is made of localized energy bundles of size hν. Norton calls this one of Einstein’s most beautiful achievements.
- Extending the analysis across the full spectrum via fluctuations of radiation pressure/energy, Einstein found an expression that is the sum of a particle-like term and a wave-like term — the origin of wave–particle duality.
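The fingerprint argument can be laid side by side in modern notation. The following is a sketch rather than Einstein’s own symbols: E is the energy of radiation at frequency ν in the Wien regime, V_0 and V are the initial and final volumes, n is the number of independent particles in the gas case, and log is the natural logarithm.

```latex
\begin{aligned}
\text{Ideal gas of } n \text{ particles:} \quad & S - S_0 = n k \log\frac{V}{V_0} \\
\text{Wien-regime radiation:} \quad & S - S_0 = \frac{E}{h\nu}\, k \log\frac{V}{V_0} \\
\text{Matching the two:} \quad & n = \frac{E}{h\nu} \;\Longrightarrow\; E = n\, h\nu
\end{aligned}
```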
The light quantum was resisted for about 20 years (Bohr disliked it) until the Compton effect finally convinced physicists. Einstein’s later contributions included the A and B coefficients (1916–17, the basis of the laser) and Bose–Einstein statistics (1924–25).

How discovery actually happens, and advice for researchers

The “youth and genius” link is correlation, not causation. New sciences are where new discoveries are available; established figures stay in their old fields, so newcomers — who tend to be young — naturally make the breakthroughs. So neither youth nor age determines whether you can do great work.
Norton’s own practice: a person’s best novel ideas in a field tend to come early, so he deliberately keeps jumping between fields (indeterminism, induction, thought experiments, Einstein scholarship, empiricism). He argues philosophers of physics can switch topics freely because they’re supported by teaching, whereas physicists are trapped by grants, labs, and required expertise.
The deepest lesson from Einstein: match the available, ripe problems to your particular talent.
- Einstein could read physical meaning out of empirical results (the atomic signature in radiation, the kinematics in the Lorentz group, the exactness of the equivalence of fall rates). The breakdown of universal free-fall at order v²/c² in early relativistic gravity theories bothered him so much he built general relativity to preserve it, moving from Minkowski to semi-Riemannian spacetime.
- But after the mid-1920s, pursuing the unified field theory by seeking the “simplest possible rules,” he stopped using that facility and produced nothing that works — he was no longer matched to the problem.
- Bohr is the mirror image: his tolerance for crazy contradictions made the 1913 atomic model possible (just switch off electrodynamics so electrons orbit without radiating), but later turned into the incoherent doctrine of complementarity, which Norton thinks Bohr wrongly persuaded a generation to take seriously.
His practical advice: do the work that interests you, and find where you can see further than others. Your real talent shows up when a group gets tangled in something that seems perfectly clear and obvious to you — but because it feels easy, you undervalue it and instead envy people doing flashy things you can’t do. Do what you see through clearly and fast. (He notes his own engineering background lets him thrive in vagueness, e.g. his work clarifying thermodynamically reversible processes.)

Learning a subject from many angles

Norton learned thermodynamics “from scratch” four separate times in his chemical engineering training — in physics, engineering, chemistry, and chemical-engineering departments — each with different representations (quasistatic processes; fugacities and chemical reactions; engine efficiencies like the Otto cycle). Only the third time did it click that the key concept is the thermodynamically reversible process.
- A common physicist’s error: thinking a reversible process is just a slow one. A balloon deflating slowly through a pinhole is slow, yet it is irreversible and creates entropy. A genuinely reversible process requires near-perfect balance of driving and opposing forces; yet if they balanced exactly, nothing would happen, so a tiny imbalance (and tiny entropy creation) is unavoidable. Resolving this tension is the subject of his paper.
He suspects this “learn it four times” pattern is general (it’s said of quantum field theory too). Thermodynamics is intrinsically beautiful but radically incomplete: it always needs an additional theory of the matter involved (fluid mechanics, the quantum mechanics of the system, etc., as in computing the max efficiency of solar cells or Peltier junctions, which are heat engines).
- The unifying idea: you approach one phenomenon with many theoretical devices, and only when you see how they all bear on it do you grasp the commonality (like inferring a cone’s 3D shape from its different 2D projections).
A parallel point about creativity in philosophy: the hard, creative part is usually framing the question so an analysis becomes possible, not the answer. Once you pose the right question sharply (Norton did this with the “epistemic problem” of thought experiments, and with causation), the answer often looks easy — and you got there first.

The material theory of induction

Norton wants to say good science is privileged because it is well-supported by evidence, and that support is inductive — but he found no existing account of induction that delivers this. Instead the literature fragments into many accounts, encouraging “doctor shopping”: pick the example, then find the inductive rule that fits.
His conclusion: there are no universal rules of inductive inference. Instead there are local inductive systems, each warranted by facts.
- Example: in 1903 Marie Curie examined the only sample of radium chloride in the world and confidently declared its crystallographic form (matching barium chloride). This can’t be enumerative induction (“this A is B, so all A are B”), because almost every generalization of that form fails: her sample weighed a tenth of a gram and sat in Paris, yet no one infers that all radium chloride does.
- Her confidence was warranted by hard-won 19th-century facts about crystals: lattices fall into one of six or seven families (work tied to the theory of finite discrete groups, e.g. Haüy’s principle). That factual structure licenses the generalization — though it remains slightly risky because some substances are polymorphic (like carbon as diamond vs. graphite).
The same applies to probabilistic inference: there is no default entitlement to represent uncertainty by probability. You bear a positive obligation to show a probabilistic representation is appropriate.
- In DNA-typing forensics, the probabilities are legitimate only because they’re anchored by the fact that the suspect can be treated as randomly sampled from the population. If that anchoring fails (e.g. planted evidence), all bets are off.

Why the simulation argument fails

The simulation argument is, for Norton, a spectacular case of using probabilities with no factual grounding.
- Even granting (shakily) that there are vastly more ways our experiences could arise if we’re simulations than if the world is real, the right stopping point is: we have no idea which case is ours.
- The fallacy is the next step — representing that total ignorance as a probability distribution, which dumps most of the mass onto “simulation.” That probability “falls from the sky”; the conclusion is an artifact of misapplied inductive logic.
- Invoking the principle of indifference doesn’t rescue it: in cases of genuine, extreme ignorance the principle of indifference contradicts the axioms of probability (classic examples go back to Keynes in the early 1920s). A loaded die with two faces colored blue and the rest red shows the indifference move is illegitimate.
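The die example can be made concrete. The sketch below assumes a six-faced die with two blue faces and four red ones, and that nothing is known about how it is loaded; the function and variable names are illustrative. Applying indifference to two descriptions of the same outcomes yields two incompatible probabilities for “blue.”

```python
from fractions import Fraction

def indifference(outcomes):
    """Assign equal probability to each listed outcome."""
    share = Fraction(1, len(outcomes))
    return {outcome: share for outcome in outcomes}

# Description 1: the outcomes are the six faces.
face_colors = {1: "blue", 2: "blue", 3: "red", 4: "red", 5: "red", 6: "red"}
by_face = indifference(list(face_colors))
p_blue_from_faces = sum(p for face, p in by_face.items()
                        if face_colors[face] == "blue")

# Description 2: the outcomes are the two colors.
by_color = indifference(["blue", "red"])
p_blue_from_colors = by_color["blue"]

print(p_blue_from_faces, p_blue_from_colors)
```

The two answers, 1/3 and 1/2, cannot both be the probability of the same event. With no facts about the loading to favor one description over the other, ignorance alone does not single out a probability.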
