Language Without Meaning: How LLMs Exposed Our Biggest Illusion

Theories of Everything 2h25 10 min #47

Summary

  • This episode is a long conversation with Professor Elan Barenholtz (Florida Atlantic University), who argues that large language models have accidentally revealed the true nature of human language: that words don’t actually point to things in the world. His “grounded” thesis is that language is a self-contained, self-generating system running on the same core mathematical principle as LLMs (predicting the next token), and his more speculative extensions reach into how this separates our rational linguistic minds from a more unified, “animal” way of experiencing the cosmos. The arc moves from a technical account of how LLMs work, through what they imply about meaning and consciousness, to a near-mystical conclusion about language fracturing our connection to reality.

The grounded thesis: language as an autonomous, self-generating system

  • LLMs like ChatGPT are trained only to do next-token prediction: given a sequence of tokens (words or word fragments), predict the most probable next token, append it, feed the extended sequence back in, and repeat. This simple recipe turns out to be sufficient to generate human-level language across essentially all benchmarks.
    • Barenholtz stresses that it was not a given this would work. The fact that such a simple trick succeeds suggests next-token prediction is a property latent in language itself, not a clever human engineering workaround.
  • His core claim: because language has this self-generating property, the mechanism humans use to produce language must be the same one LLMs use. It would be extremely odd to use a completely different (“orthogonal”) method and still produce identical results.
    • He is careful to say humans are not literally transformer models or ChatGPT. The claim is about the fundamental math — vector times matrix to produce the next token, autoregressively — not the specific architecture.
  • Why it isn’t just mimicry: his argument rests on the simplicity of the models. If language required modeling complex syntax trees, grammar, and long-range dependencies, success might be a roundabout imitation. But because mere next-token prediction “solves language so handily,” that simplicity points to a real discovered principle, not surface-level mimicry.
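The loop described above (vector times matrix, pick the next token, append it, feed it back) can be sketched in a few lines. This is a toy illustration, not how any production LLM is built: the five-word vocabulary and the hand-set matrix `W` are hypothetical stand-ins for learned weights, and the sketch conditions only on the last token rather than on the whole context as a transformer does.

```python
import numpy as np

# Hypothetical toy vocabulary; to the model, tokens are just integer IDs.
VOCAB = ["the", "cat", "sat", "down", "."]

# Hypothetical hand-set weights: row i scores every possible next token after token i.
# A real LLM learns its weights and conditions on the whole context, not one token.
W = np.array([
    [0.0, 3.0, 0.5, 0.1, 0.1],  # after "the"
    [0.1, 0.0, 3.0, 0.5, 0.2],  # after "cat"
    [0.5, 0.1, 0.0, 3.0, 0.2],  # after "sat"
    [0.2, 0.1, 0.1, 0.0, 3.0],  # after "down"
    [3.0, 0.1, 0.1, 0.1, 0.0],  # after "."
])


def one_hot(token_id: int) -> np.ndarray:
    v = np.zeros(len(VOCAB))
    v[token_id] = 1.0
    return v


def softmax(scores: np.ndarray) -> np.ndarray:
    e = np.exp(scores - scores.max())
    return e / e.sum()


def generate(start: str, steps: int) -> list[str]:
    """Greedy autoregressive generation: each output is appended and fed back in."""
    ids = [VOCAB.index(start)]
    for _ in range(steps):
        probs = softmax(one_hot(ids[-1]) @ W)  # vector times matrix -> next-token distribution
        ids.append(int(np.argmax(probs)))      # take the most probable token and append it
    return [VOCAB[i] for i in ids]


print(" ".join(generate("the", 5)))  # the cat sat down . the
```

Greedy `argmax` keeps the example deterministic; sampling from `probs` instead would produce varied continuations from the same weights.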

The epiphany: words don’t refer to anything outside language

  • The realization that hit him: any model trained only on text learns nothing but relations between words. Tokens are just numbers; they have no connection to anything external. There is no link between the token “red” and any color experience.
    • Yet LLMs use “red” with full competence despite having no sensory concept of redness. For Barenholtz, this shows that words don’t mean things outside themselves: the meaning of “red” is simply where it falls in the space of language, relative to other words.
  • His conclusion: language doesn’t refer. It is autonomous, self-contained, and contains all the rules needed to generate itself. Asking “how does language refer to the world?” gets the answer “it doesn’t.”
  • This does not deny consciousness or qualia. There genuinely is a qualitative experience of red, produced by sensory/perceptual processing. But that experience and the linguistic token “red” are two distinct, autonomous, yet integrated systems.
    • They exchange messages so a single organism can act coherently (e.g., “go grab the red object” — a handoff from perception to language and back), but the linguistic system never contains the perceptual reference. Qualia arise from analog, non-symbolic processing that happens before the handoff to language; language is simply unaware of those underlying mechanisms.
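The idea that “red” means nothing more than its position relative to other words is the distributional intuition behind word embeddings. A minimal sketch, assuming a tiny made-up corpus and raw co-occurrence counts in place of learned embeddings: each word is represented purely by the words that appear near it, with no sensory input anywhere, yet “red” and “green” come out as near neighbors while “red” and “dog” do not.

```python
import math
from collections import Counter

# Hypothetical miniature corpus: only word sequences, no sensory data at all.
corpus = [
    "the red apple is ripe",
    "the green apple is ripe",
    "a red car is fast",
    "a green car is fast",
    "the dog runs fast",
    "the dog barks",
]


def cooccurrence_vectors(sentences: list[str], window: int = 2) -> dict[str, Counter]:
    """Represent each word by counts of the words appearing within `window` positions."""
    vectors: dict[str, Counter] = {}
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            context = vectors.setdefault(word, Counter())
            for j in range(max(0, i - window), min(len(words), i + window + 1)):
                if j != i:
                    context[words[j]] += 1
    return vectors


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


vecs = cooccurrence_vectors(corpus)
print(round(cosine(vecs["red"], vecs["green"]), 3))  # 1.0: identical contexts
print(round(cosine(vecs["red"], vecs["dog"]), 3))    # 0.267: only "the" is shared
```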

The latent “Platonic” space: how distinct systems communicate

  • His thinking here draws on a recent paper, Harnessing the Universal Geometry of Embeddings, which showed that embeddings from completely different models (e.g., GPT and BERT) can be translated into one another through a shared latent space, even though their raw vectors for the same word look numerically unrelated.
    • The paper found you can map between embeddings without ever seeing paired examples, suggesting an underlying universal structure of language. Barenholtz is now testing whether this works across languages (e.g., English-only and Spanish-only models). In the conversation he and Kurt call this shared space “Platonic.”
  • His extension: there may be a similar shared latent space bridging the linguistic embedding and the perceptual embedding (and motor embeddings). These spaces are radically different — built to solve different problems with different “axes” — but can pass information through a common bridge.
  • No static facts, only potentialities: there is no single fixed meaning of “microphone” embedded in language. When perception and language meet in this latent space, what emerges depends entirely on the prompt/question. You could answer infinitely many questions about a microphone; the “meaning” is the set of all these potential linguistic behaviors, not a stored essence.
    • Communication is therefore not “downloading” a perceptual state into another brain. You can only pull enough information to coordinate behavior or action.
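How two models’ raw vectors can look unrelated while sharing one structure can be illustrated with the simplest possible case. This sketch rests on a strong assumption: “model B” is just a random rotation of “model A” (real embedding spaces differ far more, and the paper’s unsupervised translation method is not reproduced here). The coordinates change completely, yet every pairwise similarity between words survives.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical embeddings for 6 words in "model A" (8 dimensions each).
emb_a = rng.normal(size=(6, 8))

# Hypothetical "model B": the same words under a random orthogonal transform.
q, _ = np.linalg.qr(rng.normal(size=(8, 8)))
emb_b = emb_a @ q


def cosine_matrix(x: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarities between rows: the relational structure."""
    unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    return unit @ unit.T


# The raw vectors for the same word differ...
print(np.allclose(emb_a, emb_b))  # False
# ...but every word-to-word relation is identical.
print(np.allclose(cosine_matrix(emb_a), cosine_matrix(emb_b)))  # True
```

When the word correspondences are known, recovering such a rotation is a standard orthogonal Procrustes problem; the harder result the paper reports is mapping between spaces without paired examples.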

No symbol-grounding problem

  • Because LLMs prove language operates fully on ungrounded relations, Barenholtz argues there is no grounding problem — words don’t need grounding to function linguistically.
  • There is still a bridging problem: how a fully operational organism links perception, language, and action. His falsifiable prediction is that current multimodal models won’t fully solve this — they “strong-arm” perception into linguistic form (more like generating a prompt) rather than letting modalities grow independently and communicate through a genuine shared latent space.
  • Connection to Wilfrid Sellars (Empiricism and the Philosophy of Mind): Sellars’s critique of the “myth of the given”, his argument that supposedly primitive perceptual givens like “redness” are already soaked in concepts, parallels Barenholtz’s view. You can’t linguistically dip in and extract a raw primitive; language is simply the wrong map, and translating perception into words loses enormous, quantifiable information.
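The claim that translating perception into words loses quantifiable information can be made concrete with a back-of-the-envelope entropy calculation. A sketch under loudly hypothetical assumptions: perception distinguishes 360 equally likely hues, and language offers six color words with made-up hue bins. Naming a hue keeps only which bin it fell into; Shannon entropy measures how much is left.

```python
import math
from collections import Counter

# Hypothetical color vocabulary: (word, start hue, end hue) in degrees, end exclusive.
COLOR_WORDS = [
    ("red", 0, 30), ("orange", 30, 60), ("yellow", 60, 90), ("green", 90, 180),
    ("blue", 180, 270), ("purple", 270, 330), ("red", 330, 360),  # hue wraps around to red
]


def name_hue(hue: int) -> str:
    """Translate a perceived hue into the single color word whose bin contains it."""
    for word, start, end in COLOR_WORDS:
        if start <= hue < end:
            return word
    raise ValueError(f"hue out of range: {hue}")


def entropy_bits(counts) -> float:
    """Shannon entropy, in bits, of the distribution given by raw counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


hues = range(360)  # assumption: 360 distinguishable, equally likely hues
perceptual_bits = math.log2(len(hues))
verbal_bits = entropy_bits(list(Counter(name_hue(h) for h in hues).values()))
print(f"perception: {perceptual_bits:.2f} bits, color words: {verbal_bits:.2f} bits")
# perception: 8.49 bits, color words: 2.46 bits
```

Under these assumptions roughly 6 bits per hue are discarded: exactly the within-word distinctions (one reddish hue versus another) that the vocabulary cannot express.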

Autoregression, recursion, and the “pregnant present”

  • The deep insight is that the system doesn’t just produce an output — it produces the next input for itself, recursively. Language contains within it the ingredients for this recursion: each word is “pregnant” with the potential for the whole trajectory that follows.
  • Illustrated with linear algebra: a matrix operating on a vector produces another vector. The mistake is to look at an output and ask “where is it stored in the matrix?” — it isn’t. The matrix encodes potentialities: given an input, it produces an output, which is appended and fed back.
    • Referencing Anthropic’s On the Biology of Large Language Models: even though the model outputs only the next token, it has learned that each point in a sequence is pregnant with a whole forward trajectory.
    • His thought experiment: aliens finding a fossilized brain would completely misunderstand it unless they ran it autoregressively — feeding outputs back as inputs — because its purpose is to generate sequences, not labels.
  • (An aside: Barenholtz concedes his autoregressive thesis would be undermined if diffusion models, which generate text without proceeding strictly token by token, became accurate enough for natural language; so far they seem good mainly for coding. He is openly willing to be falsified.)
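The linear-algebra point, that an output is not “stored” anywhere in the matrix but emerges when the matrix is applied and its output fed back, can be sketched with a plain linear map. This is an illustration under a simplifying assumption: a 2×2 rotation matrix stands in for learned weights. Inspecting the matrix reveals no trajectory; running it autoregressively produces one, and a different starting vector (a different “prompt”) produces a different one from the same weights.

```python
import numpy as np

theta = np.pi / 4  # hypothetical weights: rotate by 45 degrees per step
M = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])


def run_autoregressively(matrix: np.ndarray, start: np.ndarray, steps: int) -> np.ndarray:
    """Apply the matrix, then feed each output back in as the next input."""
    states = [start]
    for _ in range(steps):
        states.append(matrix @ states[-1])
    return np.array(states)


# The same weights, two different starting inputs, two different trajectories.
traj_a = run_autoregressively(M, np.array([1.0, 0.0]), 8)
traj_b = run_autoregressively(M, np.array([0.0, 2.0]), 8)

# After 8 steps of 45 degrees, each trajectory has come full circle.
print(np.allclose(traj_a[-1], traj_a[0]), np.allclose(traj_b[-1], traj_b[0]))  # True True
```

This is also the spirit of the fossilized-brain thought experiment: looking at `M` alone, or applying it once, says little about the sequences it is capable of generating.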

A new model of memory

  • He proposes two kinds of “learning,” mirroring LLMs:
    • Long-term memory = fine-tuning the weights (slow consolidation over minutes, weeks, years). The weights don’t store facts; they store potentialities — given an input, they produce an appropriate output.
    • Working memory = the autoregressive context itself (the running sequence), analogous to in-context learning. He cites how an LLM can learn a brand-new word (“globalglobal”) mid-conversation and use it correctly without retraining.
  • He stakes out the extreme view that the brain does no retrieval at all — nothing like RAG (retrieval-augmented generation), which he sees as a transitional technology. The tip-of-the-tongue phenomenon isn’t a failed database search; it’s the autoregressive generation being short-circuited.
  • He rejects the classic cognitive-psychology “working memory” model (Baddeley’s limited-capacity, short-lived buffer). Instead, the past guides current generation through continuous, decaying activation: what you said an hour ago still shapes what you say now, but with less weight (mirroring attention weights decaying with distance). How far back this influence extends (minutes, days, years) is an open empirical question he wants to spend his career on.
    • Crucially, this past activation can’t be decoded as a static “this picture or that word.” It only has meaning insofar as it guides the next word.
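A past whose influence fades with distance but never simply drops out resembles recency-weighted attention. A minimal sketch, assuming exponential decay with a hypothetical `decay` rate (how far back the influence really reaches is the open question he mentions), contrasted with a hypothetical fixed-capacity buffer of the kind he rejects. The past enters only as a weighted blend that steers the next step, never as a retrieved item.

```python
import numpy as np


def recency_weights(context_length: int, decay: float) -> np.ndarray:
    """Softmax over -decay * distance: older items get smaller but never zero weight."""
    distance = np.arange(context_length)[::-1]  # last (most recent) item has distance 0
    logits = -decay * distance
    w = np.exp(logits - logits.max())
    return w / w.sum()


def fixed_buffer_weights(context_length: int, capacity: int) -> np.ndarray:
    """Contrast: a buffer that weights its last `capacity` items equally and drops the rest."""
    w = np.zeros(context_length)
    w[-capacity:] = 1.0
    return w / w.sum()


# Hypothetical context: one 3-dimensional activation vector per earlier item.
rng = np.random.default_rng(1)
past = rng.normal(size=(10, 3))

decayed = recency_weights(10, decay=0.5)
buffered = fixed_buffer_weights(10, capacity=3)

# What guides the next step is a weighted blend of the whole past.
guidance = decayed @ past
print(np.round(decayed, 3))
print(np.round(buffered, 3))
```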

Speculative leap: the universe as non-Markovian

  • The brain is non-Markovian — the next state depends not just on the present but on a whole sequence of past states.
  • He extends this (admitting he’s outside his expertise and expects attacks from physicists) to claim reality itself is non-Markovian: the universe “has a memory.” Spatial-temporal coherence depends on the past being genuinely present, not just an instantaneous snapshot.
    • He treats concepts like instantaneous velocity as a useful mathematical “cheat” (à la Zeno’s paradox) that hides the underlying continuity.
    • Kurt raises hard objections this would face: conservation of energy, locality, why present-plus-velocity models predict eclipses so well. Barenholtz acknowledges he can’t answer these and speculates only loosely that universal “memory” might one day relate to quantum non-locality (shared origins of entangled particles). He places himself “in good company with Jacob Barandes.”
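The Markovian/non-Markovian distinction itself can be shown directly. A toy sketch with made-up update rules (nothing here models physics or the brain): under the Markov rule the next state depends only on the current state, so two different histories that arrive at the same present continue identically; under the non-Markov rule the whole history matters, so they diverge.

```python
def markov_next(history: list[float]) -> float:
    """Hypothetical rule: the next state depends only on the current (last) state."""
    return 0.5 * history[-1] + 1.0


def non_markov_next(history: list[float]) -> float:
    """Hypothetical rule: the next state also depends on the average of the whole past."""
    return 0.5 * history[-1] + 0.5 * sum(history) / len(history)


# Two different pasts that arrive at the same present state, 4.0.
past_1 = [0.0, 2.0, 4.0]
past_2 = [8.0, 6.0, 4.0]

print(markov_next(past_1), markov_next(past_2))          # 3.0 3.0
print(non_markov_next(past_1), non_markov_next(past_2))  # 3.0 5.0
```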

Tokens, babbling, and the written word

  • A child’s babbling becomes a genuine token the moment a phonological unit acquires relations to other units — because language is entirely relational. “Da-da” becomes a token when it has a discrete relational place; random “ba-ba-ba” doesn’t.
  • A striking point: for tens of thousands of years before writing (~3500 BC), humans may not have conceived of “words” as discrete units at all — language just ran as an auditory flow. Writing made words visible as separate things; LLMs now reveal what words truly are (relational abstractions). The brain was already tokenizing and mapping relations long before anyone consciously knew “words” existed.
  • Arbitrariness: symbols are largely arbitrary — what matters is relations, not the sound itself (“the map is the territory”). He treats kiki/bouba effects as the exception proving the rule. He and a former student are testing whether you could predict what English sounds like purely from text embeddings via the latent-space mapping — a long shot that, if it worked, would suggest sound is less arbitrary than assumed.
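If meaning lies in relations rather than in the sounds themselves, then swapping every sound for a different symbol should leave the structure intact. A deliberately simple sketch, assuming a made-up babble stream and adjacent-pair (bigram) counts as the “relations”, a drastic simplification of what a language model actually learns.

```python
from collections import Counter


def bigram_counts(tokens: list[str]) -> Counter:
    """Count adjacent token pairs: a crude stand-in for relational structure."""
    return Counter(zip(tokens, tokens[1:]))


babble = "da da ma da ma ma da".split()  # hypothetical stream of sound units

# Hypothetical arbitrary relabeling: every sound swapped for an unrelated symbol.
relabel = {"da": "X", "ma": "Y"}
renamed = [relabel[t] for t in babble]

original = bigram_counts(babble)
mapped = Counter({(relabel[a], relabel[b]): n for (a, b), n in original.items()})

# Every relation survives the renaming: only the labels changed.
print(mapped == bigram_counts(renamed))  # True
```

The proposed sound-prediction experiment asks the converse question: whether anything about the labels themselves can be recovered from the relations alone.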

The argument in syllogism form

  • Kurt summarizes Barenholtz’s logic, which he confirms:
    • LLMs master language using only ungrounded autoregressive next-token prediction.
    • They achieve superhuman/human-level performance doing only this.
    • This (plus the simplicity/efficiency argument) reflects language’s inherent structure.
    • Therefore human language uses autoregressive next-token prediction — and it would be “very odd indeed” for that structure to exist and for humans to ignore it.
  • Real-time, sequential language generation maps onto the “pregnant present” of autoregression — carving the next instant out of a trajectory shaped by the past and open to many possible futures.

Why cognition (not just language) may be autoregressive

  • He argues it would be evolutionarily implausible for the brain to evolve special-purpose autoregressive machinery only for language. More likely, language exapts pre-existing autoregressive machinery used for motor and perceptual sequences. (Kurt introduces the term exaptation — e.g., the tongue evolving for eating, later repurposed for speech.)
  • Disagreement with predictive coding: predictive coding says neurons predict the next external state and minimize prediction error against observation. Barenholtz finds this needlessly complex because it requires an explicit internal model of the external world. His simpler alternative: the brain is generating, not predicting — internally consistent generation that has prediction latent within it (like an LLM), with no explicit modeling of the outside world. When reality intrudes (a sudden brick wall), the system must radically reorient.
  • Dreams as evidence: dreaming is autoregressive perceptual generation with only a short tether to the past. Each frame is locally consistent with the last (motorcycle → flying), but the sequence is not anchored to reality, much like the notorious early AI-generated video of Will Smith eating spaghetti. In video generation, longer context (stronger anchoring to the past) is the path to coherence.
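The contrast between locally consistent but unanchored generation and generation held in place by a longer context can be sketched as two update rules. This is a toy illustration, not a model of dreaming or of video generators: it assumes a one-dimensional “scene” value, a fixed sequence of small random changes, and a hypothetical `anchor_strength` that pulls each frame back toward the opening frame as a crude stand-in for long-range context. Both versions change only slightly per frame, but the unanchored one is a random walk that can wander arbitrarily far, while the anchored one stays within a fixed bound.

```python
import numpy as np

rng = np.random.default_rng(42)
noise = rng.normal(scale=0.5, size=500)  # small frame-to-frame changes


def generate(noise: np.ndarray, anchor_strength: float) -> np.ndarray:
    """Each frame is the previous frame plus a small change. With anchor_strength > 0,
    each frame is also pulled back toward the opening frame (a stand-in for context)."""
    frames = [0.0]
    for step in noise:
        prev = frames[-1]
        frames.append(prev + step + anchor_strength * (frames[0] - prev))
    return np.array(frames)


dream = generate(noise, anchor_strength=0.0)     # tethered only to the last frame
anchored = generate(noise, anchor_strength=0.5)  # also tethered to the start
print(f"largest drift, unanchored: {abs(dream).max():.2f}")
print(f"largest drift, anchored:   {abs(anchored).max():.2f}")
```

With the anchor, each frame is a shrunken copy of the last plus a small change, so its distance from the start can never exceed the largest single change divided by `anchor_strength`; without it, the changes simply accumulate.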

What tethers language to reality

  • If symbols are ungrounded, why doesn’t language drift into coherent fiction? Two answers:
    • Cross-talk with perception: the perceptual embedding continuously constrains the linguistic trajectory (your language can’t claim you’re underwater talking to a robot when perception disagrees).
    • Inherited corpus: we’re born into a pre-existing language whose word-to-word relations have been honed over millennia to be useful and to map onto perceptual reality. Language is “very strongly tethered” — not arbitrary poetry, but extraordinarily precise relational structure that humanity, not any individual, created.
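One way to picture perception constraining the linguistic trajectory is to let a perceptual signal reweight the candidates that language alone proposes. This sketch uses product-of-experts style reweighting, an illustrative choice rather than the shared-latent-space mechanism Barenholtz proposes, and every number in it is hypothetical: on its own, the language side favors “underwater”, but perception effectively vetoes it.

```python
import numpy as np

# Hypothetical candidate continuations for "Right now I am ...".
candidates = ["on a street", "underwater", "in a kitchen"]

# Hypothetical scores: what the linguistic trajectory alone would favor...
language_probs = np.array([0.3, 0.5, 0.2])
# ...and how compatible each candidate is with the current perceptual state.
perceptual_fit = np.array([0.9, 0.01, 0.1])


def constrain(lang: np.ndarray, percept: np.ndarray) -> np.ndarray:
    """Product-of-experts style reweighting: multiply, then renormalize to sum to 1."""
    combined = lang * percept
    return combined / combined.sum()


posterior = constrain(language_probs, perceptual_fit)
print(candidates[int(np.argmax(language_probs))])  # underwater
print(candidates[int(np.argmax(posterior))])       # on a street
```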

Linguistic anti-realism, the “animal embedding,” and the cosmic whole

  • Barenholtz calls his philosophy linguistic anti-realism: language “doesn’t know what it’s talking about” in a deep sense. Our philosophical and existential thinking happens in language — a semi-autonomous coordinative construct that has tokens for things like pain, pleasure, love, and meaning, but doesn’t contain their true significance.
  • The things that actually matter — qualia, consciousness, pleasure, pain, mattering itself — live in what he calls the animal embedding, a non-linguistic way of knowing shared with other species.
    • Maybe it’s not that animals understand nothing, but that our linguistic system understands nothing. A fly “knows” something through simply being and existing that our rationalist linguistic mind can never know.
    • He cites the ancient Indian anti-philosopher Jayarāśi Bhaṭṭa: no one lives more truly than a rooster simply being.
  • Why this is non-symbolic / ineffable: perceptual processing is an extension of the physical world, not a representation of it — like ripples carrying information about a dropped rock. Red and orange are genuinely similar in physical color space, and the brain’s analog processing preserves that physical continuity. Language, being purely relational and symbolic, breaks this continuity — it’s “a new physics” disconnected from the physical universe. This non-symbolic continuity is, he suspects, deeply tied to qualia.
  • Implications:
    • For AI sentience: if an LLM says “don’t shut me off, I’m in agony,” we perhaps shouldn’t worry — as a purely linguistic system it may not contain the real meaning of those words (though when a human says “ouch,” there genuinely is pain behind it).
    • For meaning and “God”: language fractures “the grand unity of creation.” We can lie in language in a way impossible in any other substrate. Modernity, rationalism, and positivism amount to a “hijacking of the brain by the linguistic system” — possibly connected to the “God is dead” decline. Mystical experiences of cosmic unity may be closer to the animal brain’s continuous extension of the universe (traceable, quite literally, back to the Big Bang) than to the discretizing linguistic brain.
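The contrast between analog continuity and symbolic discreteness can be made concrete. A sketch assuming rough, illustrative representative wavelengths (about 700, 600 and 470 nm for red, orange and blue; approximate values, not measurements) and a purely symbolic one-hot code: along the physical dimension red is much closer to orange than to blue, but as bare symbols every pair of distinct words is exactly equally far apart.

```python
import numpy as np

# Hypothetical, approximate representative wavelengths in nanometres.
wavelength_nm = {"red": 700.0, "orange": 600.0, "blue": 470.0}
vocab = list(wavelength_nm)


def one_hot(word: str) -> np.ndarray:
    """A purely symbolic code: one slot per word, with no built-in similarity."""
    v = np.zeros(len(vocab))
    v[vocab.index(word)] = 1.0
    return v


for a, b in [("red", "orange"), ("red", "blue")]:
    physical = abs(wavelength_nm[a] - wavelength_nm[b])
    symbolic = np.linalg.norm(one_hot(a) - one_hot(b))
    print(f"{a}-{b}: {physical:.0f} nm apart physically, {symbolic:.3f} apart as symbols")
```

Learned embeddings do end up placing “red” near “orange”, but they get there from word-to-word relations rather than from the physical continuum, which is the distinction this passage draws.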

Closing reflection

  • Kurt connects this to Everything Everywhere All at Once: the husband’s quiet kindness as its own form of “fighting,” and the realization that the grand multiverse adventure matters less than the intimate everyday (“I would have loved to just do laundry and taxes with you”). He pairs it with T.S. Eliot — that all exploring ends in arriving where we started and knowing the place for the first time.
  • Barenholtz embraces this as the same intuition: seeing yourself as “a ripple in the universe,” part of something cosmic (“we are the universe resonating”). Thinking objectively, as language forces us to, breaks that unity; the beauty and magnificence of the ineffable lies precisely in its being a true, unbroken extension of the whole, something words can only gesture at even as language itself remains the trap.