An LLM has read more than any person ever will. The training corpus of a modern frontier model is roughly the entire internet plus an ambitious library. When it answers a question, it's drawing on more material than any human mind could hold at once.
Something is still missing. Everyone who has used these models long enough knows this. The model will explain grief fluently without having felt grief. It will describe cold water without having felt cold water. It will give advice about raising a child without ever having raised one.
What the corpus leaves out
Every LLM is built on text. For multimodal models, text plus images, audio, and video. That is a lot of signal. It is also a very specific kind of signal: the part of human experience that got translated into a form another mind could consume, usually because someone sat down and wrote it.
Most of being a person never got written down.
Michael Polanyi had a line for this in 1966: "we can know more than we can tell." A huge amount of human competence is tacit. The knack of a craftsperson. A clinician's gut feeling. How a dancer knows where her weight is. You cannot write down how to balance a bicycle well enough to teach it. The knowledge lives in the muscles and the inner ear.
An LLM has neither. It has the written record of people who learned, then tried to describe the result.
Why scaling won't fix this
The obvious objection: more data, more modalities, more scale, and eventually the gap closes.
I don't think so. The gap is structural.
The thing we're trying to capture, lived and felt experience, isn't just underrepresented in the training distribution. It is, by definition, the part that didn't make it out. The feeling of grief is what's left after you failed to say it. The writing about grief is a gesture in its direction. Train a model on gestures and you get a model fluent in gestures. You don't get grief.
Hubert Dreyfus argued something close to this for forty years. He was dismissed, then deep learning made him look prescient, then he was absorbed into the new orthodoxy and mostly forgotten. The embodiment gap he pointed at is still there. It just moved further underneath.
What LLMs do have
"The model lacks X" slides easily into "the model is empty." That isn't true.
LLMs have the statistical shape of an enormous amount of human reasoning. They have the rhythms by which ideas connect. They've absorbed how humans explain things to each other. A system that can reliably reproduce the shape of human thought is a tool of civilizational significance, even without inner experience.
Some embodied knowledge does leak into language. You can learn the structure of grief from enough writing about grief. You can learn useful things about bicycle balance from instructional text. Language is lossy compression, but lossy is not the same as empty.
The mistake runs in both directions. Treating LLMs as empty is wrong. Treating them as equivalent to someone who has lived the thing is also wrong.
Where the limit actually bites
In domains that live in text (formal reasoning, coding, law, administrative work), the limit barely shows. The relevant knowledge really is in the corpus.
In domains where knowledge lives in bodies, the limit is sharp:
- Medicine. The "this patient doesn't look right" diagnosis experienced clinicians make but cannot articulate.
- Therapy and care. A model can produce therapeutic-sounding language with no therapeutic presence. When the stakes are real, that mismatch matters enormously.
- Physical craft. Any domain where a sense of material, weight, tension, and give is load-bearing.
- Teaching young children. Mostly coordination with a small person's body and attention, not transmission of facts.
- Moral judgment in concrete situations. Aristotle called this phronesis. No formal ethics has ever replaced it.
In these domains, LLMs help. Sometimes a lot. They are not substitutes. A lot of mistakes in current AI deployment come from assuming they are.
The point
I'm not saying LLMs are a dead end. I use them constantly, including to write this.
The point is that we've built a new kind of mind, and it isn't the same kind we have. Its knowledge comes through the narrow channel of symbols people managed to write down. That channel is wide enough for astonishing work. It is not wide enough for the whole of being alive.
If we forget this, we'll keep asking these systems to do things they literally can't. Worse, we'll redefine those things so the systems can do them. Text-attunement will become "empathy." Correct advice will become "wisdom." Fluent response will become "presence." Each redefinition is a small win for the technology and a small erosion of the thing the word used to name.
There's a kind of knowing that can't be written down, that has never been written down, and that the most remarkable product of the written-down kind still doesn't contain.
It still lives only in us. For now.