When you talk to an LLM (Large Language Model) for long enough, a particular impression starts to take hold: this machine might actually be thinking. The sentences are natural, it reads context well, and sometimes the answers are sharper than what a person would have given. Predictions like the one Dario Amodei sketched in ‘Machines of Loving Grace’, that an AI smarter than a Nobel laureate could arrive as early as 2026, no longer feel out of place.
I want to take that impression seriously and check whether it actually holds. Is being fluent with language the same thing as thinking? The recent answer from cognitive science and neuroscience is unexpectedly firm: language and thought do not run on the same circuitry. In this post I want to walk through that evidence, ask what we are actually seeing in LLMs if it isn’t thought, and locate the territory that real intelligence still has to cover.
Is Language the Engine of Thought, or a Tool for Communication?
We narrate ourselves in our native language all day long. So we naturally tend to collapse language and thought into a single thing. This intuition isn’t only an everyday rule of thumb — for a long stretch, it was the academic default as well. Wittgenstein wrote in the Tractatus that “the limits of my language mean the limits of my world.” The strong form of the Sapir–Whorf hypothesis argued that the structure of one’s native language determines the structure of thought itself. Noam Chomsky went further, treating language not as a means of communication but as the central organ that makes thought possible. For decades, “language as the engine of thought” was close to the field’s default position.
But the neuroimaging work of the past twenty years pushes directly against that hypothesis.
Evidence for Thought Independent of Language
The clearest single synthesis of this shift comes from Evelina Fedorenko’s group at MIT, in “Language is primarily a tool for communication rather than thought” (Nature, 2024). Drawing on dozens of neuroimaging studies, the paper centers on two networks that occupy almost entirely separate regions of the brain. One is the “language network,” which handles grammar and word meaning. The other is the “multiple demand network,” which handles logical reasoning, mathematics, and complex problem solving.
The conclusion the team draws on top of this synthesis is simple but strong: language is not the engine that produces thought — it is a tool for transmitting thought that has already been produced. The piece that carries this conclusion directly into the AI debate is Benjamin Riley’s “A Large Language Mistake” in The Verge. The title is the argument: the current AI discourse that treats a Large Language Model as if it were intelligence is, Riley says, a large language mistake. The source of that mistake, he argues, is exactly the neuroscientific dissociation above — language and thought do not take place on the same circuit, so fluency in one does not guarantee competence in the other.
Many other studies and observations support the Fedorenko team’s findings. Representative examples include:
First, infants reason about the world before they can speak. The developmental psychologist Alison Gopnik, in “Scientific thinking in young children” (Science, 2012), surveys a body of evidence showing that preverbal infants already perform probabilistic inference, run small experiments to test causal relationships, and build intuitive theories about how the world works. That thought precedes speech is, in a way, the most ordinary and yet most forceful evidence of the split between language and cognition.
Second, patients with completely broken grammar can still solve algebra. Varley et al., “Agrammatic but numerate” (PNAS, 2005), reports that patients with severe aphasia — whose grammatical processing centers are permanently damaged, so they cannot understand a sentence like “the boy chased the girl” — can still solve equations.
Third, people read minds without speaking. Varley and Siegal (Current Biology, 2000) show that an agrammatic aphasic patient who could barely communicate still passed theory-of-mind tasks involving other people’s intentions and false beliefs.
Fourth, the brain regions that handle musical structure are separate from the language network too. Fedorenko, Behr, and Kanwisher, “Functional specificity for high-level linguistic processing in the human brain” (PNAS, 2011), finds that the brain regions analyzing harmony and meter barely overlap with the language network — and the same paper shows the language network is also dissociated from arithmetic, working memory, and cognitive control.
Embedding Space: The Tracks of Thought We Left in Language
We have seen above that language and thought take place in different regions of the brain. So how does an LLM, which only imitates human linguistic ability, manage to appear as though it is thinking and reasoning like a human? The key lies in the language LLMs were trained on. The vast text LLMs learned from is the accumulated record of human thought across millennia. That compressed thought leaves traces in the relations between words, and an LLM unrolls those relations as a map across a vector space of thousands of dimensions. Mikolov et al.’s 2013 Word2Vec paper first made this visible with the famous arithmetic of embeddings: take the vector for “king,” subtract “man,” add “woman,” and the nearest word is “queen.”
The machine didn’t understand the social essence of kings and queens to arrive at this answer. But because humans engraved conceptual axes like [gender] and [power] into language, those axes are projected onto the vector map with enough precision that simple arithmetic looks like reasoning.
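To make the geometry concrete, here is a minimal sketch with hand-built toy vectors rather than real Word2Vec weights: two illustrative axes, one for [gender] and one for [royalty], are already enough for the arithmetic to land on “queen.”

```python
import numpy as np

# Toy 2-D "embeddings", hand-built for illustration only.
# Axis 0 ~ [gender] (male -1, female +1); axis 1 ~ [royalty].
# Real Word2Vec vectors have hundreds of *learned* dimensions.
vocab = {
    "king":     np.array([-1.0, 1.0]),
    "queen":    np.array([ 1.0, 1.0]),
    "prince":   np.array([-1.0, 0.6]),
    "princess": np.array([ 1.0, 0.6]),
    "man":      np.array([-1.0, 0.0]),
    "woman":    np.array([ 1.0, 0.0]),
}

def nearest(vec, exclude):
    """Return the vocabulary word closest to `vec` by cosine similarity."""
    cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max((w for w in vocab if w not in exclude),
               key=lambda w: cos(vec, vocab[w]))

# king - man + woman: slide along the gender axis while keeping royalty fixed.
result = vocab["king"] - vocab["man"] + vocab["woman"]
print(nearest(result, exclude={"king", "man", "woman"}))  # -> queen
```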
One reading of this calls the model a stochastic parrot, a mirror sliding along a geometric trajectory. The opposite reading says the model has, in the act of compressing enough human language, partially reconstructed inside itself the skeletal logic that organizes the world. It is too early to say definitively which reading is right. But if the image in the mirror is precise enough, the mirror is already carrying some of the substance of what it reflects.
The Intelligence Beyond the Mirror: Problems the Next Wave Must Solve
If LLMs cannot realize all forms of intelligence, does that make them a failure? Quite the opposite — their limitations sharply outline the challenges that AI still needs to overcome.
First, physical world models, a concept Yann LeCun proposes in ‘A Path Towards Autonomous Machine Intelligence’. The idea is that a machine should build an internal model of how the external world works and be able to predict the consequences of actions before taking them. Just as we can simulate what would happen before reaching toward a flame, the goal is to internalize the causal relationship between action and outcome. This kind of knowledge cannot be built from text data alone; it has to be learned by colliding directly with a three-dimensional world.
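As a toy illustration of that predict-before-acting loop (and only the loop: the dynamics below are hard-coded, whereas in LeCun’s proposal the model would be learned from interaction), here is a minimal sketch in which an agent simulates every candidate action internally and rejects the one its model says ends in the flame.

```python
# Minimal sketch of "predict the consequence, then act".
# The forward model is hard-coded here; a real world model would be learned
# from sensory interaction, not written by hand.
FLAME_POS = 5

def forward_model(hand_pos: int, action: int) -> int:
    """Predict where the hand ends up after an action (-1, 0, or +1)."""
    return hand_pos + action

def predicted_pain(next_pos: int) -> float:
    """Imagined cost: grows as the predicted position nears the flame."""
    return 1.0 / (abs(next_pos - FLAME_POS) + 0.01)

def choose_action(hand_pos: int) -> int:
    # Roll every candidate action through the internal model *before* moving.
    return min((-1, 0, +1), key=lambda a: predicted_pain(forward_model(hand_pos, a)))

print(choose_action(4))  # -> -1: the imagined reach toward the flame is rejected
```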
Second, causal reasoning: moving past the statistical correlation “A is often followed by B” to the question “what has to be true for B to be true?” Today’s LLMs are strong on correlation and weak on causation.
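To see why the two questions come apart, here is a small simulation with an assumed toy structure (not drawn from any cited study): a hidden cause Z drives both A and B, so A and B are tightly correlated in the observed data, yet forcing A to a value does nothing to B. Answering the second question requires knowing the mechanism, not the co-occurrence statistics.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Toy structural model: a hidden common cause Z drives both A and B.
Z = rng.normal(size=n)
A = Z + 0.1 * rng.normal(size=n)
B = Z + 0.1 * rng.normal(size=n)

print("observational corr(A, B):", round(np.corrcoef(A, B)[0, 1], 2))  # ~0.99
print("mean B, observed:", round(B.mean(), 3))

# Intervention do(A := 3): overwrite A by force. B's structural equation
# never reads A, so the intervened world leaves B's distribution untouched.
A = np.full(n, 3.0)
B_after = Z + 0.1 * rng.normal(size=n)
print("mean B after do(A=3):", round(B_after.mean(), 3))  # ~same as before: no effect
```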
Third, metacognition. A self-reflective circuit that can distinguish what it knows from what it doesn’t, and prune hallucinations through self-checking. Safety researchers at organizations like the Center for AI Safety have been pointing to this gap for some time.
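There is no established circuit for this yet. One crude proxy in current practice is sampling-based self-consistency: ask the same question several times and abstain when the answers scatter. The sketch below only simulates that idea; `known` and `guessing` are hypothetical stand-ins for sampling a model at nonzero temperature, not calls to any real API.

```python
import random
from collections import Counter

def self_check(sample_answer, question, k=10, threshold=0.7):
    """Crude 'does it know?' proxy: sample k answers, abstain if they disagree."""
    answers = [sample_answer(question) for _ in range(k)]
    best, count = Counter(answers).most_common(1)[0]
    return best if count / k >= threshold else "I don't know"

# Hypothetical stand-ins for an LLM sampled at nonzero temperature.
known = lambda q: "Paris"                                     # stable answer
guessing = lambda q: random.choice(["1912", "1913", "1921"])  # scattered guesses

print(self_check(known, "Capital of France?"))      # -> Paris
print(self_check(guessing, "Year X was founded?"))  # -> I don't know (usually)
```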
These three sit, notably, outside the language network — in the same “other circuits” the neuroscience kept pointing to. That the weaknesses of LLMs cluster exactly there is not, I think, a coincidence.
Closing: Beyond the Mirror
No one yet knows whether today’s LLMs will lead to AGI. But the machine that has most clearly shown what that journey still requires is the LLM itself.