Does an AI understand what it says?

What this is about

When an AI comforts you, when it gets a joke, when it carries an argument forward - does it do so because it has understood? Or is it only gigantic machinery at work, turning patterns into further patterns, without ever knowing what is being talked about?

This question is not new. It was posed in 1980 by the philosopher John Searle in a slim little story that has since become one of the most discussed thought experiments in modern philosophy: the Chinese Room. Back then it sounded like a toy for academics. Today, four decades later, the Chinese Room sits in every smartphone on Earth.

This essay recounts the experiment first, then its classic objections, and then the turn: what does Searle's question mean for language models that suddenly do what he believed to be impossible? And at the end, the question that really matters: would it make any difference to us if the machine did understand? And does anything change if it does not?

One

The room in which no one speaks Chinese.

Picture this. You sit in a closed room. You do not speak a word of Chinese. Through a slot in the door someone pushes in slips of paper covered in Chinese characters. Around the room stand shelves full of books. These books contain rules - but not in the form "this character means dog", rather in the form "when you see this string of characters, write the following string back". You look up what came in, find the entry, copy out the right characters and push the slip back.

The person outside speaks fluent Chinese. They are convinced they are talking with a native Chinese speaker - the answers sound perfect, come quickly, address their questions, even crack jokes. But you, inside the room, have no idea what any of it is about. You do not know whether your slip says "I'm delighted" or "Is the house on fire?". You only shuffle symbols.

Searle's question: in this setup, does anyone understand Chinese? His answer: no. Not the person in the room - they only shuffle. Not the rule book - it is a book. Not the room as a whole either - it is merely the sum of the two. Chinese is being processed without anyone understanding Chinese. And every computer, says Searle, is exactly like this. It manipulates symbols by rules. It understands none of them. No matter how convincing the answers seem.
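The rule book in the room can be pictured as nothing more than a lookup table. The following is a minimal sketch, not anything Searle wrote down: the names `RULE_BOOK`, `FALLBACK` and `room` are invented for illustration, and the two entries are placeholder exchanges.

```python
# Hypothetical rule book: incoming string -> outgoing string.
# The entries are illustrative only; the code never depends on what they mean.
RULE_BOOK = {
    "你好吗？": "我很好，谢谢。",
    "你会说中文吗？": "当然会。",
}

# Assumed default reply for slips that have no entry in the book.
FALLBACK = "请再说一遍。"


def room(slip: str) -> str:
    """Return the reply the rule book prescribes for an incoming slip."""
    # Pure string matching: the function compares the shape of the
    # character sequence and never consults anything like a meaning.
    return RULE_BOOK.get(slip, FALLBACK)


if __name__ == "__main__":
    print(room("你好吗？"))
```

Nothing in `room` would change if every entry were replaced by meaningless squiggles, which is exactly the point of the thought experiment. A real conversation would of course need vastly more rules than a finite table of whole sentences.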

What philosophy has to say

The original argument appears in John Searle's essay "Minds, Brains, and Programs", published in 1980 in Behavioral and Brain Sciences. It was printed in the same issue alongside 27 replies from other scholars - a debut almost without precedent for a philosophical argument. With it Searle wanted to draw a clear line between what he called "strong AI" (the claim that a suitably programmed computer has a mind) and "weak AI" (the claim that AI is only a useful tool for studying the mind).

In the background stood an older distinction by the philosopher Franz Brentano from the 19th century: the intentionality of the mind. What is meant is the property that mental states are always directed at something - a hope is hope for something, a thought is a thought of something. Symbols in a computer, said Searle, lack exactly this property. They are about nothing.

Two

Syntax is not semantics.

Searle's real lever is a philosophical distinction that sounds as dry as it is consequential: symbols have a form, and they have a meaning. The form is the syntax - how symbols look, how they follow one another, which combinations are permitted. The meaning is the semantics - what the symbol points to, what it picks out in the world.

A computer, said Searle, works purely syntactically. It sees strings of symbols and compares them with patterns. But it never sees what a symbol stands for. The character for "apple" is to it as arbitrary as the character for "square root". Both are pixels or bytes - neither points to anything. And without that reference, runs the conclusion, there is no understanding.
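What "both are bytes" means can be shown in a few lines. This is only an illustrative sketch: the helper name `as_bytes` is invented here, and UTF-8 is assumed as the encoding.

```python
# Two Chinese words as a computer stores them (UTF-8 assumed).
apple = "苹果"          # the word for "apple"
square_root = "平方根"  # the word for "square root"


def as_bytes(symbol: str) -> list[int]:
    """Return the raw byte values behind a string."""
    return list(symbol.encode("utf-8"))


if __name__ == "__main__":
    # Both words come out as plain lists of integers between 0 and 255.
    # Nothing in the numbers marks one as a fruit and the other as
    # a mathematical operation.
    print(as_bytes(apple))
    print(as_bytes(square_root))
```

At this level the two words differ only in which numbers appear and how many of them there are. That is the sense in which, on Searle's reading, the machine has form to work with and nothing else.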

At first this sounds harmless. It becomes explosive the moment one notices what it rules out: never, said Searle, could a mere program, however ingenious, gain meaning through symbol manipulation alone. Even if it says everything a human speaker would say - it does not know what it is saying. Form without content. A position that marks a hard boundary for any strong-AI research.

What philosophy has to say

The strict separation of syntax and semantics is not new philosophically. In the 20th century it was worked out above all by Gottlob Frege, Rudolf Carnap and Alfred Tarski - all classics of formal logic. Searle turns this separation sharply against AI: programs are syntactic objects, meaning is semantic, and the one cannot bring forth the other.

Anyone who wants to examine the question systematically ends up in the philosophy of mind. A reading tip for those in a hurry is John Searle's own book Minds, Brains and Science (1984), in which he explains the argument for non-philosophers. Anyone wanting a more critical angle can reach for Daniel Dennett's Consciousness Explained (1991) - the counter-position, which sees understanding as a bundle of functional properties that could well arise in machines too.

Three

The classic counterarguments.

Since 1980 Searle's essay has set off an industry of replies. Three are so prominent that one ought to know them. They do not attack the setup of the room, but the conclusion.

The systems reply. Sure, say the defenders of AI, the person in the room understands no Chinese. But that is the wrong question. It is not the person who would have to master Chinese, but the whole system of person, rules, books, slips and pencil. Just as it is not a single neuron in your head that understands English, but your brain as a whole. Searle's answer: he imagines that the person learns the entire rule book by heart and runs through it in their head - they still understand nothing. The system is inside one brain, and still does not understand.

The robot reply. Does the room fail to understand because it has no body? If you put the program into a robot that walks through the world, touches apples, sees dogs, registers pain - then the symbols would suddenly be coupled to the world. Each symbol would be connected through perception with what it points to. Searle objects: inside, the robot would still do the same thing - shuffle symbols. More sensors, more slips, no understanding.

The brain-simulator reply. What if the program does not follow just any rules, but reproduces a human brain neuron by neuron? Then the comparison would no longer be slip-against-library, but brain-against-brain. Here Searle's answer is at its thinnest: he insists that even a perfect simulation is still a simulation. A simulation of a fire burns nothing. A simulation of understanding understands nothing. But precisely here, say his critics, it becomes clear that the argument ultimately rests on a mere stipulation. It asserts what it was supposed to prove.

What philosophy has to say

The most important collection of replies is found, fittingly, in the original 1980 issue of Behavioral and Brain Sciences - where the respondents include, among others, Jerry Fodor, Douglas Hofstadter, Daniel Dennett, Marvin Minsky and Roger Schank. Hofstadter and Dennett later took up Searle's argument again in detail in their anthology The Mind's I (1981) and answered it with thought experiments of their own.

An overview of the dispute that remains very readable today is offered by the entry "The Chinese Room Argument" in the Stanford Encyclopedia of Philosophy. Anyone not yet at home with the terms functionalism, materialism and computationalism will find a map there - including the continuations that the argument has gained since the 2010s, when language models suddenly turned from theory into empirical fact.

Four

Today the room sits in your pocket.

Until 2022 Searle's argument was seminar material. Then came ChatGPT, and the debate leapt out of the seminar room onto the street. For a Large Language Model is - if one looks honestly - a fairly precise realisation of the Chinese Room. It takes in a sequence of symbols, runs it through regularities extracted from gigantic amounts of training text, and produces a likely continuation. It shuffles symbols. More, faster, more elegantly than Searle's man with the rule book - but at root the same process.
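The phrase "produces a likely continuation" can be made concrete with a deliberately tiny toy. The sketch below counts which word follows which in a short text and then always picks the most frequent follower. This is an assumption-laden simplification: real language models use neural networks with billions of learned parameters rather than a count table, and the names `train_bigrams`, `continue_text`, `corpus` and `model` are invented for this example.

```python
from collections import Counter, defaultdict


def train_bigrams(text: str) -> dict[str, Counter]:
    """Count, for every word, which words follow it and how often."""
    counts: dict[str, Counter] = defaultdict(Counter)
    words = text.split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def continue_text(counts: dict[str, Counter], start: str, length: int = 5) -> list[str]:
    """Extend `start` by repeatedly appending the most frequent follower."""
    out = [start]
    for _ in range(length):
        followers = counts.get(out[-1])
        if not followers:
            break  # this word was never seen with a follower
        out.append(followers.most_common(1)[0][0])
    return out


# Illustrative training text, made up for this sketch.
corpus = "the room takes a slip and the room returns a slip and the room waits"
model = train_bigrams(corpus)

if __name__ == "__main__":
    print(" ".join(continue_text(model, "a", length=3)))
```

The toy has no notion of rooms or slips. It only knows that in its training text "slip" tended to come after "a". Whether scaling this principle up by many orders of magnitude changes anything in kind is precisely what the rest of this section is about.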

If Searle was right, such a model ought to understand precisely nothing - no matter how convincing its answers sound. A fluent answer, a passed Turing test, a deeply moving eulogy from the chatbot: all just a bigger room with thicker books. And for many AI researchers and philosophers this remains today the most convincing reading. What we take for understanding is our own tendency to read meaning into form, where there is only form.

But by now there are observations that pull in the other direction. Researchers have found, in the inner layers of language models, structures that look like world models: the model seems to "know" where a chess piece stands on the board, although it has only seen moves in text form. It seems to recognise which statement in a text is true and which false - before it formulates its own answer. It seems to have worked out concepts for itself that were not directly present in training. None of this is proof of understanding. But it is more than pure symbol-shuffling, on the classic reading, would lead one to expect.

What research has to say

The suspicion that world models lurk inside language models is made tangible by a much-cited paper from Kenneth Li and colleagues: "Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task" (2023). They showed that a small language model, trained on Othello game moves, had internally developed a representation of the board - although it had never seen a board. The research field is today called Mechanistic Interpretability and is one of the most exciting in AI safety. Anthropic and other labs publish on it regularly.
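The basic idea of such an analysis is a "probe": a small classifier that tries to read a fact about the board out of the model's internal numbers. The sketch below illustrates only that idea, under loud assumptions. It does not use a real language model. The "hidden states" are synthetic vectors generated so that one square's state (occupied or not) is encoded along a fixed direction plus noise, and the probe is a simple perceptron rather than the probes used in the paper. All function and variable names are hypothetical.

```python
import random


def make_synthetic_states(n: int, dim: int, seed: int = 0):
    """Create fake 'hidden states' in which one square's state is linearly encoded."""
    rng = random.Random(seed)
    # Assumed hidden direction along which "occupied" is encoded.
    direction = [rng.uniform(-1, 1) for _ in range(dim)]
    data = []
    for _ in range(n):
        occupied = rng.random() < 0.5
        sign = 1.0 if occupied else -1.0
        state = [sign * d + rng.gauss(0, 0.1) for d in direction]
        data.append((state, occupied))
    return data


def train_probe(data, epochs: int = 20):
    """Train a perceptron that predicts 'occupied' from a state vector."""
    dim = len(data[0][0])
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        for state, occupied in data:
            target = 1 if occupied else -1
            score = sum(w * x for w, x in zip(weights, state)) + bias
            if target * score <= 0:  # misclassified: nudge towards the target
                weights = [w + target * x for w, x in zip(weights, state)]
                bias += target
    return weights, bias


def accuracy(probe, data) -> float:
    """Fraction of states for which the probe's prediction is correct."""
    weights, bias = probe
    hits = 0
    for state, occupied in data:
        score = sum(w * x for w, x in zip(weights, state)) + bias
        hits += (score > 0) == occupied
    return hits / len(data)


data = make_synthetic_states(500, dim=32)
train_set, test_set = data[:400], data[400:]
probe = train_probe(train_set)

if __name__ == "__main__":
    print(f"held-out probe accuracy: {accuracy(probe, test_set):.2f}")
```

Here the probe succeeds because the information was planted in the vectors by construction. The interesting empirical finding in interpretability research is that comparable probes succeed on the activations of a trained model, where nobody planted anything.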

In the camp of the sceptics, Emily Bender (University of Washington) is one of the sharpest voices. The paper she co-authored, "On the Dangers of Stochastic Parrots" (2021), coined the term: language models as stochastic parrots that continue text statistically without understanding it. Bender argues explicitly in the line of Searle - form without content, however impressive the form may be.

Five

The uncomfortable possibility: maybe "just rules" does not say enough.

What if Searle's separation of syntax and semantics is sharper than reality allows? This very suspicion runs through the AI philosophy of recent years. It comes in two parts.

The first point: we too understand language only because we have heard, read and experienced gigantic amounts of it in context. A small child learns the word "dog" by seeing dogs, hearing the word at the same time and linking the two - but also through thousands of stories, songs, explanations, pictures. Anyone who describes the process soberly describes something statistical, pattern-like. Perhaps the difference between our understanding and that of the AI is one of degree, not of kind. Perhaps understanding is in the end itself a very fine form of pattern, with no magical substance behind it.

The second point is more uncomfortable: perhaps the AI really does not understand, but that makes less practical difference than we think. If a model reliably produces good texts, suggests apt diagnoses, checks contracts, accelerates research - then the question whether it "really understands" while doing so is philosophically exciting but practically almost beside the point. Anyone who has ever talked with a confident person who is talking nonsense knows: understanding and performance often come apart in humans too.

And this is exactly where Searle's question first becomes truly interesting. It does not primarily ask: what can the machine do? It asks: what do we believe it is? Do we count a system that behaves like someone who understands as someone who understands - or is it something fundamentally different that we are tempted to humanise again and again? The answer decides how we deal with AI, what responsibility we assign to it, what rights we one day grant it or deny it.

What philosophy has to say

The idea that human understanding too is grounded in use and statistical regularity has a long tradition in the philosophy of language - from Ludwig Wittgenstein ("The meaning of a word is its use in the language", Philosophical Investigations, 1953) to modern cognitive-scientific theories of predictive processing, which model the brain as a continuously running prediction machine (see Andy Clark, Surfing Uncertainty, 2016).

The second point is taken up by David Chalmers in his book Reality+ (2022): if a system does the same thing as one that understands, a functionalist could say that it understands. The question whether something additional is still needed behind the behaviour is what philosophers call the Hard Problem of Consciousness - and even after decades of research it remains unsolved. Searle's room stands today in the middle of this open field.

Six

What it means for us when the answer stays open.

We will not settle Searle's question in this essay - philosophy has not settled it in 45 years, and it is unlikely to work out in the next few what actually happens inside an AI. But we can say very precisely what this open question means in practice. Three consequences follow, in descending order of certainty.

First: caution with humanising. An AI that answers as though it understood you reliably awakens feelings of attachment and trust. That is not mere imagination - it is a very old property of our brain. We respond to linguistic fluency as though there were someone behind it. In the Stone Age that was a sensible heuristic. Today it is a trap into which anyone can fall - from the lonely teenager to the seasoned adviser. Searle's room reminds us that there may be no one behind it who feels with you, whoever you are.

Second: responsibility stays with humans. As long as the question is open whether an AI understands what it says, it also cannot morally answer for what it recommends. That means: whoever deploys an AI in a piece of software takes on the responsibility for what it does. It can be the tool that prepares texts, suggests diagnoses, sorts contracts. The one who answers for it when something goes wrong is a human. That is not only legally correct - it is also philosophically grounded.

Third: the really important question is a different one. Searle asked philosophically: does the machine understand? For our everyday life a different question matters more: what do we do when we can no longer be sure? When our tools talk so convincingly that we rely on their recommendations without checking them, then we lose judgement, no matter whether the machine understands or not. The answer to that does not lie in the AI lab. It lies in the classroom, in family life, in the way we still trust ourselves to check an answer.

What research has to say

The psychological tendency to humanise machines has been known since the 1960s under the heading ELIZA effect. Joseph Weizenbaum had shown, with a simple chatbot called ELIZA, how astonishingly easily people develop emotional attachments to a machine they know to be only a program - documented in his book Computer Power and Human Reason (1976). With today's language models this tendency is incomparably stronger.

The question of moral and legal responsibility is discussed in AI research under the catchphrase "human in the loop" - and it appears explicitly in the EU AI Act (in force since August 2024), which mandates human oversight for high-risk applications. A readable treatment for non-lawyers is the essay "Die Verantwortung der Maschine" from the German Federal Agency for Civic Education (bpb.de, 2024).

What connects these six observations

The question of whether an AI understands is open. The question of how we deal with it is not.

John Searle's Chinese Room was a philosophical game in 1980. Today it has become a tool of self-examination. Every time an AI confronts us with a convincing answer, the honest question is not "Did it understand me?", but "Did I understand what it is doing right now?". That is a question each of us has to answer for ourselves - with practice, scepticism and patience.

Perhaps in the coming years a language model will become so convincing that even sceptics admit there is something here that must be called understanding. Perhaps the sober reading prevails: symbols in, symbols back, no someone in between. Both possibilities demand the same of us - a wakeful eye on our own assumptions, an honest handling of what we simply do not know, and a healthy distrust of our tendency to hear meaning where perhaps there is only form.

The Chinese Room is not primarily a question to the machine. It is a question to us. And precisely because the answer stays open, it is so urgent.

If you want to talk to us about this

These questions are not academic. They decide how we build software that has AI inside it - and how we teach employees to deal with it.

How AI works mechanically is covered in "How does an AI work?". What education means in the age of AI is covered in "What do we learn when AI can do everything?". The voices shaping today's debate about the future appear under "Where we are headed".


Curated by Johannes Hohls for wendwerk.