TL;DR
The Chinese Room argument implies that computer programs can never really understand language. In this post, I’ll argue that LLMs have a limited form of language understanding through teleosemantics, i.e. the idea that meaning is acquired[1] through optimisation.
The Chinese Room argument, introduced by John Searle, targets what he calls the Strong AI hypothesis, i.e.
The appropriately programmed computer with the right inputs and outputs would thereby have a mind in exactly the same sense human beings have minds.
The thought experiment poses a challenge to computational functionalism, i.e. the view that mental phenomena can be described purely by their functional relations to each other. Since a computer program can accurately represent these functional relationships as symbols, functionalism implies that a computer running the right program could have genuine mental states.
In the thought experiment, Searle imagines himself sitting in a room receiving Chinese characters passed under the door. He consults a set of rules/instructions to manipulate and transform the symbols and passes Chinese characters back under the door. To an outside observer, the room produces fluent Chinese, yet Searle, sitting inside it, does not understand Chinese at all.
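To make the setup concrete, here is a minimal sketch of what purely syntactic symbol manipulation looks like. The two-entry rule table is entirely made up (it is not Searle’s actual rule book); the point is that the characters’ meanings play no role anywhere in the program.

```python
# Toy illustration of rule-following symbol manipulation: a hypothetical
# RULE_BOOK maps input symbol sequences to output symbol sequences.
# Nothing in the code refers to what the characters mean.

RULE_BOOK = {
    ("你", "好"): ("你", "好", "吗"),   # made-up rule pairing one string with another
    ("谢", "谢"): ("不", "客", "气"),   # another made-up rule; the program never "knows" these are words
}

def chinese_room(symbols: tuple) -> tuple:
    """Transform input symbols purely by table lookup: syntax in, syntax out."""
    return RULE_BOOK.get(symbols, ("?",))  # unanalysed fallback symbol

print(chinese_room(("你", "好")))  # fluent-looking output with no understanding inside
```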
The original argument was advanced in 1980, some 45 years ago and well before the advent of today’s LLMs, but the implication for today’s systems is clear. Searle is arguing that current systems don’t use the words they produce meaningfully[2]. In other words, when an LLM tokens the word [dog] it doesn’t actually know what it means. To it, [dog] is just an empty symbol.
The original formulation of the argument invites substantial confusion because there are two distinct observers: Searle, from his vantage point inside the room, and the room itself. If you asked the room a question it would respond in fluent Chinese, whereas if you asked Searle a question he would respond in English.
This distinction between the two observers has led some to advocate the Systems Reply, which asserts that the room itself understands Chinese even if Searle himself doesn’t. This essentially views Searle as a kind of computational cog (or CPU) within the system, one that doesn’t exhibit the necessary understanding of Chinese even though the full system does. In other words, the Systems Reply separates the room from the homunculus sitting inside it and suggests that the system’s individual components don’t need to understand for there to be genuine understanding.
Searle’s response is that he could, in principle, memorise all of the rules/instructions he was following inside the room, internalise them, and then leave the room to wander outdoors conversing in Chinese. In this case, he asserts, he would still have no way of attaching meaning to the formal symbols even though he would now be the entire system. There’s still no way to get semantics from pure syntax.
There are various counter-replies, but I think continued argument about where the system is located is unfruitful and misses the point Searle is trying to make: you cannot derive semantics from syntax, and a computer program that manipulates symbols according to syntactic rules does not thereby understand their semantics.
Formally,
P1) Programs are purely formal (syntactic).
P2) Minds have mental contents (semantics).
P3) Syntax alone is not constitutive of, or sufficient for, semantics.
C) Programs are not constitutive of, or sufficient for, minds.
The Chinese Room thought experiment lends support to P3, but as we’ve seen above we can quickly get into the weeds arguing about exactly where the program is located. Instead, some supporting intuition for P3 can be built independently of the Chinese Room.
Consider the fact that you can sing the chorus of Despacito without understanding any Spanish. The individual Spanish tokens that make up the song are real Spanish words with real meanings. But if you’re singing the song successfully without knowing what the words mean, then the semantic content of what you’re communicating is entirely different to that of the individual Spanish words. You might be conveying meaning by signalling to the in-group, or even just proving to your brain that you know the rhythm and sounds. But the same syntax produces an entirely different semantics in your brain compared to the brain of a Spanish speaker.
The post Generalised Hangriness introduces the idea that the words we say don’t necessarily convey all the meaning explicitly indicated by the words themselves. For example, if I respond in an uncharacteristically angry or unkind way, the literal meaning of my words might be different to the meaning of the subtext they’re communicating. I might shout “stop being so annoying!” when I really mean something like “I’m tired and hungry and just need to sleep.”
Also consider a well-trained parrot that says “Hello! Polly want a cracker!” It’s obvious the parrot doesn’t understand the words it’s saying in their English-language context even though it articulates them clearly.
If you subscribe to something like Dennett’s Intentional Stance you’ll probably just reject P2). There’s no magic thing called “semantic understanding” which sits on top of flexible communication. No additional explanatory power is obtained by insisting that the system doesn’t truly understand if it passes all our behavioural tests of understanding such as a Turing test. If it walks like a duck and talks like a duck then we may as well call it a duck.
For what it’s worth, I think Dennett could be correct on this point. But I also think this argument is sometimes used to kill the debate when there are still interesting points to examine. Namely, is P1 correct? Could an LLM or another AI system actually be said to understand things in the sense that Searle meant it? Or are they doing the equivalent of singing Despacito without understanding Spanish?
To tackle these questions we’ll introduce the concepts of intentionality and teleosemantics.
Thoughts can be about things. When you think of an object such as a dog or an electron you create a representation of the object in question. This ability of mental states to be directed towards some object or state of affairs is called intentionality in philosophy of mind, and it poses a number of interesting problems.
Human minds clearly have intentionality. When you think of a dog, your thought is about dogs. The mental state is directed towards the concept DOG[3], which relates to the hairy, barking animals inhabiting the real world we live in. We don’t just token an empty symbol [dog] without it actually referring to the concept DOG, unless perhaps we’re not concentrating, aren’t native English speakers, or have never encountered the token [dog] before and don’t know what it means.
LLMs are certainly capable of tokening the word [dog], but what, if anything, is this token actually directed at? Is it just syntax? Or do LLMs actually represent what humans mean by dogs when they see the token [dog]?
Teleosemantics ties content to proper function: a state represents what it was historically selected to carry information about for a downstream consumer system.
To use an often-cited example, consider a frog snapping its tongue at a fly passing across its visual field. The frog’s visual system produces a representation FLY which is then consumed by the motor system in the form of a tongue-snap. The content lies in between the producer (the visual system) and consumer (the motor system) mechanisms. The producer represents FLY because this historically led to the consumer doing the right thing, namely a tongue-snap. In this case the representation FLY takes on the meaning nutritious meal for a frog, whereas it would take on a different meaning (say, annoying-buzzy-thing) for a human whose consumer mechanism is a hand swatting it out of the way.
In this way, teleosemantics provides a nice story for how content is ascribed to certain representations. Content is simply whatever the consumer reliably used the representation for over its causal history. It also provides a way of describing misrepresentation. If the frog snaps its tongue at a black blob in its visual field and this turns out to be a dust particle, it has misrepresented the dust particle as a fly.
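A rough sketch of this producer/consumer picture (the names and types below are illustrative only, not a formal model): the content of the frog’s state is fixed by what historically kept the mechanism around, not by whatever happens to trigger it on a given occasion.

```python
# Toy producer/consumer sketch of the teleosemantic story. Illustrative names only.
from dataclasses import dataclass

@dataclass
class Representation:
    trigger: str   # what actually caused the state on this occasion
    content: str   # what the state is about, fixed by its selection history

def frog_producer(stimulus: str) -> Representation:
    # The visual system fires the same state for any small dark moving blob.
    # Its content is "fly" because flies are what tongue-snaps historically
    # secured, keeping this mechanism in the population.
    return Representation(trigger=stimulus, content="fly")

def frog_consumer(rep: Representation) -> str:
    return "tongue-snap"   # the downstream behaviour the state was selected to cause

rep = frog_producer("small dark blob (actually a dust particle)")
print(frog_consumer(rep))
# Misrepresentation: the content is "fly", but the trigger was a dust particle.
```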
As Abram Demski notes in his post on teleosemantics:
Teleosemantics reduces meaning-making to optimization. Aboutness becomes a type of purpose a thing can have.
In this way, we can use purposeful language to describe objects; hearts are for pumping blood, lungs are for breathing. Maps are for navigating the territory.
But the key insight is that evolution by natural selection is not the only process that can select proper functions. Reinforcement learning creates an analogous selection process in artificial systems. When biological systems interact with the environment, reward mechanisms fire in scenarios likely to increase the system’s chances of survival. The same structure occurs when reinforcement learning agents receive a reward signal which causes them to develop functions that maximise it. A reinforcement learning agent exposed to reward signals therefore acquires semantic grounding, according to teleosemantics, in the same way that a biological system does. In both cases, representations that successfully track environmental features are reinforced. The substrate and fine-grained implementation details are different, but the selective dynamics are functionally the same.
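As an illustrative sketch of that parallel (a bare-bones tabular value update, not a claim about how any particular RL system is implemented), reward-driven selection reinforces exactly those stimulus-action mappings that track the reward-relevant feature of the environment:

```python
# Minimal reward-driven selection loop: the value table ends up "about" flies in
# the teleosemantic sense, because it was reinforced for tracking the feature
# that actually yielded reward. Purely illustrative; no RL library assumed.
import random

actions = ["snap", "ignore"]
q = {(s, a): 0.0 for s in ["fly", "dust"] for a in actions}
alpha, epsilon = 0.1, 0.1

for _ in range(5000):
    s = random.choice(["fly", "dust"])
    if random.random() < epsilon:
        a = random.choice(actions)                          # occasional exploration
    else:
        a = max(actions, key=lambda act: q[(s, act)])       # otherwise act greedily
    reward = 1.0 if (s == "fly" and a == "snap") else 0.0   # only flies are nutritious
    q[(s, a)] += alpha * (reward - q[(s, a)])               # simple value update

print(q)  # snapping at flies ends up highly valued; everything else stays near zero
```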
This solves several puzzles neatly. It provides a theory of semantics and intentionality within an entirely naturalistic framework, one that can draw a normative distinction between a function performing the role it is supposed to perform and a function malfunctioning. It also gives us a ready-made reply to the Chinese Room argument. In the Chinese Room, the system never develops genuine intentionality because there is no optimisation history that builds up the proper functions. It just follows the rules.
The Robot Reply concedes that Searle is right about the mental state of the operator in the room: they wouldn’t understand the Chinese symbols from the inside. However, if we modify the setup and put the Chinese Room inside a robot body with sensors such as video cameras and microphones, it could roam the world freely and begin to attach meaning to the symbols it encounters, much as a child learns language by interacting with the world. This reply sits well with the teleosemantic view sketched above. If a robot system begins to optimise for moving through the world, it will build a world-model that helps it navigate and allows it to attach meaning to the things it encounters. In other words, it starts to optimise its map to align with the territory.
Searle’s response is that this world-directed content is just another form of syntax, equally impenetrable to the operator inside the Chinese Room. For example, imagine that the images and sounds the robot encounters outside the room are digitised and fed into the room as streams of 1s and 0s. Searle, from his viewpoint within the room, would still be unable to understand the stream of content but would retain the ability to manipulate the symbols and thereby cause the robot to interact with the world.
The key point is that teleosemantics has an answer here. The embodied robot system develops genuine understanding through its optimisation process: the producer and consumer mechanisms work together to create proper functions which constitute semantic concepts. By contrast, Searle inside the room is not undergoing an optimisation process. The rule book has simply been handed to him and he implements the rules. There’s no feedback loop with the environment, no “reward” to tell him whether he’s implementing the rules correctly or incorrectly, and no genuine understanding of what the symbols mean.
This brings us back to the question of LLMs. During training, LLMs undergo an optimisation process: they learn to predict next tokens accurately and, via RLHF, to output responses favourable to human judges. Is the fact that they undergo optimisation enough to ground genuine understanding via teleosemantics?
The answer is nuanced. Unlike frogs or other animals, which evolved visual systems to track flies and other “real-world” objects, the LLM’s entire input/output stream is text. When an LLM encounters the token [dog], its consumer mechanism is not a motor system that needs to interact with actual dogs in the real world. This implies that whatever semantic meaning it derives from the token [dog], that meaning cannot refer to the same concept of “fluffy, barking animals” that humans refer to when they use the token [dog]. Its “mental states” would be entirely different to ours.
However, the LLM still undergoes an optimisation process, which means that, according to teleosemantics, it still derives some semantic meaning from the token, just not necessarily the meaning we might naively expect it to have as humans. The massive human text corpus that LLMs are trained on encodes facts about dogs, such as the fact that they bark, are fluffy and wag their tails. The LLM builds internal representations via its optimisation process that reliably predict the tokens corresponding to these facts. Its proper functions therefore reliably track linguistic phenomena which themselves track world-directed content second-hand.
This optimisation process builds up a “web of concepts” whereby LLMs have access to relationships between many different tokens embedded in a high-dimensional latent vector space. Importantly, this web of understanding doesn’t perfectly track the real world. It has been built up by optimising for next-token prediction and human preferences, which track the world with reasonably high fidelity, but the tracking isn’t perfect because the LLM never interacts with the real world itself. As a result, the understanding it builds is a pale shadow of the genuine human-like understanding we build up through interaction with the world.
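A toy sketch of what this web of concepts looks like in practice (the three-dimensional vectors below are invented for illustration; real LLM embeddings are learned during training and have hundreds or thousands of dimensions): tokens that occur in similar contexts end up close together in the latent space.

```python
# Invented low-dimensional "embeddings" to illustrate the web-of-concepts idea.
# Real LLM embeddings are learned and far higher-dimensional.
import numpy as np

embeddings = {
    "dog":      np.array([0.90, 0.80, 0.10]),
    "puppy":    np.array([0.85, 0.75, 0.20]),
    "bark":     np.array([0.70, 0.90, 0.05]),
    "electron": np.array([0.05, 0.10, 0.95]),
}

def cosine(u, v):
    """Cosine similarity: near 1.0 means same direction, near 0 means unrelated."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

for word in ["puppy", "bark", "electron"]:
    print(f"dog vs {word}: {cosine(embeddings['dog'], embeddings[word]):.2f}")
# [dog] sits near [puppy] and [bark] but far from [electron]: relational structure
# learned from text alone, without any contact with actual dogs.
```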
This also explains several interesting nuances of LLM behaviour. For example, sycophancy can be attributed to the fact that LLMs have been optimised via RLHF against human judges. Their notion of a “correct” answer is grounded in what humans like to hear, so according to teleosemantics we should expect somewhat sycophantic answers. Another example is hallucination. On teleosemantics, hallucinations are misrepresentations: just as a frog mistakenly snaps its tongue at a BB pellet instead of a fly, so too an LLM can output hallucinations as a result of its optimisation.
Taking a step back, we’ve reviewed three systems exhibiting different levels of understanding according to teleosemantics: the Chinese Room, which follows rules with no optimisation history and no understanding; the embodied robot, which grounds its symbols through interaction with the world; and the LLM, which acquires a second-hand, text-mediated form of understanding.
In some sense, even with modern LLMs displaying such sophisticated understanding, the thrust of Searle’s argument still rings true and provides a useful lesson. Searle primarily argued against “rule-based” encoding of syntax in AI programs and, I think correctly, anticipated that hard-coded rules would never develop the genuine understanding that today’s LLMs have achieved. Indeed, Searle never ruled out that AI would one day have genuine intentionality, saying that such an AI would need to recreate all the “causal powers” of the brain.
It’s not clear exactly what he meant by “causal powers” but I think teleosemantics provides a clean framework to distinguish between the relevant cases.
Strictly speaking, teleosemantics is a theory of what constitutes semantic content, but I’m using it in this essay to talk about how states acquire semantic content. My model is that teleosemantic optimisation is how systems typically acquire semantic content but, once acquired, mental content is constituted by the causal/functional roles of the concepts themselves.
This avoids Swampman-type objections to teleosemantics (i.e. a molecule-for-molecule copy of a human could still have mental content) but opens up other complexities which I plan to expand on in a future post.
The argument has historically been linked to debates about machine consciousness, but for now I'll bracket these discussions and just focus on meaning and understanding.
Regarding notation: capitalised letters indicate the concept (semantics) whereas brackets indicate the token (syntax). Here DOG refers to the semantic content and [dog] refers to the syntax.