Most people are familiar with John Searle's Chinese Room argument, from "Minds, Brains, and Programs" (1980):

Suppose that I'm locked in a room and given a large batch of Chinese writing. Suppose furthermore (as is indeed the case) that I know no Chinese, either written or spoken, and that I'm not even confident that I could recognize Chinese writing as Chinese writing distinct from, say, Japanese writing or meaningless squiggles. To me, Chinese writing is just so many meaningless squiggles.

Now suppose further that after this first batch of Chinese writing I am given a second batch of Chinese script together with a set of rules for correlating the second batch with the first batch. The rules are in English, and I understand these rules as well as any other native speaker of English. They enable me to correlate one set of formal symbols with another set of formal symbols, and all that 'formal' means here is that I can identify the symbols entirely by their shapes. Now suppose also that I am given a third batch of Chinese symbols together with some instructions, again in English, that enable me to correlate elements of this third batch with the first two batches, and these rules instruct me how to give back certain Chinese symbols with certain sorts of shapes in response to certain sorts of shapes given me in the third batch. Unknown to me, the people who are giving me all of these symbols call the first batch "a script," they call the second batch a "story," and they call the third batch "questions." Furthermore, they call the symbols I give them back in response to the third batch "answers to the questions," and the set of rules in English that they gave me, they call "the program."

Now just to complicate the story a little, imagine that these people also give me stories in English, which I understand, and they then ask me questions in English about these stories, and I give them back answers in English. Suppose also that after a while I get so good at following the instructions for manipulating the Chinese symbols and the programmers get so good at writing the programs that from the external point of view -- that is, from the point of view of somebody outside the room in which I am locked -- my answers to the questions are absolutely indistinguishable from those of native Chinese speakers. Nobody just looking at my answers can tell that I don't speak a word of Chinese.

Let us also suppose that my answers to the English questions are, as they no doubt would be, indistinguishable from those of other native English speakers, for the simple reason that I am a native English speaker. From the external point of view -- from the point of view of someone reading my "answers" -- the answers to the Chinese questions and the English questions are equally good. But in the Chinese case, unlike the English case, I produce the answers by manipulating uninterpreted formal symbols. As far as the Chinese is concerned, I simply behave like a computer; I perform computational operations on formally specified elements. For the purposes of the Chinese, I am simply an instantiation of the computer program.

Now the claims made by strong AI are that the programmed computer understands the stories and that the program in some sense explains human understanding. 

Many people are also familiar with Bender, Gebru, McMillan-Major, & Shmitchell's "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜" (2021):

Contrary to how it may seem when we observe its output, an LM is a system for haphazardly stitching together sequences of linguistic forms it has observed in its vast training data, according to probabilistic information about how they combine, but without any reference to meaning: a stochastic parrot.
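
For contrast, the most literal "stochastic parrot" one could build is a bigram Markov chain: it stitches together word pairs it has observed, in proportion to how often it observed them, with no reference to meaning. Below is a minimal sketch; the function names and the toy corpus are illustrative assumptions, and this is emphatically not how an LLM works internally.

```python
import random
from collections import defaultdict

def train_bigrams(text):
    """For each word, record every word observed to follow it.
    Duplicates are kept, so later sampling is proportional to observed counts."""
    words = text.split()
    followers = defaultdict(list)
    for prev, nxt in zip(words, words[1:]):
        followers[prev].append(nxt)
    return followers

def parrot(followers, start, length, rng=random):
    """Generate text by repeatedly sampling an observed successor of the last word."""
    out = [start]
    for _ in range(length - 1):
        options = followers.get(out[-1])
        if not options:  # dead end: this word was never seen followed by anything
            break
        out.append(rng.choice(options))
    return " ".join(out)

corpus = "the cat sat on the mat and the dog sat on the cat"
print(parrot(train_bigrams(corpus), "the", 10, random.Random(0)))
```

Every adjacent word pair in the output occurred somewhere in the corpus; nothing else constrains it.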

Many other people have, of course, argued that this criticism of Large Language Models (LLMs) is unfair and mischaracterizes them; see, for example, "The Stochastic Parrot Hypothesis is debatable for the last generation of LLMs" or "Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task". [The word 'haphazardly' in the sentence quoted above also seems particularly unjustified: one rather gets the sense that the authors had an axe to grind, an impression that the rest of the paper sadly does little to dispel.]

However, one could very reasonably characterize a single layer of an LLM as "just a stochastic parrot": its behavior is moderately complex but is actually understandable and even mathematically modelable, and it is clearly just combining probabilistic information in a way learnt from the vast training data, with no obvious slots for meaning (though it is not doing so haphazardly).[1] Then Bender et al.'s only unjustified assumption would be that anything constructed only from (a stack of) stochastic parrots must itself behave just like a stochastic parrot. This assumption seems questionable in light of the human mind, which we believe is built mostly out of neurons, plus a few other simple components, each of which individually sure seems to be just a stochastic parrot, or even rather less than a parrot, though their organization is more complex than a simple stack. Moreover, during the processing of a single token, the pattern of weights in a stack of sufficiently large neural net layers can implement any desired internal structure apart from long loops (it can contain short loops, unrolled across layers). The phenomenon of Turing completeness, combined with the fact that neural nets with at least one hidden layer are universal function approximators, strongly suggests that this assumption is unjustified.
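
The universal-approximation point can be made concrete with a minimal sketch: a network with a single hidden layer of ReLU units whose weights are set by hand (not trained) so that it computes a piecewise-linear approximation of sin(x) on [0, π]. The function names and the choice of sin are illustrative assumptions. No individual hidden unit "knows" anything about sine; each just switches on at one point and bends the output line.

```python
import math

def relu(z: float) -> float:
    return max(0.0, z)

def build_one_hidden_layer_net(f, lo, hi, n_hidden):
    """Hand-set weights so that y(x) = bias + sum_i c_i * relu(x - k_i) equals the
    piecewise-linear interpolant of f at n_hidden + 1 evenly spaced points.
    Hidden unit i has input weight 1 and bias -k_i; c_i is its output weight."""
    pts = [lo + (hi - lo) * i / n_hidden for i in range(n_hidden + 1)]
    vals = [f(p) for p in pts]
    slopes = [(vals[i + 1] - vals[i]) / (pts[i + 1] - pts[i]) for i in range(n_hidden)]
    # Unit i switches on at pts[i] and contributes the change in slope at that point.
    coeffs = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, n_hidden)]
    return pts[:n_hidden], coeffs, vals[0]

def forward(x, knots, coeffs, bias):
    """One hidden ReLU layer followed by a linear output unit."""
    return bias + sum(c * relu(x - k) for k, c in zip(knots, coeffs))

def max_abs_error(f, lo, hi, n_hidden, samples=1000):
    knots, coeffs, bias = build_one_hidden_layer_net(f, lo, hi, n_hidden)
    xs = [lo + (hi - lo) * i / samples for i in range(samples + 1)]
    return max(abs(forward(x, knots, coeffs, bias) - f(x)) for x in xs)

for n in (4, 16, 64):
    print(n, max_abs_error(math.sin, 0.0, math.pi, n))
```

For a smooth target like this, the error should shrink roughly quadratically as hidden units are added, which is the flavor of guarantee the universal approximation theorem generalizes.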

So the somewhat remarkable discovery of modern LLMs is that if you just stack the right sort of stochastic parrots about 30–100 deep (with residual connections), have them all play Telephone,[2] and then train them on a sufficiently vast amount of human text in various languages (including Chinese), roughly 20 tokens' worth per parameter in the network, then you can in fact train a fairly effective Chinese Room. (You don't even need to speak any Chinese yourself to do so, as long as you can handle its character set and have a well-filtered training set.) So the (apparent) ability to understand meaning, an emergent phenomenon of a very complex non-linear system, gets trained into the LLM through the task of attempting to predict a vast amount of human text that reflects humans' (apparent) ability to understand meaning. In machine learning terminology, we're knowledge distilling[3] this capability to (apparently) understand meaning from humans into our AI model. It seems that we don't need to run any very long loops for each individual token: comparing biological neuron signal propagation delays to the speed of normal speech makes this unsurprising, since only on the order of a hundred sequential neuron-firing steps fit into the time it takes to say a word. (Current LLMs are notably bad at tasks like long division of large big-endian numbers that provably do require such loops.) Note that this capability of (apparently) understanding meaning is one that we don't yet know how to produce using symbolic AI: despite numerous attempts, symbolic AI approaches to Natural Language Processing have so far produced nothing even close in capability to the program for Searle's Chinese Room (and indeed these approaches have been mostly abandoned).
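
To see why schoolbook long division needs a loop whose length grows with the input, here is a minimal sketch (the function name and the digit-list representation are illustrative assumptions). Each quotient digit depends on the remainder left over from the previous digit, so the chain of dependent steps is as long as the number itself, whereas a fixed stack of layers offers only a fixed number of sequential steps per generated token.

```python
def long_division(dividend_digits, divisor):
    """Schoolbook long division of a big-endian (most-significant-first) digit list
    by a positive integer. Returns (quotient_digits, remainder, sequential_steps)."""
    if divisor <= 0:
        raise ValueError("divisor must be a positive integer")
    quotient, remainder, steps = [], 0, 0
    for digit in dividend_digits:
        # Each iteration needs the remainder produced by the previous one,
        # so the iterations form a dependency chain as long as the number.
        current = remainder * 10 + digit
        quotient.append(current // divisor)
        remainder = current % divisor
        steps += 1
    while len(quotient) > 1 and quotient[0] == 0:  # drop leading zeros
        quotient.pop(0)
    return quotient, remainder, steps

digits = [int(c) for c in "123456789"]
print(long_division(digits, 7))  # -> ([1, 7, 6, 3, 6, 6, 8, 4], 1, 9)
```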

Many recent interpretability results on LLMs shed some light on how this apparent philosophical conundrum might be resolved. They demonstrate that in the layers near the middle of the stack, the "probabilistic information" being worked with (activations, attention heads' attention patterns, and so forth) is semantic, expressing aspects of, and relationships between, meanings and concepts, rather than being symbolic; and that almost any human language apparently gets translated into much the same set/ontology/embeddings of meanings and concepts (as one would expect from the fact that translation is normally possible). So it seems that there are in fact a great many slots for meaning in the LLM after all: its middle layers appear to be full of them. Presumably the human brain is also learning to do something rather similar internally when we learn to speak/read/write a human language: acquiring the ability to convert symbols to their meanings, manipulate and combine those meanings, and then convert them back into symbols, using its much more complicated and significantly higher-capacity wetware.
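
The symbols-to-meanings-to-symbols idea can be illustrated with a toy shared concept space: words from two languages map to vectors, and "translation" is just finding the nearest vector belonging to the other language. The three-dimensional vectors below are entirely hypothetical, hand-made for illustration and not taken from any real model; real LLM embeddings have thousands of dimensions and are learned, not chosen.

```python
import math

# Hypothetical hand-made vectors, NOT from any real model.
# Assumed axes for illustration: [animal-ness, domesticity, size]
EMBEDDINGS = {
    ("en", "dog"): [0.90, 0.80, 0.40],
    ("en", "cat"): [0.90, 0.90, 0.20],
    ("en", "mountain"): [0.00, 0.00, 1.00],
    ("zh", "狗"): [0.88, 0.79, 0.42],
    ("zh", "猫"): [0.91, 0.88, 0.18],
    ("zh", "山"): [0.02, 0.01, 0.98],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def translate(word, src, tgt):
    """Symbol -> shared concept vector -> nearest symbol in the target language."""
    vec = EMBEDDINGS[(src, word)]
    candidates = [(w, cosine(vec, v)) for (lang, w), v in EMBEDDINGS.items() if lang == tgt]
    return max(candidates, key=lambda pair: pair[1])[0]

print(translate("dog", "en", "zh"))  # -> 狗
```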

Clearly, if you were to (hypothetically, of course) dissect the brain of someone who speaks Chinese, no individual synapse or neuron would turn out to understand Chinese. Similarly, no individual line of a C++ compiler's source code understands C++. Complex capabilities are normally built out of a great many simple parts, none of which individually has that capability. Being (apparently) able to understand the meaning of Chinese is a complex, collective, emergent phenomenon, one that seems to require a system of fairly significant complexity to be trained to do even a passable job: roughly 30–100 parrots deep, with  parameters, depending on just how good a job you want done. Things like dictionaries, grammars and encyclopedias that try to encapsulate the meaning of a human language (in a verbal form that a human could learn from) tend to be quite large and densely packed, and yet are clearly still not fully comprehensive, so the fact that this is quite a large system seems unsurprising. To anyone familiar with non-linear systems, the fact that a system this complex is capable of behaviors that its individual parts lack is entirely unsurprising. The only remarkable-feeling part here is that those behaviors include phenomena like "meaning" and "understanding" that were, until fairly recently, generally assumed to be exclusively the province of philosophers of semantics. But then, we're constructing and training minds, or at least passable facsimiles of them, so invading some territory that was previously the domain only of philosophers is rather to be expected.
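
A classic toy case of a whole having a capability its parts lack: no single linear threshold unit can compute XOR, because XOR's true and false cases are not linearly separable, yet a two-layer network of three such units computes it exactly. The hand-picked weights and helper names below are illustrative assumptions and have nothing specific to do with LLMs.

```python
import itertools

def unit(w1, w2, b):
    """A single linear threshold unit: outputs 1 iff w1*x1 + w2*x2 + b > 0."""
    return lambda x1, x2: 1 if w1 * x1 + w2 * x2 + b > 0 else 0

OR = unit(1, 1, -0.5)
NAND = unit(-1, -1, 1.5)
AND = unit(1, 1, -1.5)

def xor(x1, x2):
    """Two layers: OR and NAND in the hidden layer, AND at the output."""
    return AND(OR(x1, x2), NAND(x1, x2))

inputs = list(itertools.product((0, 1), repeat=2))
print([xor(a, b) for a, b in inputs])  # -> [0, 1, 1, 0]

# Search a grid of weights for a single unit that computes XOR on its own.
# This finite search is only an illustration; the actual proof is that no
# straight line separates XOR's positive cases from its negative ones.
grid = [i / 2 for i in range(-6, 7)]
single_unit_solutions = [
    (w1, w2, b)
    for w1, w2, b in itertools.product(grid, repeat=3)
    if all(unit(w1, w2, b)(a, c) == (a ^ c) for a, c in inputs)
]
print(single_unit_solutions)  # -> []
```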

  1. ^

    Technically, each individual neural net layer's behavior is not stochastic. The only step that can be stochastic comes at the very end, where the final residual activations are converted to logits, a softmax turns those into a probability distribution, and a token (or tokens) is then chosen from it: usually by random sampling (often with a temperature, top-k, or top-p adjustment), although greedy decoding and beam search are deterministic alternatives. One could of course argue that, as soon as the stochastically chosen output token gets fed back into the context for the next forward pass, it infects all the other layers' behaviors with stochasticity. But mostly I'm just going to plead that the title "A Chinese Room Containing a Stack of Non-Stochastic Parrots Feeding a Final Stochastic Parrot" was distinctly less catchy, and even "A Chinese Room Containing a Stochastic Stack of Initially-Mostly Non-Stochastic Parrots" was still a bit much, and instead just gloss over this detail as tangential to my main argument.

  2. ^

    A game which, coincidentally, in the UK is called "Chinese Whispers".

  3. ^

    Obviously not with literally that machine learning technique, since the human mind does not, sadly, emit logits for alternative tokens while typing text, but in the analogical sense of a student model learning a capability from a teacher model via a great many examples.
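
To make footnotes 1 and 3 concrete, here is a minimal, dependency-free sketch of what happens at the output end of a language model. All function names are illustrative assumptions rather than any particular library's API. It shows that greedy decoding is deterministic while sampling is stochastic, and that ordinary next-token training is the soft-target distillation loss in the special case where the "teacher" distribution is one-hot on the single token the human actually wrote. The distillation loss here is simplified (it omits the T² scaling usually applied).

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; temperature < 1 sharpens, > 1 flattens."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_token(logits):
    """Deterministic decoding: always return the index of the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])

def sample_token(logits, temperature=1.0, rng=random):
    """Stochastic decoding: draw a token index from the softmax distribution."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

def cross_entropy(target_probs, student_logits, temperature=1.0):
    """H(target, student) = -sum_i target[i] * log student[i]."""
    student_probs = softmax(student_logits, temperature)
    return -sum(p * math.log(q) for p, q in zip(target_probs, student_probs) if p > 0)

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Simplified soft-target distillation: match the teacher's whole distribution."""
    return cross_entropy(softmax(teacher_logits, temperature), student_logits, temperature)

def next_token_loss(observed_token, student_logits):
    """Ordinary LM pretraining loss: the 'teacher' is one-hot on the observed token,
    so no logits for alternative tokens are needed from the human writer."""
    one_hot = [1.0 if i == observed_token else 0.0 for i in range(len(student_logits))]
    return cross_entropy(one_hot, student_logits)

logits = [1.0, 3.0, 2.0]
print(greedy_token(logits))                        # -> 1
print(sample_token(logits, rng=random.Random(0)))  # depends on the random seed
```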

2 comments

My favorite response to the Chinese Room argument:

An exactly parallel argument to [Searle's] would be that the behavior of H2O molecules can never explain the liquidity of water, because if we entered into the system of molecules … we should only find on visiting it pieces which push one against another, but never anything by which to explain liquidity.  But in both cases we would be looking at the system at the wrong level.  The liquidity of water is not to be found at the level of the individual molecule, nor is the [understanding of Chinese] to be found at the level of the individual neuron or synapse.

Yes, this rebuttal is simple, straightforward, and I daresay obvious. What makes it remarkable is who made it: a certain John Searle (writing about Leibniz's thought experiment involving a mill, which is essentially identical to the much later Chinese Room).

I'm actually not sure I agree with the specifics of that response: water molecules have attractive forces towards each other (such as hydrogen bonding) as well as repulsive ones, and those attractions do in fact explain both the hardness of ice and the cohesion (visible as surface tension) that holds water together as a liquid rather than a vapor at room temperature. But I am amused by the irony you point out.