Verbal parity: What is it and how to measure it? + an edited version of "Against John Searle, Gary Marcus, the Chinese Room thought experiment and its world"

philosophybear

I've written previously about an idea I call verbal parity, viz:

A machine has verbal parity when, if given input in written form, it can give an appropriate output in written form as a response, as well as a human, for any input.

Now this definition has many problems. For example, there is a huge variation among humans in how well they can respond to inputs. Is this as well as any human? For any prompt? That would be infeasible, even for a vastly superhuman AI ("question, what's in Tom Cruise's pocket right now?" A question only Tom Cruise can answer, presumably). So it's not really a precise concept.

It is, however, somewhat more precise than "General intelligence". It pretty clearly is a kind of generalization of the intuition that infused Turing when he described the imitation game.

I have been interested in this idea for a long time. You can see relevant work here:

Here: https://www.lesswrong.com/posts/GHeEdubBHjcqeoxjP/recent-advances-in-natural-language-processing-some-woolly

And here: https://www.lesswrong.com/posts/mypCA3AzopBhnYB6P/language-models-are-nearly-agis-but-we-don-t-notice-it

And in the appended essay, at the end of this document which I recently made major edits too after some useful feedback [thanks to TAG's suggestions- they were invaluable]

I initially introduced the idea of verbal parity here on my Substack, but I won't repost it in this forum unlike my other essays because I don't think it contains enough useful content. beyond the basic idea I've already introduced.

I'm looking for help

Having written about the topic of verbal parity before, I have a sense that we should try to estimate when it will happen for a number of reasons:

It matters a great deal for AI timelines
When it happens, it will change people's perceptions of AI. The ability to talk to something really shapes your idea of it.
It will be enormously economically, socially and politically disruptive in good and bad ways.
It's more concrete, and thus easier to predict than "AGI".

Now, as I noted above, there are problems with the definition I gave- it's far from precise. Let me try and state it more precisely.

Verbal parity: A machine has verbal parity if and only if, for any task presented in English, it can do better at that task than 95% of English speakers.

This rules out hyper-specific tasks, and also unreasonable expectations like being able to write Nobel prize-winning poetry. It's still far too vague a thing to measure though.

I would like to find very specific tasks that are "in the vicinity" of this and get prediction markets set up on when they will be achieved. A lot of standard tasks fail, by reason of not requiring a horizon greater than, at most, a few thousand tokens- although keeping an eye on future expected performance in these tasks will certainly give us valuable information in predicting when verbal parity will happen. Any ideas for tasks? Does anyone have the capacity to get them up on a forecasting site for me?

What follows is an edited version of an essay I posted the other day, rewritten because, as TAG pointed out, I made a bizarre mistake regarding how the original Chinese Room thought worked. This did not materially invalidate my conclusion, but I thought it best to rewrite and repost it.

Against John Searle, Gary Marcus, the Chinese Room thought experiment and its world

The title is a play on “Against the Airport and its World” and is in no way intended as a slight against any named author, both of whom I respect intellectually, and do not know enough about to evaluate as people.

The other day I gave an argument that it may be that the differences between whatever LaMDA is and true personhood may be more quantitative than qualitative. But there’s an old argument that no model which is based purely on processing text and outputting text can understand anything. If such models can’t understand the text they work with, then any claim they may have to personhood is at least tenuous, indeed let us grant, at least provisionally, scrapped

That argument is the Chinese Room Argument. Gary Marcus, for example, invokes it in his 2022 article “Google’s AI is not sentient. Not even slightly”- [Edit: or I should say, at least on my reading of Marcus’s article he alludes to the Chinese Room argument although some of my readers disagree].

To be clear, Marcus, unlike Searle does not think that no AI could be sentient, but he does think, as far as I can tell, that a pure text-in, text-out model could not be sentient for Chinese Room-related reasons. Such models merely associate text with text- they are a “giant spreadsheet” in his memorable phrase. One might say they have a purely syntactic not semantic character.

I will try to explain why I find the Chinese Room argument unconvincing, not just as proof that AI couldn’t be intelligent, but even as proof that a language model alone can’t be intelligent. Even though the arguments I go through here have already been hashed out by other, better philosophers, I want to revisit this issue and say something on it- even if it’s only a rehash of what other people have said- because the issue of what a model that works on a text-in-text-out basis can or cannot understand is very dear to my heart.

The Chinese Room argument, summarised by Searle goes:

“Imagine a native English speaker who knows no Chinese locked in a room full of boxes of Chinese symbols (a database) together with a book of instructions for manipulating the symbols (the program). Imagine that people outside the room send in other Chinese symbols which, unknown to the person in the room, are questions in Chinese (the input). And imagine that by following the instructions in the program the man in the room is able to pass out Chinese symbols which are correct answers to the questions (the output). The program enables the person in the room to pass the Turing Test for understanding Chinese but he does not understand a word of Chinese.”

The mechanism by which the argument works is somewhat mysterious. Here’s Searle on that:

Now suppose further that after this first batch of Chinese writing I am given a second batch of Chinese script together with a set of rules for correlating the second batch with the first batch. The rules are in English, and I understand these rules as well as any other native speaker of English. They enable me to correlate one set of formal symbols with another set of formal symbols, and all that 'formal' means here is that I can identify the symbols entirely by their shapes. Now suppose also that I am given a third batch of Chinese symbols together with some instructions, again in English, that enable me to correlate elements of this third batch with the first two batches, and these rules instruct me how to give back certain Chinese symbols with certain sorts of shapes in response to certain sorts of shapes given me in the third batch. Unknown to me, the people who are giving me all of these symbols call the first batch "a script," they call the second batch a "story. ' and they call the third batch "questions." Furthermore, they call the symbols I give them back in response to the third batch "answers to the questions." and the set of rules in English that they gave me, they call "the program."

I’ve always thought that two replies- taken jointly - capture the essence of what is wrong with the argument.

The whole room reply: It is not the individual in the room who understands Chinese, but the room itself. This reply owes to many people, too numerous to list here.

The cognitive structure reply: The problem with the Chinese room thought experiment is that it doesn’t have the right sort of cognitive structure. If the Chinese room used instead of some kind of internal model of how things relate to each other in the world in order to give its replies, it would understand Chinese- and, moreover, large swathes of the world. This reply, I believe, owes to David Braddon-Mitchell and to Frank Jackson. It’s hard to tell from Searle’s description of how the machine works, but I think perhaps part of the intuition is that there is no modelling of the world going on- or the modelling is at least hidden behind a pretty abstract description of what is happening (manipulating symbols according to rules with no obvious correlation with anything external to themselves).

The summary of the two replies I’ve endorsed, taken together, is:

“The Chinese Room Operator does not understand Chinese. However, if a system with a model of interrelations of things in the world were used instead, the room as a whole, but not the operator, could be said to understand Chinese.”

There need be nothing mysterious about this modeling relationship I mention here. It’s just the same kind of modeling a computer does when it predicts the weather. Roughly speaking I think X models Y if X contains parts that are isomorphic to the parts of Y, and these stand in isomorphic relationships with each other (especially the same or analogous causal relationships) that the parts of Y do. Also, the inputs and outputs of the system causally relate to the thing modeled in the appropriate way. Maybe Searle’s conception of the Chinese Room models the world, but it seems a bit ambiguous in his presentation.

It is certainly possible in principle for a language model to contain such world models. It also seems to me likely that actually existing language models can be said to contain these kinds of models implicitly, though very likely not at a sufficient level of sophistication to count as people. Think about how even a simple feed-forward, fully connected neural network could model many things through its weights and biases, and through the relationships between its inputs, outputs and the world.

Indeed, we know that these language models contain such world models at least to a degree. We have found nodes that correspond to variables like “positive sentiment” and “negative sentiment'“. The modeling relationship doesn’t have to be so crude as “one node, one concept” to count, but in some cases, it is.

The memorisation response

Let me briefly deal with one reply to the whole room argument that Searle makes- what if the operator of the Chinese room memorized the books and applied them? She could now function outside the room as if she were in it, but surely she wouldn’t understand Chinese.

But didn’t we stipulate that there has to be some modelling of the world going on? Sure, but maybe there is. Maybe she calculates the answer with a model representing parts of the world, but she (or at least her English-speaking half) does not understand these calculations. Imagine that instead of memorizing very abstract rules, she had memorized a vast sequence of abstract relationships- perhaps represented by complex geometric shapes, which she moves around in her mind according to rules in an abstract environment to decide what she will say next in Chinese. Let’s say that the shapes in this model implicitly represent things in the real world, with relationships between each other that are isomorphic to relationships between real things, and appropriate relationships to inputs and outputs. Now Searle says “look, this operator still doesn’t understand Chinese, but she has the right cognitive processes according to you.”

But I have a reply- In this case I’d say that she’s effectively been bifurcated into two people, one of which doesn’t have semantic access to the meanings of what the other says. When she runs the program of interacting abstract shapes that tell her what to say in Chinese, she is bringing another person into being. This other person is separated from her, because it can’t interface with her mental processes in the right way [This “the operator is bifurcated” response is not new- c.f. many such as Haugeland who gives a more elegant and general version of it].

Making the conclusion intuitive

Let me try to make this conclusion more effective through a digression.

It is not by the redness of red that you understand the apple, it is by the relationships between different aspects of your sensory experience. The best analogy here, perhaps, is music. Unless you have perfect pitch, you wouldn’t be able to distinguish between c4 and f4 if I played them on a piano for you (seperated by a sufficient period of time). You might not even be able to distinguish between c4 and c5. What you can distinguish are the relationships between notes. You will most likely be able to instantly hear the difference between me playing C4 then C#4 and me playing C4 then D4 (the interval C4-C#4 will sound sinister because it is a minor interval. The interval between C4 and D4 will sound harmonious because it is a major interval. You will know that both are rising in pitch. Your understanding comes from the relationships between bits of your experience and other bits of your experience.

I think much of the prejudice against the Chinese room comes from the fact that it receives its input in text:

Consider this judgment by Gary Marcus on claims that LaMDA possesses a kind of sentience:

“Nonsense. Neither LaMDA nor any of its cousins (GPT-3) are remotely intelligent. All they do is match patterns, drawn from massive statistical databases of human language. The patterns might be cool, but language these systems utter doesn’t actually mean anything at all. And it sure as hell doesn’t mean that these systems are sentient. Which doesn’t mean that human beings can’t be taken in. In our book Rebooting AI, Ernie Davis and I called this human tendency to be suckered by The Gullibility Gap — a pernicious, modern version of pareidolia, the anthromorphic bias that allows humans to see Mother Theresa in an image of a cinnamon bun. Indeed, someone well-known at Google, Blake LeMoine, originally charged with studying how “safe” the system is, appears to have fallen in love with LaMDA, as if it were a family member or a colleague. (Newsflash: it’s not; it’s a spreadsheet for words.)”

But all we humans do is match patterns in sensory experiences. True, we do so with inductive biases that help us to understand the world by predisposing us to see it in such ways, but LaMDA also contains inductive biases. The prejudice comes, in part, I think, from the fact that it’s patterns in texts, and not, say, pictures or sounds.

Now it’s important to remember that there really is nothing qualitatively different between a passage containing text, and an image because both can easily include each other. Consider this sentence. “The image is six hundred pixels by six hundred pixels. At point 1,1 there is red 116. At point 1,2 there is red 103”…” and so on. Such a sentence conveys all the information in the image. Of course, there are quantitative reasons this won’t be feasible in many cases, but they are only quantitative.

I don’t see any reason in principle that you can’t build an excellent model of the world through relationships between text alone. As I wrote a long time ago [ed: in a previous essay in this anthology.]:

“In hindsight, it makes a certain sense that reams and reams of text alone can be used to build the capabilities needed to answer questions like these. A lot of people remind us that these programs are really just statistical analyses of the co-occurrence of words, however complex and glorified. However, we should not forget that the statistical relationships between words in a language are isomorphic to the relations between things in the world—that isomorphism is why language works. This is to say the patterns in language use mirror the patterns of how things are. Models are transitive—if x models y, and y models z, then x models z. The upshot of these facts are that if you have a really good statistical model of how words relate to each other, that model is also implicitly a model of the world, and so we shouldn't surprised that such a model grants a kind of "understanding" about how the world works.”

Now that’s an oversimplification in some ways (what about false statements, deliberate or otherwise), but in the main the point holds. Even in false narratives, things normally relate to each other in the same way they relate in the real world, generally you’ll only start walking on the ceiling if that’s key to the story, for example. The relationships between things in the world are implicit in the relationships between words in text, especially over large corpora. Not only is it possible in principle for a language model to use these, I think it’s very possible that, in practice, backpropagation could arrive at them. In fact, I find it hard to imagine the alternative, especially if you’re going to produce language to answer complex questions with answers that are more than superficially plausible.

Note: In this section, I have glossed over the theory-ladeness of perception in this section and treated perception as if it were a series of discrete “sense data” that we relate statistically, but I don’t think it would create any problems for my argument to expand it to include a more realistic view of perception. This approach just makes exposition easier.

What about qualia?

I think another part of the force of the Chinese room thought experiment comes from qualia. In this world of text associated with text in which the Chinese room lives where is the redness of red? I have two responses here.

The first is that I’m not convinced that being a person requires qualia, I think that if philosophical zombies are possible, they still count as persons, and have at least some claim to ethical consideration.

The second is that qualia are poorly understood. They essentially amount to the non-functional part of experience, the redness of red that would remain even if you swapped red and green in a way that made no difference to behavior, in the famous inverted spectrum argument. Currently, we have no real leads in solving the hard problem. Thus who can say that there couldn’t be hypothetical language models that feel the wordiness of certain kinds of words? Maybe verbs are sharp and adjectives are so. We haven’t got a theory of qualia that would rule this out. I’d urge interested readers to read more about functionalism, probably our best current theory in the philosophy of mind. I think it puts many of these problems in perspective.

Edit: An excellent study recently came to my attention showing that when GPT-2 is taught to play chess by receiving the moves of games (in text form) as input, it knows where the pieces are, that is to say, it contains a model of the board state at any given time. “Chess as a Testbed for Language Model State Tracking” (2021) As the authors of that paper suggest, this is a toy case that gives us evidence these word machines work by world modeling