I am republishing this essay because of recent discussions about erratic, ‘emotional’ and aggressive behavior by Bing’s AI, Sydney. There has been some discussion about whether it’s ethical to run Sydney given that behavior. People are responding to such claims with “don’t be ridiculous, of course Sydney doesn’t feel anything, Sydney is a machine for predicting the next token of text”. While I am inclined to agree with that conclusion on the whole, I think the issue is a bit more complex. For ongoing discussion, see:



Blake Lemoine is an engineer who worked for Google. He is claiming LaMDA, a language model he worked with, was sentient. Google put him on unpaid leave. Most people think his claim is absurd because language models are models of what word is most likely to follow a prior sequence of words (see, for example, GPT-3). How could such a thing be sentient? Moreover, there are unmistakable oddities and logical gaps in the behavior of LaMDA in the very transcripts that Lemoine is relying on. Some proof of personhood, then!

Just spitballing here, putting a hypothesis forward in a spirit of play and humility, but I wonder if Lemoine’s claim is not as absurd as many think. The concept of sentience is quite elusive, so let’s leave it behind for something slightly better understood: personhood. I think that it is conceivable that LaMDA contains persons. However, my reasons, unlike Blake Lemoine’s, have little to do with a given conversation in which the model claimed to be sentient or a person.

When a language model is guessing the next token, we can’t rule out the possibility, given that transformers are black boxes, that it is simulating the interacting beliefs, desires, and emotions of the hypothetical author it is “roleplaying”. Simulation in this sense is quite a minimal concept: all that is needed are structures that interact and influence each other in a way isomorphic, at a very high level of abstraction, to the interactions of desires and emotions in a real person. It is conceivable that it has built such a model of interacting mental states as the most accurate way to predict the next word of text. After all, language models seem to have built an implicit model of how things are related in the world (a world model) through very high-level models of how words co-occur with each other. Simulation of a person might be the best way to guess what a person would say next.

This might have precedent in human psychology. Perhaps the most popular account of human theory of mind capabilities is the simulation theory of folk psychology (cf. Alvin Goldman). According to this theory, we predict what people will do in a given situation by simulating them. This makes intuitive sense. The human mind contains many working parts; for a process so complex, running a model of it seems like the best way to make a prediction as to what it will do.

But if you accept that a working person simulation is a person, which many do, it follows that LaMDA contains a person or many people, or perhaps one should say it creates a person every time it has to predict the next token. Note, however, that in whatever way you phrase it, it is not that LaMDA itself is a person on this model. Rather a good emulation of a person (and thus a person) might be part of it.

Now let me double back to scale down a previous claim. It’s not quite that a working person emulation is a person; it’s that a working person emulation over a certain degree of complexity is a person.

We need to add this stipulation because if every emulation were a person, it would be likely that you and I also contain multiple people. Perhaps personhood is a matter of degree, with no sharp boundaries, like the term “heap”. The more complex the simulated mass of beliefs, desires and other mental states is, the more like a real person it is. If LaMDA is simulating people, whether or not those simulations are themselves people will depend on whether they cross the complexity threshold. To some degree, this may be a purely verbal question.

This brings us back to the objection that LaMDA’s behavior in the transcripts involves jumps a real person wouldn’t make. This probably represents, at least in part, failures of its model of persons, either through insufficient detail or through the inclusion of inaccurate detail. Do these breakdowns in the model mean that no personhood is present? That’s a matter of degree. It’s a bit like asking whether something is enough of a heap to count: very hard to answer.

To summarise:

1. I don't see how we can rule out the possibility that LaMDA runs something like a person model to predict what a writer would write next, with interacting virtual components isomorphic to beliefs, desires, and other mental states. I believe that the transformer architecture is flexible enough to run such a simulation, as shown by the fact that it can clearly achieve a kind of world model through modeling the associations of words.

2. I don’t think we can rule out the possibility that the model of a person invoked could be quite a sophisticated one.

3. I also don’t think we can rule out the view that a model or simulation of a person, above a certain threshold of sophistication, is itself a person.

On the basis of these considerations, I don’t think the claim LaMDA is a person, or rather ‘contains’ in some sense persons, is as absurd as it may appear at first blush. This has little to do with Lemoine's route to the claim, but it is not counterposed to it. There’s nothing particularly special about LaMDA claiming to be a person, but the conversations that led Lemoine to agree with it involve a degree of “psychological” “depth”, which might illustrate the complexity of the required simulation.


I should be clearer about what I mean by saying a model only has to be abstract and high level to count as a model of a person. I don’t mean sensible models of persons can be simple or lacking in detail. Rather, I mean that the relationship of isomorphism that is required is an abstract one. For example, if the machine is modeling an interacting set of beliefs, desires, habits, etc. to guess what an author would say next, the components of the model do not have to be explicitly labeled as “belief”, “desire”, etc. Instead, they just have to interact with each other in patterns corresponding to those in which beliefs, desires, and habits really do interact, or rather an approximation of such patterns. In other words, they have to function like beliefs, desires, habits, etc.

Edit x2:

On another thread, @TheAncientGreek wrote: “We already disbelieve in momentary persons. In the original imitation game, the one that the Turing test is based on, people answer questions as if they are historical figures, and the other players have to guess who they are pretending to be. But no one thinks a player briefly becomes Napoleon.”

I responded: “I believe that in the process of simulating another person you effectively create a quasi-person who is separated from true personhood only by a matter of degree. Humans seem to guess what other people would do by simulating them, according to our best current models of how folk psychology works. These emulations of others don't count as persons, but not for any qualitative reason, only due to a matter of degree.

If we were much more intelligent and better at simulating others than we are, then we really would temporarily create a "Napoleon" when we pretended to be him. A caveat here is important: it's not Napoleon, it's a being psychologically similar to Napoleon (if we are good imitators).”

I’ve included my response here because I think the objection it answers is probably the most important one to my argument.

Edit x3:

I say it in the body of the essay, but let me spell it out again. My claim is not:

>LaMDA is a person

My claim is more like:

>LaMDA creates simulations of persons to answer questions that differ from real people primarily on a quantitative rather than a qualitative dimension. Whether you want to say it crosses the line is a matter of degree.

It very probably doesn’t, on a fair drawing of the line, reach personhood. But it’s much more interesting to me that it’s only a matter of degree between it and personhood than that it doesn’t happen to reach that degree, if that makes sense.

