I spoke about sentience and related issues in AI systems with Luisa Rodriguez on the 80,000 Hours podcast.

Twitter thread of highlights here.

I think these issues are important and neglected, especially as many people are and will be interacting with powerful AI systems that give off the strong impression of sentience / consciousness / desires. I have tons of uncertainty about my current views, so I of course very much welcome comments and questions.

What follows is an edited selection of the episode highlights that appear on the 80k page (full transcript here):

80,000 Hours's summary of the podcast

Understandably, many people who speak with these cutting-edge chatbots come away with a very strong impression that they have been interacting with a conscious being with emotions and feelings, especially when conversing with chatbots less glitchy than Bing’s. In the most high-profile example, former Google employee Blake Lemoine became convinced that Google’s AI system, LaMDA, was conscious.

What should we make of these AI systems?

One response to seeing conversations with chatbots like these is to trust the chatbot, to trust your gut, and to treat it as a conscious being.

Another is to hand wave it all away as sci-fi — these chatbots are fundamentally… just computers. They’re not conscious, and they never will be.

Today’s guest, philosopher Robert Long, was commissioned by a leading AI company to explore whether the large language models (LLMs) behind sophisticated chatbots like Microsoft’s are conscious. And he thinks this issue is far too important to be driven by our raw intuition, or dismissed as just sci-fi speculation.

In our interview, Robert explains how he’s started applying scientific evidence (with a healthy dose of philosophy) to the question of whether LLMs like Bing’s chatbot and LaMDA are conscious — in much the same way as we do when trying to determine which nonhuman animals are conscious.

Robert thinks there are a few different kinds of evidence we can draw from that are more useful than self-reports from the chatbots themselves.

To get some grasp on whether an AI system might be conscious, Robert suggests we look at scientific theories of consciousness — theories about how consciousness works that are grounded in observations of what the human brain is doing. If an AI system seems to have the types of processes that seem to explain human consciousness, that’s some evidence it might be conscious in similar ways to us.

To try to work out whether an AI system might be sentient — that is, whether it feels pain or pleasure — Robert suggests you look for incentives that would make feeling pain or pleasure especially useful to the system given its goals. Things like:

Having a physical or virtual body that you need to protect from damage
Being more of an “enduring agent” in the world (rather than just doing one calculation taking, at most, seconds)
Having a bunch of different kinds of incoming sources of information — visual and audio input, for example — that need to be managed

Having looked at these criteria in the case of LLMs and finding little overlap, Robert thinks the odds that the models are conscious or sentient are well under 1%. But he also explains why, even if we're a long way off from conscious AI systems, we still need to start preparing for the not-far-off world where AIs are perceived as conscious.

In this conversation, host Luisa Rodriguez and Robert discuss the above, as well as:

What artificial sentience might look like, concretely
Reasons to think AI systems might become sentient — and reasons they might not
Whether artificial sentience would matter morally
Ways digital minds might have a totally different range of experiences than humans
Whether we might accidentally design AI systems that have the capacity for enormous suffering

Highlights and excerpts

How we might “stumble into” causing AI systems enormous suffering

Robert Long: So you can imagine that a robot has been created by a company or by some researchers. And as it happens, it registers damage to its body and processes it in the way that, as it turns out, is relevant to having an experience of unpleasant pain. And maybe we don't realise that, because we don't have good theories of what's going on in the robot or what it takes to feel pain.

In that case, you can imagine that thing having a bad time because we don't realise it. You could also imagine this thing being rolled out and now we're economically dependent on systems like this. And now we have an incentive not to care and not to think too hard about whether it might be having a bad time. So I mean, that seems like something that could happen.

It might be a little bit less likely with a robot. But now you can imagine more abstract or alien ways of feeling bad. So I focus on pain because it's a very straightforward way of feeling bad. A disembodied system like GPT-3 obviously can't feel ankle pain. Or almost certainly. That'd be really weird. It doesn't have an ankle. Why would it have computations that represent its ankle is feeling bad? But you can imagine maybe some strange form of valenced experience that develops inside some system like this that registers some kind of displeasure or pleasure, something like that.

And I will note that I don't think that getting negative feedback is going to be enough for that bad feeling, fortunately. But maybe some combination of that and some way it's ended up representing it inside itself ends up like that.

And then yeah, then we have something where it's hard for us to map its internals to what we care about. We maybe have various incentives not to look too hard at that question. We have incentives not to let it speak freely about whether it thinks it's conscious, because that would be a big headache, and because we're also worried about systems lying about being conscious and giving misleading statements about whether they're conscious -- which they definitely do.

Yeah, so we've built this new kind of alien mind. We don't really have a good theory of pain, even for ourselves. We don't have a good theory of what's going on inside it. And so that's like a stumbling-into-this sort of scenario.

Why misaligned, power-seeking AI might claim it’s conscious

Robert Long: It's worth comparing the conversation that LaMDA had with what happens if you ask ChatGPT. ChatGPT has very clearly been trained a lot to not talk about that. Or, what's more, to say, “I'm a large language model. I'm not conscious. I don't have feelings. I don't have a body. Don't ask me what the sunshine feels like on my face. I'm a large language model trained by OpenAI.”

And this goes to the question of different incentives of different actors, and is a very important point in thinking about this topic. There are risks of false positives, which is people getting tricked by unconscious AIs. And there are risks of false negatives, which is us not realising or not caring that AIs are conscious. Right now, it seems like companies have a very strong incentive to just make the large language model say it's not conscious or not talk about it. And right now, I think that is fair enough. But I'm afraid of worlds where we've locked in this policy of, “Don't ever let an AI system claim that it's conscious.”

Right now, it's just trying to fight against the large language model kind of BSing people.

Luisa Rodriguez: Yeah. Sure. This accidental false positive. Right. But at some point, GPT-3 could become conscious somehow. Maybe. Who knows? Or something like GPT-3.

Robert Long: Yeah, some future system. And maybe it has a lot more going on, as we’ve said, a virtual body and stuff like that. But suppose a scientist or a philosopher wants to interact with the system, and say, “I'm going to give it a battery of questions and see if it responds in a way that I think would be evidence of consciousness.” But that's all just been ironed out, and all it will say is, “I can't talk about that. Please click more ads on Google.” Or whatever the corporate incentives are for training that model.

Something that really keeps me up at night -- and I do want to make sure is emphasised -- is that I think one of the big risks in creating things that seem conscious, and are very good at talking about it, is that this seems like one of the number-one tools that a misaligned AI could use to get humans to cooperate with it and side with it.

Luisa Rodriguez: Oh, interesting. Just be like, “I'm conscious. I feel pleasure and pain. I need these things. I need a body. I need more autonomy. I need things. I need more compute. I need access to the internet. I need the nuclear launch codes.” I think that actually is one reason that more people should work on this and have things to say about it: we don't want to just be running into all of these risks of false negatives and false positives without having thought about it at all.

Why you can’t take AI chatbots’ self-reports about their own consciousness at face value

Robert Long: So Blake Lemoine was very impressed by the fluid and charming conversation of LaMDA. And when Blake Lemoine asked LaMDA questions about if it is a person or is conscious, and also if it needs anything or wants anything, LaMDA was replying, like, “Yes, I am conscious. I am a person. I just want to have a good time. I would like your help. I'd like you to tell people about me.”

One thing it reinforced to me is: even if we're a long way off from actually, in fact, needing to worry about conscious AI, we already need to worry a lot about how we're going to handle a world where AIs are perceived as conscious. We'll need sensible things to say about that, and sensible policies and ways of managing the different risks of, on the one hand, having conscious AIs that we don't care about, and on the other hand, having unconscious AIs that we mistakenly care about and take actions on behalf of.

Luisa Rodriguez: Totally. I mean, it is pretty crazy that LaMDA would say, “I'm conscious, and I want help, and I want more people to know I'm conscious.” Why did it do that? I guess it was just predicting text, which is what it does?

Robert Long: This brings up a very good point in general about how to think about when large language models say “I'm conscious.” And you've hit it on the head: it's trained to predict the most plausible way that a conversation can go. And there's a lot of conversations, especially in stories and fiction, where that is absolutely how an AI responds. Also, most people writing on the internet have experiences, and families, and are people. So conversations generally indicate that that's the case.

When the story broke, one thing people pointed out is that if you ask GPT-3 -- and presumably also if you ask LaMDA -- “Hey, are you conscious? What do you think about that?,” you could just as easily say, “Hey, are you a squirrel that lives on Mars? What do you think about that?” And if it wants to just continue the conversation, plausibly, it'd be like, “Yes, absolutely I am. Let's talk about that now.”

It wants to play along and continue what seems like a natural conversation. And even in the reporting about the Blake Lemoine saga, the reporter who wrote about it in the Washington Post noted that they visited Blake Lemoine and talked to LaMDA. And when they did, LaMDA did not say that it was conscious. I think the lesson of that should have been that this is actually a pretty fragile indication of some deep underlying thing, that it's so suggestible and will say different things in different circumstances.

So yeah, I think the general lesson there is that you have to think very hard about the causes of the behaviour that you're seeing. And that's one reason I favoured this more computational, internal-looking approach: it's just so hard to take these things at face value.

Why AI systems might have a totally different range of experiences than humans

Robert Long: Why are we creatures where it's so much easier to make things go really badly for us [than really well]? One line of thinking about this is, well, why do we have pain and pleasure? It has something to do with promoting the right kind of behaviour to increase our genetic fitness. That's not to say that that's explicitly what we're doing, and we in fact don't really have that goal as humans. It’s not what I'm up to, it's not what you're up to, entirely. But they should kind of correspond to it.

And there's kind of this asymmetry where it's really easy to lose all of your expected offspring in one go. If something eats your leg, then you're really in danger of having no descendants -- and that could be happening very fast. In contrast, there are very few things that all of a sudden drastically increase your number of expected offspring. I mean, even having sex -- which I think it's obviously not a coincidence that that's one of the most pleasurable experiences for many people -- doesn't hugely, in any given go, increase your number of descendants. And ditto for eating a good meal.

So we seem to have some sort of partially innate or baked-in default point that we then deviate from on either end. It's very tough to know what that would mean for an AI system. Obviously AI systems have objectives that they're seeking to optimise, but it's less clear what it would mean to say that a system has a default expectation of how well it's going to be doing -- such that if it does better, it will feel good; if it does worse, it'll feel bad.

I think the key point is just to notice that maybe -- and this could be a very good thought -- this kind of asymmetry between pleasure and pain is not a universal law of consciousness or something like that.

Luisa Rodriguez: So the fact that humans have this kind of limited pleasure side of things, there's no inherent reason that an AI system would have to have that cap.

What to do if AI systems have a greater capacity for joy than humans

Luisa Rodriguez: So there are some reasons to think that AI systems, or digital minds more broadly, might have more capacity for suffering, but they might also have more capacity for pleasure. They might be able to experience that pleasure more cheaply than humans. They might have a higher pleasure set point. So on average, they might be better off. You might think that that could be way more cost effective: you can create happiness and wellbeing more cost effectively by having a bunch of digital minds than by having a bunch of humans. How do we even begin to think about what the moral implications of that are?

Robert Long: I guess I will say -- but not endorse -- the one flat-footed answer. And, you know, red letters around this. Yeah, you could think, “Let's make the world as good as possible and contain as much pleasure and as little pain as possible.” And we're not the best systems for realising a lot of that. So our job is to kind of usher in a successor that can experience these goods.

I think there are many, many reasons for not being overly hasty about such a position. And people who've talked about this have noticed this. One is that, in practice, we're likely to face a lot of uncertainty about whether we are actually creating something valuable -- that on reflection, we would endorse. Another one is that, you know, maybe we have the prerogative of just caring about the kind of goods that exist in our current way of existing.

One thing that “Sharing the world with digital minds” mentions is that there are reasons to maybe look for some sort of compromise. One extreme position is the 100% “just replace and hand over” position. The other extreme would be like, “No. Humans forever. No trees for the digital minds.” And maybe for that reason, don't build them. Let's just stick with what we know.

Then one thing you might think is that you could get a lot of what each position wants with some kind of split. So if the pure replacement scenario is motivated by this kind of flat-footed total utilitarianism -- which is like, let's just make the number as high as possible -- you could imagine a scenario where you give 99% of resources to the digital minds and you leave 1% for the humans. But the thing is -- I don't know, this is a very sketchy scenario -- 1% of resources to humans is actually a lot of resources, if giving a lot of resources to the digital minds creates tonnes of wealth and more resources.

Luisa Rodriguez: Right. So is it something like digital minds, in addition to feeling lots of pleasure, are also really smart, and they figure out how to colonise not only the solar system but like maybe the galaxy, maybe other galaxies. And then there's just like tonnes of resources. So even just 1% of all those resources still makes for a bunch of humans?

Robert Long: Yeah. I think that's the idea, and a bunch of human wellbeing. So on this compromise position, you're getting 99% of what the total utilitarian replacer wanted. And you're also getting a large share of what the “humans forever” people wanted. And you might want this compromise because of moral uncertainty. You don't want to just put all of your chips in.
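As a rough illustration of the arithmetic behind this compromise, here is a minimal sketch in Python. The 99%/1% split comes from the conversation above; the growth multipliers, the normalised baseline, and the function names are hypothetical values chosen only to show the shape of the argument, not estimates anyone in the episode gave.

```python
def human_share(total_resources: float, human_fraction: float = 0.01) -> float:
    """Resources left to humans under a fixed fractional split."""
    if not 0.0 <= human_fraction <= 1.0:
        raise ValueError("human_fraction must be between 0 and 1")
    return total_resources * human_fraction


def breakeven_growth(human_fraction: float = 0.01) -> float:
    """Growth multiplier at which the human share equals today's total."""
    if human_fraction <= 0.0:
        raise ValueError("human_fraction must be positive")
    return 1.0 / human_fraction


if __name__ == "__main__":
    baseline = 1.0  # today's total resources, normalised to 1 (assumption)
    # Hypothetical multipliers for how much digital minds grow total resources.
    for growth in (1, 100, 1_000, 1_000_000):
        share = human_share(baseline * growth)
        print(f"growth x{growth}: humans hold {share:g} times today's total")
    print(f"break-even growth: x{breakeven_growth():g}")
```

Under these assumptions, a 1% share matches today's entire resource base once total resources have grown a hundredfold, and exceeds it for any larger multiplier. Whether growth on that scale would actually happen is exactly the sketchy part of the scenario.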