Agents that act for reasons: a thought experiment

Michele Campolo

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Also posted on the EA Forum.

In Free agents I’ve given various ideas about how to design an AI that reasons like an independent thinker and reaches moral conclusions by doing so. Here I’d like to add another related idea, in the form of a short story / thought experiment.

Cursed

Somehow, you have been cursed. As a result of this unknown curse, you are unable to have any positive or negative feeling. For example, you don’t feel pain from injuries, nothing makes you anxious, excited, or sad, and you can’t have fun anymore. If it helps, imagine your visual field without colours, only dull shades of black and white that never feel disgusting or beautiful.

Before we get too depressed, let’s add another detail: this curse also makes you immune to death (and to other states similar to permanent sleep or unconsciousness). If you get stabbed, your body magically recovers as if nothing happened. Although this element might add a bit of fun to the story from our external perspective, keep in mind that the cursed version of you in the story doesn’t feel curious about anything and has no fun thinking about the various things an immortal being could do.

No one else is subject to the same curse. If you see someone having fun and laughing, the sentence “This person is feeling good right now” makes sense to you: although you can’t imagine or recall what feeling good feels like, your understanding of the world around you has somehow remained intact. (Note: I am not saying that this is what would actually happen to a human being who lost the capacity for perceiving valence. It’s a thought experiment!)

Finally, let’s also say that going back to your previous life is not an option. In this story, you can’t learn anything about the cause of the curse or how to reverse it.

To recap:

You can’t feel anything.
You can’t die.
You can’t go back to your previous state.
The curse only affects you. Others’ experiences are normal.

In this situation, what do you do?

In philosophy, there is some discourse around reasons for actions, normative reasons, motivating reasons, blah blah blah. Every philosopher has their own theory and uses words differently, so instead of citing centuries of philosophical debates, I’ll be maximally biased and use one framework that seems sensible to me. In Ethical Intuitionism, contemporary philosopher Michael Huemer distinguishes “four kinds of motivations we are subject to”:

Appetites: examples are hunger, thirst, lust (simple, instinctive desires)
Emotions: anger, fear, love (emotional desires, they seem to involve a more sophisticated kind of cognition than appetites)
Prudence: motivation to pursue or avoid something because it furthers or sets back one’s own interests, like maintaining good health
Impartial reasons: motivation to act due to what one recognises as good, fair, honest, et cetera.

You can find more details in section 7.3 of the book.

We can interpret the above thought experiment as asking: in the absence of appetites and emotions — call these two “desires”, if you wish — what would you do? Without desires and without any kind of worry about your own death, does it still make sense to talk about self-interest? What would you do without desires and without self-interest?

My guess is that, in the situation described in Cursed, at least some people, if not many, would decide to do things for others. The underlying intuition seems to be that, without one’s own emotional states and interests, one would prioritise others’ emotional states and interests, simply because nothing else seems worth doing in that situation.

In other words, although one might need emotional states to first develop an accurate understanding of the world, feeling positive emotions when acting morally is not the main reason why one keeps acting morally. Ask yourself: if, from now on, you noticed that you derive some pleasure from causing harm to others, would you completely change your behaviour and start acting immorally? You might be tempted or have some motivation to cause harm, but that motivation would conflict with other reasons for action, including moral reasons.

Anyway, you might not buy into everything I’ve just said. The important point is that at least some human beings, in Cursed, would act morally. So, there is a class of agents that, in the Cursed situation, would act morally — according to their own understanding of what “morally” means.

What’s the point of all this for AI? I claim that the class of agents just described includes some artificial agents. In other words, I claim that it’s possible to build an AI whose cognitive state is roughly similar to the cognitive state of a human in the Cursed situation, and that this AI acts morally — for the reason that, in such a cognitive state, nothing else seems worth doing.
