Nonsentient Optimizers




Eliezer_Yudkowsky

Followup to: Nonperson Predicates, Possibility and Could-ness

    "All our ships are sentient.  You could certainly try telling a ship what to do... but I don't think you'd get very far."
    "Your ships think they're sentient!" Hamin chuckled.
    "A common delusion shared by some of our human citizens."
            —Player of Games, Iain M. Banks

Yesterday, I suggested that, when an AI is trying to build a model of an environment that includes human beings, we want to avoid the AI constructing detailed models that are themselves people.  And that, to this end, we would like to know what is or isn't a person—or at least have a predicate that returns 1 for all people and could return 0 or 1 for anything that isn't a person, so that, if the predicate returns 0, we know we have a definite nonperson on our hands.
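To make the shape of that predicate concrete, here is a minimal sketch of the one-sided guarantee. Everything in it is a toy assumption: the function names and the two "checks" are hypothetical stand-ins, not a real test for personhood. The only point it shows is the asymmetry, where 0 is a proof of nonpersonhood and 1 means merely "not ruled out".

```python
from typing import Callable, Iterable

# Hypothetical sketch: the checks below are toy stand-ins chosen for
# illustration. Nothing here is a real criterion for personhood.

def make_nonperson_predicate(
    proofs_of_nonpersonhood: Iterable[Callable[[object], bool]],
) -> Callable[[object], int]:
    """Build a predicate that returns 0 only when some check proves
    the input is not a person, and 1 in every other case."""
    checks = list(proofs_of_nonpersonhood)

    def predicate(x: object) -> int:
        for check in checks:
            if check(x):
                return 0  # definite nonperson
        return 1  # not ruled out: might be a person, might not

    return predicate


# Toy assumptions: plain numbers and short strings are not people.
def is_plain_number(x: object) -> bool:
    return isinstance(x, (int, float))


def is_short_string(x: object) -> bool:
    return isinstance(x, str) and len(x) < 100


predicate = make_nonperson_predicate([is_plain_number, is_short_string])


def safe_to_model(x: object) -> bool:
    """A model may be built only if the predicate returns 0."""
    return predicate(x) == 0
```

The predicate is allowed to be overcautious (returning 1 for a rock it cannot classify), but never to return 0 for a person; adding more checks only shrinks the set of things it refuses to clear.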

And as long as you're going to solve that problem anyway, why not apply the same knowledge to create a Very Powerful Optimization Process which is also definitely not a person?

"What?  That's impossible!"

How do you know?  Have you solved the sacred mysteries of consciousness and existence?

"Um—okay, look, putting aside the obvious objection that any sufficiently powerful intelligence will be able to model itself—"

Löb's Sentence contains an exact recipe for a copy of itself, including the recipe for the recipe; it has a perfect self-model.  Does that make it sentient?

"Putting that aside—to create a powerful AI and make it not sentient—I mean, why would you want to?"

Several reasons.  Picking the simplest to explain first—I'm not ready to be a father.

Creating a true child is the only moral and metaethical problem I know that is even harder than the shape of a Friendly AI.  I would like to be able to create Friendly AI while worrying just about the Friendly AI problems, and not worrying whether I've created someone who will lead a life worth living.  Better by far to just create a Very Powerful Optimization Process, if at all possible.

"Well, you can't have everything, and this thing sounds distinctly alarming even if you could -"

Look, suppose that someone said—in fact, I have heard it said—that Friendly AI is impossible, because you can't have an intelligence without free will.

"In light of the dissolved confusion about free will, both that statement and its negation are pretty darned messed up, I'd say.  Depending on how you look at it, either no intelligence has 'free will', or anything that simulates alternative courses of action has 'free will'."

But, understanding how the human confusion of free will arises—the source of the strange things that people say about "free will"—I could construct a mind that did not have this confusion, nor say similar strange things itself.

"So the AI would be less confused about free will, just as you or I are less confused.  But the AI would still consider alternative courses of action, and select among them without knowing at the beginning which alternative it would pick.  You would not have constructed a mind lacking that which the confused name 'free will'."

Consider, though, the original context of the objection—that you couldn't have Friendly AI, because you couldn't have intelligence without free will.

Note:  This post was accidentally published half-finished.  Comments up to 11am (Dec 27), are only on the essay up to the above point.  Sorry!

What is the original intent of the objection?  What does the objector have in mind?

Probably that you can't have an AI which is knowably good, because, as a full-fledged mind, it will have the power to choose between good and evil.  (In an agonizing, self-sacrificing decision?)  And in reality, this struggle, though humans go through it, is not something that a Friendly AI—especially one not intended to be a child and a citizen—need go through.

Which may sound very scary, if you see the landscape of possible minds in strictly anthropomorphic terms:  A mind without free will!  Chained to the selfish will of its creators!  Surely, such an evil endeavor is bound to go wrong somehow...  But if you shift over to seeing the mindscape in terms of e.g. utility functions and optimization, the "free will" thing sounds needlessly complicated—you would only do it if you wanted a specifically human-shaped mind, perhaps for purposes of creating a child.

Or consider some of the other aspects of free will as it is ordinarily seen—the idea of agents as atoms that bear irreducible charges of moral responsibility.  You can imagine how alarming it sounds (from an anthropomorphic perspective) to say that I plan to create an AI which lacks "moral responsibility".  How could an AI possibly be moral, if it doesn't have a sense of moral responsibility?

But an AI (especially a noncitizen AI) needn't conceive of itself as a moral atom whose actions, in addition to having good or bad effects, also carry a weight of sin or virtue which resides upon that atom.  It doesn't have to think, "If I do X, that makes me a good person; if I do Y, that makes me a bad person."  It need merely weigh up the positive and negative utility of the consequences.  It can understand the concept of people who carry weights of sin and virtue as the result of the decisions they make, while not treating itself as a person in that sense.
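As a rough illustration of "merely weighing up the utility of the consequences", here is a sketch of a chooser in the style of classical decision theory. The action names, probabilities, and utilities are invented for the example. Note what is absent: there is no variable anywhere that tracks whether the chooser itself is "good" or "bad".

```python
from typing import Dict, List, Tuple

# Hypothetical sketch: actions, probabilities, and utilities are made up.
# Each action maps to a list of (probability, utility) consequences.
Outcome = Tuple[float, float]


def expected_utility(outcomes: List[Outcome]) -> float:
    """Probability-weighted sum of the utilities of the consequences."""
    return sum(p * u for p, u in outcomes)


def choose(actions: Dict[str, List[Outcome]]) -> str:
    """Pick the action whose consequences have the highest expected utility.

    No term here stands for the chooser's own sin or virtue; only the
    consequences of each action are scored.
    """
    return max(actions, key=lambda name: expected_utility(actions[name]))


actions = {
    "warn_people": [(0.9, 10.0), (0.1, -1.0)],
    "do_nothing": [(1.0, -5.0)],
}
best = choose(actions)
```

Such a system could still contain a model of humans who do carry weights of sin and virtue, as data about the world to be predicted, without applying that model to itself.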

Such an AI could fully understand an abstract concept of moral responsibility or agonizing moral struggles, and even correctly predict decisions that "morally responsible", "free-willed" humans would make, while possessing no actual sense of moral responsibility itself and not undergoing any agonizing moral struggles; yet still outputting the right behavior.

And this might sound unimaginably impossible if you were taking an anthropomorphic view, simulating an "AI" by imagining yourself in its shoes, expecting a ghost to be summoned into the machine—but when you know how "free will" works, and you take apart the mind design into pieces, it's actually not all that difficult.

While we're on the subject, imagine some would-be AI designer saying:  "Oh, well, I'm going to build an AI, but of course it has to have moral free will—it can't be moral otherwise—it wouldn't be safe to build something that doesn't have free will."

Then you may know that you are not safe with this one; they fall far short of the fine-grained understanding of mind required to build a knowably Friendly AI.  Though it's conceivable (if not likely) that they could slap together something just smart enough to improve itself.

And it's not even that "free will" is such a terribly important problem for an AI-builder.  It's just that if you do know what you're doing, and you look at humans talking about free will, then you can see things like a search tree that labels reachable sections of plan space, or an evolved moral system that labels people as moral atoms.  I'm sorry to have to say this, but it appears to me to be true: the mountains of philosophy are the foothills of AI.  Even if philosophers debate free will for ten times a hundred years, it's not surprising if the key insight is found by AI researchers inventing search trees, on their way to doing other things.
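For readers who have not seen one, a search tree that labels reachable sections of plan space can be sketched in a few lines. The state space here (integers, with "add one" and "double" as the available actions) and the value function are arbitrary assumptions for illustration. The set labeled `could` is everything the search marks as reachable before any choice is made; the choice is then an ordinary maximization over that set.

```python
from collections import deque
from typing import Callable, Hashable, Iterable, Set

# Hypothetical sketch: the toy state space and value function are
# assumptions made for the example, not anyone's actual planner.


def label_reachable(
    start: Hashable,
    successors: Callable[[Hashable], Iterable[Hashable]],
    max_depth: int,
) -> Set[Hashable]:
    """Breadth-first search that labels every state reachable from
    `start` in at most `max_depth` actions."""
    reachable = {start}
    frontier = deque([(start, 0)])
    while frontier:
        state, depth = frontier.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the planning horizon
        for nxt in successors(state):
            if nxt not in reachable:
                reachable.add(nxt)
                frontier.append((nxt, depth + 1))
    return reachable


def successors(n: int) -> list:
    """Toy actions: add one, or double."""
    return [n + 1, n * 2]


def value(n: int) -> int:
    """Toy preference: states closer to 3 are better."""
    return -abs(n - 3)


could = label_reachable(1, successors, max_depth=2)  # states labeled reachable
chosen = max(could, key=value)  # the selection step
```

The search does not know in advance which state it will end up choosing; it labels the options first and evaluates them second, and that is all the structure there is.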

So anyone who says—"It's too difficult to try to figure out the nature of free will, we should just go ahead and build an AI that has free will like we do"—surely they are utterly doomed.

And anyone who says:  "How can we dare build an AI that lacks the empathy to feel pain when humans feel pain?"—Surely they too are doomed.  They don't even understand the concept of a utility function in classical decision theory (which makes no mention of the neural idiom of reinforcement learning of policies).  They cannot conceive of something that works unlike a human—implying that they see only a featureless ghost in the machine, secretly simulated by their own brains.  They won't see the human algorithm as detailed machinery, as big complicated machinery, as overcomplicated machinery.

And so their mind imagines something that does the right thing for much the same reasons human altruists do it—because that's easy to imagine, if you're just imagining a ghost in the machine.  But those human reasons are more complicated than they imagine—also less stable outside an exactly human cognitive architecture, than they imagine—and their chance of hitting that tiny target in design space is nil.

And anyone who says:  "It would be terribly dangerous to build a non-sentient AI, even if we could, for it would lack empathy with us sentients—"

An analogy proves nothing; history never repeats itself; foolish generals set out to refight their last war.  Who knows how this matter of "sentience" will go, once I have resolved it?  It won't be exactly the same way as free will, or I would already be done.  Perhaps there will be no choice but to create an AI which has that which we name "subjective experiences".

But I think there are reasonable grounds for hope that when this confusion of "sentience" is resolved—probably via resolving some other problem in AI that turns out to hinge on the same reasoning process that's generating the confusion—we will be able to build an AI that is not "sentient" in the morally important aspects of that.

Actually, the challenge of building a nonsentient AI seems to me much less worrisome than being able to come up with a nonperson predicate!

Consider:  In the first case, I only need to pick one design that is not sentient.  In the latter case, I need to have an AI that can correctly predict the decisions that conscious humans make, without ever using a conscious model of them!  The first case is only a flying thing without flapping wings, but the second case is like modeling water without modeling wetness.  Only the fact that it actually looks fairly straightforward to have an AI understand "free will" without having "free will", gives me hope by analogy.

So why did I talk about the much more difficult case first?

Because humans are accustomed to thinking about other people, without believing that those imaginations are themselves sentient.  But we're not accustomed to thinking of smart agents that aren't sentient.  So I knew that a nonperson predicate would sound easier to believe in—even though, as problems go, it's actually far more worrisome.

 

Part of The Fun Theory Sequence

Next post: "Can't Unbirth a Child"

Previous post: "Nonperson Predicates"