The Tails Come Apart As Metaphor For Life, but with an extra pun.
Suppose you task your friends with designing the Optimal Meal. The meal that maximizes utility, in virtue of its performance at the usual roles food fills for us. We leave aside considerations such as sourcing the ingredients ethically, or writing the code for an FAI on the appetizer in tiny ketchup print, or injecting the lettuce with nanobots that will grant the eater eternal youth, and solely concern ourselves with arranging atoms to get a good meal qua meal.
So you tell your friends to plan the best meal possible, and they go off and think about it. One comes back and tells you that their optimal meal is like one of those modernist 30-course productions, where each dish is a new and exciting adventure. The next comes back and says that their optimal meal is mostly just a big bowl of their favorite beef stew, with some fresh bread and vegetables.
To you, both of these meals seem good - certainly better than what you've eaten recently. But then you start worrying that if this meal is important, then the difference in utility between the two proposed meals might be large, even though they're both better than the status quo (say, cold pizza). In a phrase, gastronomical waste. But then how do you deal with the fact that different people have chosen different meals? Do you just have to choose one yourself?
Now your focus turns inward, and you discover a horrifying fact. You're not sure which meal you think is better. You, as a human, don't have a utility function written down anywhere, you just make decisions and have emotions. And as you turn these meals over in your mind, you realize that different contexts, different fleeting thoughts or feelings, different ways of phrasing the question, or even just what side of the bed you got up on that morning, might influence you to choose a different meal at a point of decision, or rate a meal differently during or after the fact.
You contain within yourself the ability to justify either choice, which is remarkably like being unable justify either choice. This "Optimal Meal" was a boondoggle all along. Although you can tell that either would be better than going home and eating cold pizza, there was never any guarantee that your "better" was a total ordering of meals, not merely a partial ordering.
Then, disaster truly strikes. Your best friend asks you "So, what do you want to eat?"
You feel trapped. You can't decide. So you call your mom. You describe to her these possible meals, and she listens to you and makes sympathetic noises and asks you about the rest of your day. And you tell her that you're having trouble choosing and would like her help, and so she thinks for a bit, and then she tells you that maybe you should try the modernist 30-course meal.
Then you and your friends go off to the Modernism Bistro, and you have a wonderful time.
This is a parable about how choosing the Optimal Arrangement Of All Atoms In The Universe is an impossible moral problem. Accepting this as a given, what kind of thing is happening when we accept the decision of some authority (superhuman AI or otherwise) as to what should be done with those atoms?
When you were trying to choose what to eat, there was no uniquely right choice, but you still had to make a choice anyhow. If some moral authority (e.g. your mom) makes a sincere effort to deliberate on a difficult problem, this gives you an option that you can accept as "good enough," rather than "a waste of unknowable proportions."
How would an AI acquire this moral authority stuff? In the case of humans, we can get moral authority by:
- Taking on the social role of the leader and organizer
- Getting an endorsement or title from a trusted authority
- Being the most knowledgeable or skilled at evaluating a certain problem
- Establishing personal relationships with those asked to trust us
- Having a track record of decisions that look good in hindsight
- Being charismatic and persuasive
You might think "Of course we shouldn't trust an AI just because it's persuasive." But in an important sense, none of these reasons is good enough. We're talking about trusting something as an authority on an impossible problem, here.
A good track record on easier problems is a necessary condition to even be thinking about the right question, true. I'm not advocating that we fatalistically accept some random nonsense as the meaning of life. The point is that even after we try our hardest, we (or an AI making the choice for us) will be left in the situation of trying to decide between Optimal Meals, and narrowing this choice down to one option shouldn't be thought of as a continuation of the process that generated those options.
If after dinner, you called your mom back and said "That meal was amazing - but how did you figure out that was what I really wanted?", you would be misunderstanding what happened. Your mom didn't solve the problem of underdetermination of human values, she just took what she knew of you and made a choice - an ordinary, contingent choice. Her role was never to figure out what you "really wanted," it was to be an authority whose choice you and your friends could accept.
So there are two acts of trust that I'm thinking about this week. The first is how to frame friendly AI as a trusted authority rather than an oracle telling us the one best way to arrange all the atoms. And the second is how a friendly AI should trust its own decision-making process when it does meta-ethical reasoning, without assuming that it's doing what humans uniquely want.