Respecting your Local Preferences

Scott Garrabrant

In this post, I give an application of geometric rationality to a toy version of a real problem.

A Conflicted Agent

Let's say you are an agent with two partially conflicting goals. Part of you wants to play a video game, and part of you wants to save the world and tile the multiverse with computronium shaped exactly the way you like it (not paperclips). How should these conflicting interests figure out what to do? (Assume you have not yet had the idea of starting a video game company to save the world.)

We will assume that $2 / 3$ of you wants to save the world, and $1 / 3$ of you wants to play video games.

At first your world-saving-self has the bright idea that 2/3 is bigger than 1/3, and you should therefore devote all your time to saving the world. However, this proposal doesn't stick. Eventually, you end up Nash bargaining with your time, and devoting 2/3 of your time to saving the world and 1/3 of your time to playing video games. This works well for a while, but then your world-saving-self has a new bright idea:

"Let's look around at the world, and see how much ability we expect to have to save it. If it feels like we are in the top 60 percentile of worlds ordered by how much control we have, then we will try to save the world. If we are in the bottom 40 percentile, we will play video games!"

(The 60 percentile is arbitrarily rounding down from 2/3, so that you can both play more video games and save more worlds.)
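
As a side note on where the 2/3 time split comes from, here is a minimal sketch, assuming each part's utility is simply proportional to the share of time it gets (the post does not spell this out). Writing $t$ for the fraction of time spent saving the world, weighted Nash bargaining maximizes

$$t^{2/3} (1 - t)^{1/3}, \qquad t \in [0, 1].$$

Taking logs and setting the derivative to zero gives

$$\frac{2}{3} \cdot \frac{1}{t} - \frac{1}{3} \cdot \frac{1}{1 - t} = 0 \quad \Longrightarrow \quad 2 (1 - t) = t \quad \Longrightarrow \quad t = \frac{2}{3}.$$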

A Nash Bargaining Model

Let's model this more carefully. Let's say there are five different types of worlds: 1, 2, 3, 4, and 5. In each world, you have two buttons in front of you. The video game button, and the save the world button. Each time step, pressing the video game button lets you play some video games, and pressing the save the world button saves the world with probability $ε \cdot i$ in the world of type $i$ .

We have five degrees of freedom. For each $i \in \{1, 2, 3, 4, 5\}$ , we have $p_{i}$ , which represents the proportion of our time in world $i$ that we spend pressing the save the world button. The rest of the time is spent playing video games.

The part of you that wants to save the world has power $\frac{2}{3}$ , and utility equal to $10^{10^{100}} \cdot ε \cdot (p_{1} + 2 p_{2} + 3 p_{3} + 4 p_{4} + 5 p_{5})$ . However, since we are Nash bargaining, the coefficient out in front does not matter.

The part of you that wants to play video games has power $\frac{1}{3}$ and utility equal to $5 - p_{1} - p_{2} - p_{3} - p_{4} - p_{5}$ .

We are trying to maximize the weighted geometric mean, $\sqrt[3]{(p_{1} + 2 p_{2} + 3 p_{3} + 4 p_{4} + 5 p_{5})^{2} (5 - p_{1} - p_{2} - p_{3} - p_{4} - p_{5})}$ , on the cube $[0, 1]^{5}$ .

This achieves a maximum when $p_{1} = p_{2} = 0$ , and $p_{3} = p_{4} = p_{5} = 1$ . Simple enough.
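
The claimed maximum can be sanity-checked with a brute-force search. The sketch below is only an illustration, not a proof: it restricts each $p_{i}$ to multiples of 1/10 (the grid resolution and the function name `best_allocation` are my own assumptions) and maximizes the cube of the geometric mean, which has the same maximizer.

```python
from itertools import product

def best_allocation(steps=10):
    """Grid-search the first model with each p_i a multiple of 1/steps.

    Returns (p, value), where value is the cube of the weighted geometric
    mean, i.e. (p1 + 2 p2 + ... + 5 p5)^2 * (5 - p1 - ... - p5).
    """
    best_k, best_val = None, -1
    for k in product(range(steps + 1), repeat=5):
        # Work with integers k_i = steps * p_i so comparisons are exact.
        save = sum(i * k_i for i, k_i in enumerate(k, start=1))  # steps * world-saving utility
        play = 5 * steps - sum(k)                                # steps * video game utility
        val = save * save * play                                 # scaled by steps**3
        if val > best_val:
            best_k, best_val = k, val
    p = tuple(k_i / steps for k_i in best_k)
    return p, best_val / steps**3

p, value = best_allocation()
print(p, value)
```

On this grid the search lands on $p_{1} = p_{2} = 0$ and $p_{3} = p_{4} = p_{5} = 1$, where the product is $12^{2} \cdot 2 = 288$.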

(Linear is not the best model for the distribution of how much control you have. I am just trying to keep things simple.)

Local Preferences

The main problem with the above analysis, according to me, is that it is not respecting the locality of some of your preferences.

Maybe your desire to save the world is nonlocal. Maybe you care equally about whether this world is saved and whether some hypothetical other world in which you have more or less control over saving the world is saved. Why should you care about this Everett branch more than the other ones? I will grant that your altruistic self thinks that way, but I am guessing your desire to play video games probably doesn't.

The part of me that wants to play video games wants to play video games in this world. If I ask it what it wants in other Everett branches, it doesn't really care much. This does not mean it is making a mistake. Indeed, if I were to go back in time before I observed what world I am in and ask it whether it wanted to play more video games, concentrated in a small number of futures, or fewer total video games equally distributed, it chooses the equal distribution. The (video game part of a) version of me that does not yet know what world it is in is a coalition of many different versions of me that all want to play video games, and are unwilling to trade arbitrary amounts of one of them playing for another one of them playing.

When we combined the 5 different hypothetical parts of you that want to play video games in each world into one big part with a single utility function, we made a mistake. Let us try again.

A Better Nash Bargaining Model

We will keep the same model as before, but we will split up the video game preference.

Instead of one component with power $\frac{1}{3}$ , we will have five different components, each with power $\frac{1}{15}$ , whose utility functions are $1 - p_{i}$ , for $i \in \{1, 2, 3, 4, 5\}$ .

We are trying to maximize $\sqrt[15]{(p_{1} + 2 p_{2} + 3 p_{3} + 4 p_{4} + 5 p_{5})^{10} (1 - p_{1}) (1 - p_{2}) (1 - p_{3}) (1 - p_{4}) (1 - p_{5})}$ on the cube $[0, 1]^{5}$ .

This turns out to be maximized when $p_{i} = \frac{i - 1}{i}$ .

As a quick sketch of a proof, observe we can multiply by a constant, and equivalently maximize $\sqrt[15]{(p_{1} + 2 p_{2} + 3 p_{3} + 4 p_{4} + 5 p_{5})^{10} (10 - 10 p_{1}) (20 - 20 p_{2}) (30 - 30 p_{3}) (40 - 40 p_{4}) (50 - 50 p_{5})}$ , which can be viewed as the geometric mean of 15 numbers, where $(p_{1} + 2 p_{2} + 3 p_{3} + 4 p_{4} + 5 p_{5})$ is repeated 10 times and the other factors are repeated once. However, observe that regardless of the $p_{i}$ values, the arithmetic mean of these 15 numbers is 10, since they sum to 150, which means the geometric mean cannot be larger than 10, by the AM-GM inequality. We can achieve a geometric mean of 10 by setting all 15 values equal to 10, by setting $p_{i} = \frac{i - 1}{i}$ .
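
The bookkeeping in this argument can be checked mechanically. The sketch below uses exact rational arithmetic to confirm that the 15 numbers always sum to 150, that they all equal 10 at $p_{i} = \frac{i - 1}{i}$, and that their product never exceeds $10^{15}$ on a random sample of allocations. The helper names (`fifteen_numbers`, `product_of`) and the sampling scheme are my own, chosen for illustration.

```python
from fractions import Fraction
import random

def fifteen_numbers(p):
    """The 15 factors from the proof sketch for p = (p_1, ..., p_5)."""
    save = sum(i * p_i for i, p_i in enumerate(p, start=1))      # p1 + 2 p2 + ... + 5 p5
    scaled = [10 * i * (1 - p_i) for i, p_i in enumerate(p, start=1)]  # 10 i (1 - p_i)
    return [save] * 10 + scaled

def product_of(values):
    """Exact product of a list of numbers."""
    out = 1
    for v in values:
        out *= v
    return out

# The claimed optimum: every one of the 15 numbers equals 10.
claimed = [Fraction(i - 1, i) for i in range(1, 6)]
assert all(n == 10 for n in fifteen_numbers(claimed))

# Random allocations: the sum is always 150, so by AM-GM the product is at most 10**15.
rng = random.Random(0)
for _ in range(2000):
    p = [Fraction(rng.randint(0, 1000), 1000) for _ in range(5)]
    nums = fifteen_numbers(p)
    assert sum(nums) == 150
    assert product_of(nums) <= 10 ** 15
```

A random sample cannot replace the AM-GM argument, but it is a cheap way to catch an arithmetic slip in the rescaling constants.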

Note that $p_{1} = 0$ , and in world 1, you don't save the world at all. However, none of the preferences are being exploited here. The part of you in world 1 that wants to save the world is happy you are prioritizing other worlds. If we had $p_{i} = 1$ for some i, on the other hand, this would be a sign that some of your local preferences to play video games were being exploited, since those preferences do not care about other worlds.

The closest we get to that is in world 5, where $p_{5} = 0.8$ , and you are spending 80 percent of your time saving the world, and 20 percent of your time playing video games. This is more than 2/3 of your time, which is caused by the parts of you in the other worlds having preferences over your actions in world 5. Those preferences decrease your video game time, but not all the way to 0.

Note that the above analysis is sort of the most updateless version, where you allow for trade across the different worlds as much as possible. There could be an argument that you should be playing video games even more, because the parts of you in other worlds that care about this world are not actually here to enforce their preferences, but it is hard for me to imagine a good argument that you should be playing less given these preferences.

Note that this argument is about self-cooperation. Maybe you want to cooperate with a much larger collection of potential agents behind a much larger veil of ignorance. I am not arguing against that here. I am only trying to take seriously the argument that you should cooperate with yourself more by putting more effort into saving the world because you have surprisingly high leverage.

Comment from SMK:

Related: A bargaining-theoretic approach to moral uncertainty by Greaves and Cotton-Barratt. Section 6 is especially interesting: there they highlight a problem with the Nash approach, namely that the Nash bargaining solution (NBS) is sensitive to whether (sub-)agents bargain over all decision problems simultaneously (those they currently face and those they think they will face with nonzero probability), or whether each bargaining problem is treated separately and solved one at a time.

In the 'grand-world' model, (sub-)agents can bargain across situations with differing stakes and prima facie reach mutually beneficial compromises, but it's not very practical (as the authors note) and would perhaps depend too much on the priors in question (just as with updatelessness). In the 'small-world' model, on the other hand, you don't have problems of impracticality and so on, but you will miss out on a lot of compromises.
