Utilitarianism Meets Egalitarianism


"Let's pretend that you are a utilitarian. You want to satisfy everyone's goals"

This isn't a criticism of the substance of your argument, but I've come across a view like this one frequently on LW so I want to address it: This seems like a pretty nonstandard definition of "utilitarian," or at least, it's only true of some kinds of preference utilitarianism.

I think utilitarianism usually refers to a view where what you ought to do is maximize a utility function that (somehow) aggregates a metric of **welfare** across individuals, not their goal-satisfaction. Kicking a puppy without me knowing about it thwarts my goals, but (at least on many reasonable conceptions of "welfare") doesn't decrease *my* welfare.

I'd be very surprised if most utilitarians thought they'd have a moral obligation to create paperclips if 99.99% of agents in the world were paperclippers (example stolen from Brian Tomasik), controlling for game-theoretic instrumental reasons.

I think in the social choice literature, people almost always mean preference utilitarianism when they say "utilitarianism", whereas in the philosophical/ethics literature people are more likely to mean hedonic utilitarianism. I think the reason for this is that in the social choice and somewhat adjacent game (and decision) theory literature, utility functions have a fairly solid foundation as a representation of preferences of rational agents. (For example, Harsanyi's "[preference] utilitarian theorem" paper and Nash's paper on the Nash bargaining solution make very explicit reference to this foundation.) Whereas there is no solid foundation for numeric hedonic welfare (at least not in this literature, but also not elsewhere as far as I know).

Agreed. I should have had a disclaimer that I was talking about preference utilitarianism.

I am not sure what is true about what most people think.

My guess is that most philosophers who identify with utilitarianism mean welfare.

I would guess that most readers of LessWrong would not identify with utilitarianism, but would say they identify more with preference utilitarianism than welfare utilitarianism.

My guess is that a larger (relative to LW) proportion of EAs identify with utilitarianism, and also they identify with the welfare version (relative to preference version) more than LW, but I have a lot of uncertainty about how much. (There is probably some survey data that could answer this question. I haven't checked.)

Also, I am not sure that "controlling for game-theoretic instrumental reasons" is actually a move that is well defined/makes sense.

I agree with your guesses.

"I am not sure that "controlling for game-theoretic instrumental reasons" is actually a move that is well defined/makes sense."

I don't have a crisp definition of this, but I just mean that, e.g., we compare the following two worlds: (1) 99.99% of agents are non-sentient paperclippers, and each agent has equal (bargaining) power. (2) 99.99% of agents are non-sentient paperclippers, and the paperclippers are all confined to some box. According to plenty of intuitive-to-me value systems, you only (maybe) have reason to increase paperclips in (1), not (2). But if the paperclippers felt really sad about the world not having more paperclips, I'd care—to an extent that depends on the details of the situation—about increasing paperclips even in (2).

The intro shows me how I disagree with the veil of ignorance. I think it's one of those thought experiments that only seems neutral because using your own perspective is so natural as to slide beneath notice.

Humans don't have utility functions. They're physical systems, and assigning utility functions to them is an act of modeling. What models are best depends (at least) on a choice of universal Turing machine, and different people can and do come to impasses about what humans "really want," or what physical systems are "really human."

"Behind the veil of ignorance" isn't really a place that exists, because that's like saying "find the best model *without* choosing a UTM." It sounds neutral but it's actually nonsense. In practice, of course, people will just imagine this happening by using the modeling assumptions that are natural to them.

Even more off-topic, this might lead real-world attempts to implement a veil of ignorance to run into steganography-like problems. An agent might be motivated to smuggle self-recognition behind the veil, and it can do so by making itself the only fixed point of its modeling assumptions - i.e. the agent that would endorse its own specific modeling assumptions (when interpreted with its own specific modeling assumptions).

"On 3, I got nothing, which is unfortunate, because we need 3 to define either of the two proposals."

If people have a cost of action, then that might define a scale factor. This isn't necessarily a *good* solution, but it may be *a* solution.

Notice that Scott himself *proposes* a scaling solution shortly after saying "I got nothing". I guess "I got nothing" is rhetorical, expressing something about the non-obviousness of the answer rather than saying he literally has no answers.

"Now, let's pretend you are an egalitarian. You still want to satisfy everyone's goals, and so you go behind the veil of ignorance, and forget who you are. The difference is that now you are not trying to maximize expected expected utility, and instead are trying to maximize worst-case expected utility."

Nitpick: I think this is a somewhat controversial and nonstandard definition of egalitarianism. Rather, this is the decision theory underlying Rawls' 'justice as fairness'; and, yes, Rawls claimed that his theory was egalitarian (if I remember correctly), but this has come under much scrutiny. See *Egalitarianism against the Veil of Ignorance* by Roemer, for example.

So the argument/characterization of the Nash bargaining solution is the following (correct?): The Nash bargaining solution is the (almost unique) outcome o for which there is a rescaling w of the utility functions such that both the utilitarian solution under rescaling w and the egalitarian solution under rescaling w are o. This seems interesting! (Currently this is a bit hidden in the proof.)

Do you show the (almost) uniqueness of o, though? You show that the Nash bargaining solution has the property, but you don't show that no other solution has this property, right?

Yeah, that is correct.

The thing I originally said about (almost) uniqueness was maybe wrong. Oops! I edited, and it is correct now.

To see that there might be many solutions under the weakest notion of egalitarianism, consider the case where there are three people, , , and , with utility , , , and each with probability . The constraints on utility are that , and that . The thing is that if we give a small enough weight to , then almost everything we can do with and will be egalitarian, and anything on the Pareto frontier that gives both and positive utility will be able to be simultaneously egalitarian and utilitarian.

You can't run into this problem with two people, or with everyone ending up with the same utility.

Here is a proof that we get existence and uniqueness if we also have the constraint that everyone ends up with the same utility. The construction in the main post gives existence, because everyone has utility 1.

For uniqueness, we may take some point that satisfies utilitarianism, egalitarianism, and gives everyone the same utility. WLOG, it gives everyone utility 1, and is utilitarian and egalitarian with respect to the weight vector that gives everyone weight 1. This point maximizes the expected utility with respect to your probability distribution. It also maximizes the expected logarithm of utility. This is because it achieves an expected logarithm of 0, and the concavity of the logarithm says that the expectation of the logarithm is at most the logarithm of the expectation, which is at most the logarithm of 1, which is 0. Thus, this point is a Nash bargaining solution (i.e. the point that maximizes expected log utility), and since the Nash bargaining solution is unique, it must be unique.
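The Jensen step in this argument can be written out as a short chain. The notation is an assumption added here, not the commenter's: $\pi_i$ is the probability of being person $i$, $U_i(a)$ is person $i$'s expected utility under a feasible action $a$, and $a^*$ is the candidate point, normalized so that $U_i(a^*) = 1$ for everyone.

```latex
\sum_i \pi_i \log U_i(a)
  \;\le\; \log\Big(\sum_i \pi_i \, U_i(a)\Big)
  \;\le\; \log\Big(\sum_i \pi_i \, U_i(a^*)\Big)
  \;=\; \log 1 \;=\; 0
  \;=\; \sum_i \pi_i \log U_i(a^*)
```

The first inequality is Jensen's inequality for the concave logarithm, and the second uses that $a^*$ is utilitarian-optimal, so no feasible $a$ has expected expected utility above 1.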

Note this is only saying the utility everyone gets is unique. There still might be multiple different strategies to achieve that utility.

Sorry for the (possible) error! It might be that the original thing turns out to be correct, but it depends on details of how we define the tiered egalitarian solution.

I summarized my thoughts on this sequence in my other review on the next post in this sequence. Most of my thoughts there also apply to this post.

"However, when everyone gets expected utility 1, the expected logarithm of expected utility will have the same derivative as expected expected utility"

Can you clarify this sentence? What functions are we differentiating?

if you maximize the product of utilities (nash phrased as product, or as area), you're invariant to scaling (because multiplying either by a constant just rescales all possibilities).

=>

if you maximize the log of utilities (nash phrased as log), you're invariant to scaling (because multiplying either by a constant just shifts all logs, because addition in log space is multiplication in linear space).

sure, makes sense.

I am bouncing off the proof of existence, but aaalmost see how this property implies it. I remember understanding it in CoCo.

I was a little bit confused about Egalitarianism not requiring (1). As an egalitarian, you may not need a full distribution over who you could be, but you do need the support of this distribution, to know what you are minimizing over?

This post is mostly propaganda for the Nash Bargaining solution, but also sets up some useful philosophical orientation. This post is also the first post in my geometric rationality sequence.

## Utilitarianism

Let's pretend that you are a utilitarian. You want to satisfy everyone's goals, and so you go behind the veil of ignorance. You forget who you are. Now, you could be anybody. You now want to maximize expected expected utility. The outer (first) expectation is over your uncertainty about who you are. The inner (second) expectation is over your uncertainty about the world, as well as any probabilities that come from you choosing to include randomness in your action.

There is a problem. Actually, there are two problems, but they disguise themselves as one problem. The first problem is that it is not clear where you should get your distribution over your identity from. It does not make sense to just take the uniform distribution; there are many people you can be, and they exist to different extents, especially if you include potential future people whose existences are uncertain.

The second problem is that interpersonal utility comparisons don't make sense. Utility functions are not a real thing. Instead, there are preferences over uncertain worlds. If a person's preferences satisfy the VNM axioms, then we can treat that person as having a utility function, but the real thing is more like their preference ordering. When we get utility functions this way, they are only defined up to affine transformation. If you add a constant to a utility function, or multiply a utility function by a positive constant, you get the same preferences. Before you can talk about maximizing the expectation over your uncertainty about who you are, you need to put all the different possible utility functions into comparable units. This involves making a two dimensional choice. You have to choose a zero point for each person, together with a scaling factor for how much their utility goes up as their preferences are satisfied.

Luckily, to implement the procedure of maximizing expected expected utility, you don't actually need to know the zero points, since these only shift expected expected utility by a constant. You do, however, need to know the scaling factors. This is not an easy task. You cannot just say something like "Make all the scaling factors 1." You don't actually start with utility functions, you start with equivalence classes of utility functions.
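As a rough illustration of the last two paragraphs, here is a minimal Python sketch. The numbers, the `prob` distribution, and the function name `utilitarian_choice` are all invented for this example. It only shows that shifting zero points leaves the utilitarian choice alone, while rescaling one person can change it.

```python
# Toy numbers, invented for illustration: utilities[i][o] is person i's
# utility for option o under one arbitrary representative of their
# equivalence class of utility functions.
utilities = [
    [0.0, 1.0, 0.4],  # person 0
    [1.0, 0.0, 0.7],  # person 1
]
prob = [0.5, 0.5]  # assumed probability of being each person


def utilitarian_choice(utilities, prob, shifts=None, scales=None):
    """Return the option index maximizing expected (rescaled) utility."""
    n_people = len(utilities)
    shifts = shifts if shifts is not None else [0.0] * n_people
    scales = scales if scales is not None else [1.0] * n_people

    def score(option):
        return sum(
            prob[i] * (scales[i] * utilities[i][option] + shifts[i])
            for i in range(n_people)
        )

    return max(range(len(utilities[0])), key=score)


baseline = utilitarian_choice(utilities, prob)
# Moving each person's zero point adds the same constant to every option.
shifted = utilitarian_choice(utilities, prob, shifts=[100.0, -3.0])
# Multiplying person 0's utility by 10 is an equally valid representation
# of their preferences, but it changes which option wins.
rescaled = utilitarian_choice(utilities, prob, scales=[10.0, 1.0])
print(baseline, shifted, rescaled)
```

With these particular numbers, the baseline and shifted runs pick the compromise option (index 2), while the rescaled run switches to person 0's favorite (index 1).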

Thus, to implement utilitarianism, we need to know two things: What is the distribution on people, and how do you scale each person's utilities? This gets disguised as one problem, since the thing you do with these numbers is just multiply them together to get a single weight, but it is actually two things you need to decide. What can we do?

## Egalitarianism

Now, let's pretend you are an egalitarian. You still want to satisfy everyone's goals, and so you go behind the veil of ignorance, and forget who you are. The difference is that now you are not trying to maximize expected expected utility, and instead are trying to maximize worst-case expected utility. Again, the expectation contains uncertainty about the world as well as any randomness in your action. The "worst-case" part is about your uncertainty about who you are. You would like to have reasonably high expected utility, regardless of who you might be.

When I say maximize worst-case expected utility, I am sweeping some details under the rug about what to do if you manage to max out someone's utility. The actual proposal is to maximize the minimum utility over all people. Then if there are multiple ways to do this, consider the set of all people for which it is still possible to increase their utility without bringing anyone below this minimum. Repeat the proposal with only those people, subject to the constraint that you only consider actions that don't bring anyone below the current minimum. (Yeah, yeah, this isn't obviously well defined for infinitely many people. I am ignoring those details right now.)
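A simplified sketch of the tiered idea is below, assuming a finite menu of options and no randomization. It uses the standard leximin ordering (compare the worst-off person first, then the second worst-off, and so on), which seems to match the spirit of the procedure above but may differ from it in edge cases. The function name and the numbers are made up for illustration.

```python
def leximin_choice(utilities):
    """Return the index of the leximin-best option.

    utilities[i][o] is person i's (already normalized) expected utility
    under option o. Options are compared by their utility profiles sorted
    from worst-off to best-off, lexicographically.
    """
    n_options = len(utilities[0])

    def sorted_profile(option):
        return sorted(person[option] for person in utilities)

    # Python compares lists lexicographically, which is exactly leximin
    # once each profile is sorted in ascending order.
    return max(range(n_options), key=sorted_profile)


# Person 0 is maxed out at utility 1 whatever we do, so plain maximin
# cannot tell the three options apart.
utilities = [
    [1, 1, 1],  # person 0
    [2, 3, 5],  # person 1
    [6, 4, 3],  # person 2
]
choice = leximin_choice(utilities)
print(choice)
```

Here every option has the same minimum, so the tie is broken by the next worst-off person, and then by the one after that, which selects the last option (index 2).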

This is called egalitarianism, because assuming you have the ability to randomize, and ignoring complications related to maxing out someone's utility, you will tend to give everyone the same expected utility. (For example, in the two person case, it will always be the case that either it is not possible to increase the expected utility of the person with lower expected utility, or the two people have the same expected utility.)

Unfortunately, there are also two problems with defining egalitarianism. We no longer have to worry about a distribution on people. However, now we have to worry about what the zero point of each person's utility function is, and also what the scaling factor is for each person's utility function.

Unlike utilitarianism, egalitarianism will sometimes recommend randomizing between different outcomes for the sake of fairness.

## Utility Monsters

Utilitarianism and egalitarianism each have their own type of utility monster.

For utilitarianism, imagine Cookie Monster. Cookie Monster gets a bazillion utility for every cookie he gets. This dwarfs everyone's utility, and you should devote almost all your resources to giving cookies to Cookie Monster.

For egalitarianism, imagine Oscar the Grouch. Oscar hates everything. Worlds range from giving Oscar zero utility to giving Oscar one bazillionth of a utility. Assuming it is possible to give everyone else much more than a bazillionth of a utility simultaneously, you should devote almost all of your resources to maximizing Oscar's utility.

For both utilitarianism and egalitarianism, it is possible to translate and rescale utilities to create arbitrarily powerful utility monsters, which is to say that the choice of how to normalize utility really matters a lot.
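To make the egalitarian side of this concrete, here is a small Python sketch with invented numbers. The function `maximin_choice` is a hypothetical helper that ignores the tie-breaking tiers and just maximizes the minimum. Both shrinking one person's scale (an Oscar) and moving another person's zero point change the recommendation.

```python
def maximin_choice(utilities, shifts=None, scales=None):
    """Return the option index maximizing the worst-off person's utility."""
    n_people = len(utilities)
    shifts = shifts if shifts is not None else [0.0] * n_people
    scales = scales if scales is not None else [1.0] * n_people

    def worst_case(option):
        return min(
            scales[i] * utilities[i][option] + shifts[i]
            for i in range(n_people)
        )

    return max(range(len(utilities[0])), key=worst_case)


# Two people with opposed preferences over three options (toy numbers).
utilities = [
    [0.2, 0.6, 0.9],  # person 0
    [0.9, 0.6, 0.2],  # person 1
]
baseline = maximin_choice(utilities)
# Shrink person 1's scale: they become an Oscar whose utility is always
# the minimum, so the choice is whatever is best for them.
oscar = maximin_choice(utilities, scales=[1.0, 1e-9])
# Lower person 0's zero point (add 1 to all their utilities): person 1 is
# now always the worst off, with the same effect.
moved_zero = maximin_choice(utilities, shifts=[1.0, 0.0])
print(baseline, oscar, moved_zero)
```

In the baseline the compromise option (index 1) wins. After either change, the choice moves to person 1's favorite (index 0), even though nobody's preferences have changed.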

## Filling in the Gaps

For defining either utilitarianism or egalitarianism, there are three hard to define parameters we need to consider:

1) The probability (from behind the veil of ignorance) that you expect to be each person,

2) The zero point of each person's utility function, and

3) The scaling factor of each person's utility function.

Utilitarianism requires both 1 and 3. Egalitarianism requires both 2 and 3. Unfortunately, I think that 1 and 2 are the two we have the most traction on.

1 feels more like an empirical question. It is mixed in with the question of where the priors come from. 1 is like asking "With what prior probability would you expect to have observed being any of these people?"

2 feels like it is trying to define a default world. Something that is achievable, so it is possible to give everyone non-negative utility simultaneously. Maybe we can use something like understanding boundaries to figure out what 2 should be.

On 3, I got nothing, which is unfortunate, because we need 3 to define either of the two proposals. Is there anything reasonable we can do if we only have answers to 1 and 2?

Also, people have intuitions pointing towards both Utilitarianism and Egalitarianism. How are we supposed to decide between them?

## Why not Both?

Assume that we magically had an answer to both 1 and 2 above, so we both have a distribution over who we are behind the veil of ignorance, and we also have a zero point for everyone's utility function. Assume further we are allowed to randomize in our action, and that it is possible to give everyone positive utility simultaneously. Then, there exists an answer to 3 such that utilitarianism and egalitarianism recommend the same action.

If we take the weakest notion of egalitarianism, which is just that the minimum utility is maximized, then there might be more than one such scaling. However, if we take the strongest notion of egalitarianism, that also everyone ends up with the same utility (arguably the true spirit of egalitarianism), then we will get existence and uniqueness of the scaling factors and the utilities. (I am not sure what the uniqueness situation is for the tiered egalitarianism proposal I gave above.)

Here is a proof sketch of the existence part:

Start with some arbitrary scaling factor on everyone's utility functions.

Consider the action which maximizes the expected logarithm of expected utility, where the outer expectation is over who you are, and the inner expectation is over randomness in the world or in your action. This point will be unique up to utility because of the strict concavity of the logarithm. Note that everyone will get positive utility.

For each person, rescaling their utility function will only add a constant to the logarithm of their expected utility, and will thus have no effect on maximizing the expected logarithm of expected utility.
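Written out, the scale invariance looks like this. The symbols are notation assumed here rather than taken from the post: $\pi_i$ is the probability of being person $i$, $U_i(a)$ is person $i$'s expected utility under action $a$, and $c_i > 0$ is the rescaling applied to person $i$.

```latex
\sum_i \pi_i \log\big(c_i \, U_i(a)\big)
  \;=\; \underbrace{\sum_i \pi_i \log c_i}_{\text{does not depend on } a}
  \;+\; \sum_i \pi_i \log U_i(a)
```

Since the first term is the same for every action, the maximizing action is the same before and after rescaling.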

Thus, we can rescale everyone's utilities so that everyone gets expected utility 1 when we maximize the expected logarithm of expected utility.

First, we need to see that given this rescaling, the utilitarian choice is to give everyone expected utility 1. Assume for the purpose of contradiction that there was some way to achieve expected expected utility greater than 1. Let A be the (randomized) action that gets everyone expected utility 1, and let B be a better action that gets expected expected utility 1+ε. If you consider the parameterized action (1−p)A+pB, and look at the derivative of expected expected utility with respect to p at p=0, you get ε. However, when everyone gets expected utility 1, the expected logarithm of expected utility will have the same derivative as expected expected utility. Thus this derivative will also be ε, contradicting the fact that the policy maximizes the expected logarithm of expected utility at the action A that you get when p=0.
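The two derivatives can be spelled out as follows. The notation is assumed for this sketch: $\pi_i$ is the probability of being person $i$ and $U_i(\cdot)$ is person $i$'s expected utility, so $U_i(A) = 1$ for all $i$ and $\sum_i \pi_i U_i(B) = 1 + \varepsilon$. Expected utility is linear in mixtures, so $U_i\big((1-p)A + pB\big) = 1 + p\,(U_i(B) - 1)$.

```latex
\frac{d}{dp} \sum_i \pi_i \, U_i\big((1-p)A + pB\big) \Big|_{p=0}
  = \sum_i \pi_i \big(U_i(B) - 1\big) = \varepsilon

\frac{d}{dp} \sum_i \pi_i \log U_i\big((1-p)A + pB\big) \Big|_{p=0}
  = \sum_i \pi_i \, \frac{U_i(B) - 1}{U_i(A)}
  = \sum_i \pi_i \big(U_i(B) - 1\big) = \varepsilon
```

The second line matches the first only because every denominator $U_i(A)$ equals 1. That is the sense in which the two objectives have the same derivative at A.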

Next, let us see that given this rescaling, the egalitarian choice is to give everyone utility 1. If it were possible to give anyone expected utility greater than 1 without decreasing anyone's expected utility to less than 1, this would be a utilitarian improvement, which we already said was impossible. Thus, the only way to achieve a worst-case expected utility of 1 is to give everyone expected utility 1.
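The whole construction can be checked numerically on a toy problem. The setup below is hypothetical and not from the post: one unit of a resource is split between two people, with square-root utilities measured from the zero point of getting nothing. Because these utilities are concave in the share, randomizing over splits never beats the deterministic average split for either person, so the sketch searches deterministic splits only, using a coarse grid rather than an exact optimizer.

```python
import math

# Assumed distribution over who you are behind the veil.
prob = {"A": 0.25, "B": 0.75}


def raw_utility(x):
    """Utilities when A gets share x and B gets 1 - x (arbitrary scales)."""
    return {"A": math.sqrt(x), "B": 3.0 * math.sqrt(1.0 - x)}


# Interior splits only, so that every logarithm is finite.
grid = [k / 100 for k in range(1, 100)]


def expected_log(x):
    u = raw_utility(x)
    return sum(prob[i] * math.log(u[i]) for i in prob)


# Step 1: maximize the expected logarithm of utility (Nash bargaining).
x_nash = max(grid, key=expected_log)

# Step 2: rescale so that everyone gets utility 1 at that point.
u_star = raw_utility(x_nash)


def rescaled(x):
    u = raw_utility(x)
    return {i: u[i] / u_star[i] for i in prob}


# Step 3: compute the utilitarian and egalitarian choices under the
# rescaled utilities.
def utilitarian_value(x):
    u = rescaled(x)
    return sum(prob[i] * u[i] for i in prob)


def egalitarian_value(x):
    return min(rescaled(x).values())


x_util = max(grid, key=utilitarian_value)
x_egal = max(grid, key=egalitarian_value)
print(x_nash, x_util, x_egal)
```

On this grid all three searches should land on the same split, x = 0.25, where both people have rescaled utility 1. Note that the factor of 3 in person B's raw utility plays no role in the answer, which is the scale invariance at work.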

## Nash Bargaining

The above policy is an alternate characterization of the Nash bargaining solution, generalized to many players with different weights.

Given a zero point and a feasible set of options closed under random mixtures, the Nash bargaining solution gives a way of combining two utility functions into a single option.

The arguments in this post are not the most standard arguments for Nash bargaining. Nash bargaining can also be uniquely characterized with some simple axioms like Pareto optimality and independence of irrelevant alternatives.

There is a lot of reason to consider the Nash bargaining solution as the default way to combine utility functions when you don't have a principled way to do interpersonal utility comparisons. Even if you had a principled way of doing interpersonal utility comparisons, you might want to do Nash bargaining anyway for the sake of fairness.