Intertheoretic utility comparison

by Stuart_Armstrong · 5 min read · 3rd Jul 2018 · 11 comments



Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

This post presents some old work on combining different possible utility functions, which is worth revealing to the world.

I've written before about the problem of reaching an agreement between agents with different utility functions. The problem re-appears if you yourself are uncertain between two different moral theories.

For example, suppose you gave 99% credence to average utilitarianism and 1% credence to total utilitarianism. In an otherwise empty universe, you can create one person with utility 10, or a thousand people with utility 1 each.

If we naively computed the expected utility of both actions, we would get 0.99(10) + 0.01(10) = 10 for the first choice, and 0.99(1) + 0.01(1000) = 10.99 for the second. It therefore seems that total utilitarianism wins by default, even though it is very unlikely (for you).

But the situation can be worse. Suppose that there is a third option, which creates ten thousand people with utility 0.1 each. And you have 0.989 credence on average utilitarianism, 0.01 credence on total utilitarianism, and 0.001 credence on exponential utilitarianism, where the average utility is multiplied by two to the power of the population. In this case the third option - and the incredibly unlikely exponential utilitarianism - wins out massively, since 0.1 × 2^10000 dwarfs every other term.
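A minimal sketch of this naive expected-utility calculation. The specific credences and per-person utilities are illustrative assumptions, chosen only so the numbers work out cleanly:

```python
# Naive expected choice-worthiness across three moral theories.
# The credences and options are illustrative assumptions, not canonical values.
from fractions import Fraction  # exact arithmetic; 2**10000 overflows floats

def average_util(n, u):      # average utilitarianism: the average utility
    return u

def total_util(n, u):        # total utilitarianism: population times utility
    return n * u

def exponential_util(n, u):  # average utility times 2^population
    return u * 2 ** n

credences = [(average_util, Fraction(989, 1000)),
             (total_util, Fraction(1, 100)),
             (exponential_util, Fraction(1, 1000))]

# options: (population, utility per person)
options = {"one person": (1, Fraction(10)),
           "a thousand people": (1000, Fraction(1)),
           "ten thousand people": (10000, Fraction(1, 10))}

def naive_eu(name):
    n, u = options[name]
    return sum(p * theory(n, u) for theory, p in credences)

print(max(options, key=naive_eu))  # the exponential-favoured option dominates
```

Even a 0.001 credence is enough: the 2^10000 factor swamps everything else in the sum.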

Normalising utilities

To prevent the large-population-loving utilities from winning out by default, it's clear we need to normalise the utilities in some way before adding them together, similarly to how you normalise the utilities of opposing agents.

I'll distinguish two methods here: individual normalisations, and collective normalisations. For individual normalisations, if you have credences $p_i$ for utilities $u_i$, then each $u_i$ is normalised into $\hat{u}_i$ using some procedure that is independent of $p_j$, $u_j$, and $\hat{u}_j$ for $j \neq i$. Then the normalised utilities are added to give your total utility function of:

$$u = \sum_i p_i \hat{u}_i.$$

In collective normalisations, the normalisation of $u_i$ into $\hat{u}_i$ is allowed to depend upon the other utilities and the credences. All Pareto outcomes for the utilities $u_i$ are equivalent (modulo resolving ties) with maximising such a $u$.

The Nash Bargaining Equilibrium and the Kalai-Smorodinsky Bargaining Solution are both collective normalisations; the Mutual Worth Bargaining Solution is an individual normalisation iff the choice of the default point is individual (but doing that violates the spirit of what that method is supposed to achieve).

Note that there are no non-dictatorial Pareto normalisations, whether individual or collective, that are independent of irrelevant alternatives, or that are immune to lying.

Individual normalisations

Here I'll present the work that I did with Owen Cotton-Barratt, Toby Ord, and Will MacAskill, in order to try and come up with a principled way of doing individual normalisations. In a certain sense, this work failed: we didn't find any normalisation that was clearly superior in every way to the others. But we did find out a lot about the properties of the different normalisations; one interesting thing is that the dumbest normalisation - the zero-one, or min-max - has surprisingly good properties.

Let $O$ be the option set for the agent: the choices that it can make (in our full treatment, we considered a larger set $N$, the normalisation set, but this won't be needed here).

For the purpose of this post, $O$ will be equal to $\Pi$, the set of deterministic policies the agent can follow; this feels like a natural choice, as it's what the agent really has control over.

For any $u_i$ and $\pi \in \Pi$, there is the expected utility of $u_i$ conditional on the agent following policy $\pi$; this will be designated by $\mathbb{E}(u_i \mid \pi)$.

We may have a probability distribution $P$ over $\Pi$ (maybe defined by the complexity of the policy?). If we don't have such a distribution, and the set of deterministic policies is finite, then we can set $P$ to be the uniform distribution.

Then, given $P$, each $u_i$ becomes a real-valued random variable, taking value $\mathbb{E}(u_i \mid \pi)$ with probability $P(\pi)$. We'll normalise these by normalising the properties of this random variable.

First of all, let's exclude any $u_i$ that are constant on all of $\Pi$; these utilities cannot be changed, in expectation, by the agent's policies, so should make no difference. Then each $u_i$, seen as a random variable, has the following properties:

  • Maximum: $\max_{\pi \in \Pi} \mathbb{E}(u_i \mid \pi)$.
  • Minimum: $\min_{\pi \in \Pi} \mathbb{E}(u_i \mid \pi)$.
  • Mean: $\mu_i = \sum_{\pi \in \Pi} P(\pi) \mathbb{E}(u_i \mid \pi)$.
  • Variance: $\sum_{\pi \in \Pi} P(\pi) \left(\mathbb{E}(u_i \mid \pi) - \mu_i\right)^2$.
  • Mean difference: $\sum_{\pi, \pi' \in \Pi} P(\pi) P(\pi') \left|\mathbb{E}(u_i \mid \pi) - \mathbb{E}(u_i \mid \pi')\right|$.
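For a finite policy set, these five properties can be computed directly from the list of expected utilities; a minimal sketch (uniform $P$ by default):

```python
# The five properties of a utility, viewed as a random variable over policies:
# values[k] is E(u | pi_k), drawn with probability probs[k] (uniform by default).
from itertools import product

def utility_stats(values, probs=None):
    n = len(values)
    probs = probs if probs is not None else [1.0 / n] * n
    mean = sum(p * v for p, v in zip(probs, values))
    variance = sum(p * (v - mean) ** 2 for p, v in zip(probs, values))
    # mean difference: expected |X - X'| for two independent draws of the variable
    mean_diff = sum(p * q * abs(v - w)
                    for (p, v), (q, w) in product(zip(probs, values), repeat=2))
    return {"max": max(values), "min": min(values),
            "mean": mean, "variance": variance, "mean_diff": mean_diff}

print(utility_stats([0.0, 1.0, 1.0]))
```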

There are five natural normalisation methods that emerge from these properties. The first and most trivial is the min-max or zero-one normalisation: scale and translate $u_i$ so that the minimum takes the value $0$ and the maximum takes the value $1$ (note that the translation doesn't change the desired policy when summing utilities, so what is actually required is to scale $u_i$ so that $\max_{\pi} \mathbb{E}(u_i \mid \pi) - \min_{\pi} \mathbb{E}(u_i \mid \pi) = 1$).

The second normalisation, the mean-max, involves setting $\max_{\pi} \mathbb{E}(u_i \mid \pi) - \mu_i = 1$; by symmetry, the min-mean normalisation involves setting $\mu_i - \min_{\pi} \mathbb{E}(u_i \mid \pi) = 1$.

Finally, the last two normalisations involve setting either the variance, or the mean difference, to $1$.
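Each of the five normalisations then amounts to dividing $u_i$ by the corresponding spread measure (translations are omitted here, since they don't affect which policy maximises the weighted sum). A sketch assuming uniform $P$:

```python
# Rescale a utility (its list of expected utilities over policies, uniform P)
# so that the chosen spread measure equals 1. A sketch; translations omitted.
import math
from itertools import product

def normalise(values, method):
    n = len(values)
    mean = sum(values) / n
    spread = {
        "min-max": max(values) - min(values),
        "mean-max": max(values) - mean,
        "min-mean": mean - min(values),
        # setting the variance to 1 means dividing by the standard deviation
        "variance": math.sqrt(sum((v - mean) ** 2 for v in values) / n),
        "mean-diff": sum(abs(v - w) for v, w in product(values, repeat=2)) / n ** 2,
    }[method]
    return [v / spread for v in values]

print(normalise([0.0, 5.0, 10.0], "min-max"))   # [0.0, 0.5, 1.0]
print(normalise([0.0, 5.0, 10.0], "mean-max"))  # [0.0, 1.0, 2.0]
```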

Meaning of the normalisations

What do these normalisations mean? Well, min-max is a normalisation that cares about the difference between perfect utopia and perfect dystopia: between the best possible and the worst possible expected outcome. Conceptually, this seems problematic - it's not clear why the dystopia matters, which seems like something that opens the utility up to extortion - but, as we'll see, the min-max normalisation has the best formal properties.

The mean-max is the normalisation that most appeals to me; the mean is the expected value of a random policy, while the max is the expected outcome of the best policy. In a sense, that's the job of an agent with a single utility function: to move the outcome from random to best. Thus the max has a meaning that the min, for instance, lacks.

For this reason, I don't see the min-mean normalisation as being anything meaningful; it's the difference between complete disaster and a random policy.

I don't fully grasp the meaning of the variance normalisation; Owen Cotton-Barratt did the most work on it, and showed that, in a certain sense, it was resistant to lying/strategic distortion in certain circumstances, if a given utility didn't 'know' what the other utilities would be. But I didn't fully understand this point. So bear in mind that this normalisation has positive properties that aren't made clear in this post.

Finally, the mean difference normalisation controls the spread between the utilities of the different policies, in a linear way that may seem to be more natural than the variance.

Properties of the normalisation

So, which normalisation is best? Here's where we look at the properties of the normalisations (they will be summarised in a table at the end). As we've seen, independence of irrelevant alternatives always fails, and there can always be an incentive for a utility to "lie" (as in, there are $u_i$, $p_i$, $u_j$, $p_j$ for $j \neq i$, and $u_i'$, such that $u_i$ would have a higher expected utility under the final $u$ if it was replaced with $u_i'$).

What other properties do all the normalisations share? Well, since they normalise the $u_i$ independently, $u = \sum_i p_i \hat{u}_i$ is continuous in the credences $p_i$. And because the minimum, maximum, variance, etc... are continuous in the $\mathbb{E}(u_i \mid \pi)$ and in $P$, then $u$ is also continuous in that information.

In contrast, the best policy of $u$ is not typically continuous in the data. Imagine that there are two utilities $u_1$ and $u_2$, with credences $p$ and $1-p$, and two policies $\pi_1$ and $\pi_2$: $\mathbb{E}(u_1 \mid \pi_1) = \mathbb{E}(u_2 \mid \pi_2) = 1$ and $\mathbb{E}(u_1 \mid \pi_2) = \mathbb{E}(u_2 \mid \pi_1) = 0$. Then for $p > 1/2$, $\pi_1$ is the optimal policy (for all the above normalisations for uniform $P$), while for $p < 1/2$, $\pi_2$ is optimal.
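A quick sketch of this jump, with the two utilities taken as already zero-one normalised (the symmetric setup means all five normalisations agree here):

```python
# The optimal policy jumps discontinuously at p = 1/2 in a symmetric
# two-utility, two-policy example (values already normalised).
u1 = {"pi1": 1.0, "pi2": 0.0}   # u1 prefers pi1
u2 = {"pi1": 0.0, "pi2": 1.0}   # u2 prefers pi2

def best_policy(p):
    """argmax over policies of p*u1 + (1-p)*u2."""
    return max(("pi1", "pi2"), key=lambda pi: p * u1[pi] + (1 - p) * u2[pi])

print(best_policy(0.501), best_policy(0.499))  # pi1 pi2
```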

Ok, that's enough of properties that all methods share; what about ones they don't?

First of all, we can look at the negation symmetry between $u_i$ and $-u_i$. Min-max, variance, and mean difference all have the same normalisation for $u_i$ and $-u_i$; mean-max and min-mean do not, since the mean can be closer to the min than the max (or vice versa).

Then we can consider what happens when some policies $\pi$ and $\pi'$ are clones of each other: imagine that for all $u_i$, $\mathbb{E}(u_i \mid \pi) = \mathbb{E}(u_i \mid \pi')$. Then what happens if we remove the redundant $\pi'$ and normalise on $\Pi \setminus \{\pi'\}$? Well, it's clear that the maximum or minimum value of $u_i$ cannot change (since if $\pi'$ was a maximum/minimum, then so is $\pi$, which remains), so the min-max normalisation is unaffected.

All the other normalisations change, though. This can be seen in the example of $u_1$, $u_2$, with policies $\pi_1$, $\pi_2$, and $\pi_3$; in terms of expected utilities over the policies $(\pi_1, \pi_2, \pi_3)$, $u_1$ has $(0, 1, 1)$ while $u_2$ has $(1, 0, 0)$. Then for uniform $P$, all other normalisation methods change if we remove $\pi_3$, which is identical to $\pi_2$ for both utilities.

Thus all the other normalisations change when we add (or remove) clones of existing policies.
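To check this numerically for a utility with expected utilities $(0, 1, 1)$ over three policies, the third a clone of the second, here is a sketch comparing each spread measure with and without the clone, under uniform $P$:

```python
# Removing a cloned policy leaves the min-max spread unchanged, but shifts
# every other normaliser's spread measure (uniform P assumed).
import math
from itertools import product

def spreads(values):
    n = len(values)
    mean = sum(values) / n
    return {
        "min-max": max(values) - min(values),
        "mean-max": max(values) - mean,
        "min-mean": mean - min(values),
        "std": math.sqrt(sum((v - mean) ** 2 for v in values) / n),
        "mean-diff": sum(abs(v - w) for v, w in product(values, repeat=2)) / n ** 2,
    }

with_clone = spreads([0.0, 1.0, 1.0])  # policies pi1, pi2, pi3 (pi3 clones pi2)
without = spreads([0.0, 1.0])          # clone removed
print(with_clone["min-max"] == without["min-max"])  # True
print(with_clone, without)             # every other measure differs
```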

Finally, we can consider what happens if we are in one of several worlds, and the policies/utilities are identical in some of these worlds. This should be treated the same as if those identical worlds were all one.

So, imagine that we are in one of three worlds: $W_1$, $W_2$, and $W_3$, with probabilities $q_1$, $q_2$, and $q_3$, respectively. Before taking any actions, the agent will discover which world it is in. Thus, if $\Pi_j$ is the set of policies in $W_j$, the complete set of policies is $\Pi = \Pi_1 \times \Pi_2 \times \Pi_3$.

The worlds $W_2$ and $W_3$ are, however, indistinguishable for all the utilities $u_i$. Thus we can identify $\Pi_2$ with $\Pi_3$, with $\mathbb{E}(u_i \mid \pi, W_2) = \mathbb{E}(u_i \mid \pi, W_3)$ for all $\pi$ and $u_i$. Then a normalisation method combines indistinguishable choices if the normalisation is the same whether $W_2$ and $W_3$ are treated as two separate worlds or as a single merged world of probability $q_2 + q_3$. Then:

  • Min-max, mean-max, and min-mean combine indistinguishable choices. Variance and mean difference normalisations do not.

Proof (sketch): Let $u_i^j$ be the random variable that $u_i$ is on $\Pi_j$, under the assumption that $W_j$ is the true underlying world. Since a policy in $\Pi$ chooses one component in each $\Pi_j$ independently, $u_i$ behaves on $\Pi$ like the random variable $q_1 u_i^1 + q_2 u_i^2 + q_3 u_i^3$, for independent $u_i^j$. Mean, max, and min are all additive across such independent components, so identifying the identically-distributed $u_i^2$ and $u_i^3$ as a single component of weight $q_2 + q_3$ changes nothing. Variance and mean difference, on the other hand, are not additive in this way: for instance, $\mathrm{Var}(q_2 u_i^2 + q_3 u_i^3) = (q_2^2 + q_3^2)\mathrm{Var}(u_i^2)$, while the merged world gives $(q_2 + q_3)^2 \mathrm{Var}(u_i^2)$.
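A numeric check of the sketch, with two indistinguishable worlds of probability $1/2$ each and two policies per world (an assumed toy setup): max, min and mean agree whether the worlds are treated separately or merged, while the variance does not:

```python
# Two indistinguishable worlds W2, W3 (probability 1/2 each), each offering two
# policies with expected utilities 0 and 1. A full policy picks one component
# per world, so u behaves like the independent sum q2*u2 + q3*u3 under uniform P.
from itertools import product
from statistics import mean, pvariance

q2 = q3 = 0.5
world_values = [0.0, 1.0]                        # E(u | pi) within each world
split = [q2 * a + q3 * b for a, b in product(world_values, repeat=2)]
merged = [(q2 + q3) * v for v in world_values]   # the two worlds merged into one

for f in (max, min, mean):
    assert f(split) == f(merged)                 # combined correctly
print(pvariance(split), pvariance(merged))       # 0.125 0.25 -- not combined
```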

Summary of properties

In my view, it is a big worry that the variance and mean difference normalisations fail to combine indistinguishable choices. Worlds $W_2$ and $W_3$ could be strictly identical, except for some irrelevant information that all utility functions agree is irrelevant. We have to worry about whether the light from a distant star is slightly redder or slightly bluer than expected; what colour ink was used in a proposal; the height of the next animal we see, and so on.

This means that we cannot divide the universe into relevant and irrelevant variables, and focus solely on the first.

In table form, the various properties are:

| Normalisation | Symmetric under negation | Unaffected by cloned policies | Combines indistinguishable choices |
|---|---|---|---|
| Min-max | Yes | Yes | Yes |
| Mean-max | No | No | Yes |
| Min-mean | No | No | Yes |
| Variance | Yes | No | No |
| Mean difference | Yes | No | No |

As can be seen, the min-max method, simplistic though it is, has all the possible nice properties.
