Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Normalising utility as willingness to pay

It seems to me that how to combine utility functions follows from how you then choose an action.

Let's say we have 10 hypotheses and we maximize utility. We can afford to let each hypothesis rule out up to a tenth of the action space as extremely negative in utility, but we can't let a hypothesis assign extremely positive utility to any action. Therefore we sample about 9 random actions (which partition action space into 10 pieces) and translate the worst of them to 0, then scale the maximum over all actions to 1. (Or perhaps, we set the 10th percentile to 0 and the hundredth to 1.)

Let's say we have 10 hypotheses and we sample a random action from the top half. Then, by analogous reasoning, we sample 19 actions, normalize the worst to 0 and the best to 1. (Or perhaps set the 5th to 0 and the 95th to 1. Though then it might devolve into a fight over who can think of the largest/smallest number on the fringes...)

The general principle is giving the daemon as much slack/power as possible while bounding our proxy of its power.
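The first procedure above (the utility-maximising case) can be sketched in Python. This is only one reading of the comment, under stated assumptions: the action space is a finite list, the random actions are drawn uniformly with replacement, and the name `sample_floor_normalise` and its parameters are hypothetical, chosen here for illustration.

```python
import random


def sample_floor_normalise(utility, actions, n_samples=9, rng=None):
    """Rescale `utility` so that the worst of `n_samples` randomly drawn
    actions maps to 0 and the best action overall maps to 1.

    Assumptions (not from the original comment): `actions` is a finite
    list and samples are drawn uniformly with replacement.
    """
    rng = rng or random.Random()
    sampled = [rng.choice(actions) for _ in range(n_samples)]
    floor = min(utility(a) for a in sampled)      # worst sampled action -> 0
    ceiling = max(utility(a) for a in actions)    # best action overall -> 1
    if ceiling == floor:
        raise ValueError("utility is flat between floor and ceiling")
    return lambda a: (utility(a) - floor) / (ceiling - floor)


# Toy usage: 100 actions whose utility equals their index.
actions = list(range(100))
normalised = sample_floor_normalise(float, actions, rng=random.Random(0))
```

Under this scaling no action can score above 1, while actions worse than the sampled floor can score arbitrarily far below 0, which matches the asymmetry the comment asks for.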

On a purely fun note, sometimes I imagine our universe running on such "willingness to pay" for each quantum event. At each point in time various entities observing this universe bid on each quantum event, and the next point in time is computed from the bid winners.

Hah, the auction interpretation of Quantum Mechanics! Wonder what restrictions would need to be imposed on the bidders in order to preserve both the entanglement and relativity.

They could be merely aliens with their supertelescopes trained on us, with their planet rigged to explode if the observation doesn't match the winning bid, abusing quantum immortality.

"Paying utility" in this kind of analysis means to undertake negative-utility behaviors outside the game we're analyzing, in order to achieve better (higher-utility) outcomes in the area we're discussing. The valuation / bargaining question is about how to identify how important the game is relative to other things.

For simple games, it's often framed in dollars: "how much would you pay to play a game where you can win X or lose Y with this distribution", where the amount you'd pay is the value of the game (and it's assumed, but not stated nearly often enough, that the range of outcomes is such that dollars are roughly linear in utility for you).

I think this writeup gets a little confusing in not being very explicit about when it's talking about an agent's overall utility function, and when it's talking about a subset of a utility function for a given game. There is never a "willingness to pay" anything that reduces overall utility. The question is willingness to pay in one domain to influence another. This willingness is obviously based entirely on maximizing overall utility.

I've thought of a framework that puts most of the methods of intertheoretic utility normalisation and bargaining on the same footing. See this first post for a reminder of the different types of utility function normalisation.

Most of the normalisation techniques can be conceived of as a game with two outcomes, in which each player can pay a certain amount of their utility to flip from one outcome to the other. Then we can use the maximal amount of utility they are willing to pay as the common measuring stick for normalisation.

Consider for example the min-max normalisation: this assigns utility 0 to the expected utility if the agent makes the worst possible decisions, and 1 if they make the best possible ones.

So, if your utility function is u, the question is: how much utility would you be willing to pay to prevent your nemesis (a −u maximiser) from controlling the decision process, and let you take it over instead? Dividing u by that amount[1] will give you the min-max normalisation (up to the addition of a constant).

Now consider the mean-max normalisation. For this, the game is as follows: how much would you be willing to pay to prevent a policy from choosing randomly amongst the outcomes ("mean"), and let you take over the decision process instead?

Conversely, the min-mean normalisation asks how much you would be willing to pay to prevent your nemesis from controlling the decision process, and shift to a random process instead.

The mean difference method is a bit different: here, two outcomes are chosen at random, and you are asked how much you are willing to pay to shift from the worse of the two to the better. The expectation of that amount is used for normalisation.
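To make the four payments above concrete, here is a minimal Python sketch. It assumes a toy setting that is not in the post: a finite list of outcomes, an agent who can pick any outcome directly, a "random" policy that is uniform over outcomes, and (for the mean difference) two outcomes drawn independently and uniformly. The function names are hypothetical.

```python
from itertools import product
from statistics import mean


def willingness_to_pay(utilities):
    """Return the utility an agent would pay in each two-outcome game.

    Assumes a finite outcome list, uniform randomness, and that the
    agent (or its nemesis) can select any single outcome.
    """
    best = max(utilities)    # the agent controls the decision
    worst = min(utilities)   # the nemesis (a -u maximiser) controls it
    avg = mean(utilities)    # a uniformly random policy controls it
    # Expected gap between two independently drawn outcomes.
    mean_diff = mean(abs(a - b) for a, b in product(utilities, repeat=2))
    return {
        "min-max": best - worst,
        "mean-max": best - avg,
        "min-mean": avg - worst,
        "mean difference": mean_diff,
    }


def normalise(utilities, method):
    """Divide the utilities by the chosen willingness to pay."""
    scale = willingness_to_pay(utilities)[method]
    if scale == 0:
        raise ValueError("agent is indifferent; nothing to normalise")
    return [u / scale for u in utilities]


payments = willingness_to_pay([0.0, 1.0, 5.0])
```

In this toy setting the min-max payment is the sum of the mean-max and min-mean payments, and multiplying a utility function by a positive constant leaves its normalised version unchanged, which is what lets the payment act as a common measuring stick.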

The mutual worth bargaining solution has a similar interpretation: how much would you be willing to pay to move from the default option to one where you controlled all decisions?

A few normalisations don't seem to fit into this framework, most especially those that depend on the square of the utility, such as variance normalisation or the Nash Bargaining solution. The Kalai–Smorodinsky bargaining solution uses a similar normalisation to the mutual worth bargaining solution, but chooses the outcome differently: if the default point is at the origin, it will pick the point (x,x) with largest x.

[1] This, of course, would incentivise you to lie - but that problem is unavoidable in bargaining anyway.