This is a technical result that I wanted to check before writing up a major piece on value loading.

The purpose of a utility function is to give an agent criteria with which to make a decision. If two utility functions always give the same decisions, they're generally considered the same utility function. So, for instance, the utility function u always gives the same decisions as u+C for any constant C, or as Du for any positive constant D. Thus we can say that utility functions are equivalent if they are related by a positive affine transformation.
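A minimal sketch of the affine-invariance claim. It assumes a finite set of actions, each leading to a lottery given as (probability, money) pairs. The helper names `expected`, `best_action` and `affine`, and the constants D=3, C=7, are illustrative choices, not from the post. The best action under u is also the best under D*u + C.

```python
from typing import Callable, Dict, List, Tuple

Lottery = List[Tuple[float, float]]  # (probability, money outcome); illustrative representation

def expected(util: Callable[[float], float], lottery: Lottery) -> float:
    """Expected utility of a lottery under the given utility function."""
    return sum(p * util(x) for p, x in lottery)

def best_action(util: Callable[[float], float], actions: Dict[str, Lottery]) -> str:
    """Action with the highest expected utility."""
    return max(actions, key=lambda a: expected(util, actions[a]))

# Hypothetical decision problem: a coin-flip bet versus a sure payment.
actions: Dict[str, Lottery] = {
    "bet": [(0.5, 0.0), (0.5, 100.0)],
    "sure": [(1.0, 49.0)],
}

def u(x: float) -> float:
    return x  # utility linear in money

def affine(x: float, D: float = 3.0, C: float = 7.0) -> float:
    return D * u(x) + C  # positive affine transformation of u (D > 0)

print(best_action(u, actions), best_action(affine, actions))  # bet bet
```

A negative D would reverse the ranking, which is why the equivalence is restricted to positive multipliers.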

For specific utility functions and specific agents, the class of functions that give the same decisions is considerably larger. For instance, imagine that v is a utility function with the property v("any universe which contains humans")=constant. Then any human who attempts to follow u could equivalently follow u+v (neglecting acausal trade): it makes no difference. In general, if no action the agent could ever take would change the value of v, then u and u+v give the same decisions.

More subtly, if the agent can change v but cannot change the expectation of v, then u and u+v still give the same decisions. This is because for any actions a and b the agent could take:

E(u+v | a) = E(u | a) + E(v | a) = E(u | a) + E(v | b).

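Since E(u+v | b) = E(u | b) + E(v | b) as well, the same quantity E(v | b) is added to the expected value of both actions, so E(u+v | a) - E(u+v | b) = E(u | a) - E(u | b). Hence E(u+v | a) > E(u+v | b) if and only if E(u | a) > E(u | b), and so the decision hasn't changed.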
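The argument can be checked numerically. The sketch below uses a toy setup invented for illustration: each action leads to a distribution over outcomes, and v is a hypothetical term whose value differs between outcomes but whose expectation is 10 whichever action is taken. Ranking actions by E(u | a) and by E(u+v | a) gives the same order.

```python
from typing import Dict, List, Tuple

# Each outcome is (probability, u-value, v-value); all numbers are made up for illustration.
Outcomes = List[Tuple[float, float, float]]

actions: Dict[str, Outcomes] = {
    "a": [(0.5, 3.0, 0.0), (0.5, 1.0, 20.0)],    # E(v | a) = 10
    "b": [(0.25, 4.0, 40.0), (0.75, 0.0, 0.0)],  # E(v | b) = 10
    "c": [(1.0, 1.5, 10.0)],                     # E(v | c) = 10
}

def expect(outcomes: Outcomes, with_v: bool) -> float:
    """E(u | action), or E(u + v | action) when with_v is True."""
    return sum(p * (u_val + (v_val if with_v else 0.0)) for p, u_val, v_val in outcomes)

def ranking(with_v: bool) -> List[str]:
    """Actions sorted from best to worst expected value."""
    return sorted(actions, key=lambda a: expect(actions[a], with_v), reverse=True)

print(ranking(False))  # ['a', 'c', 'b']
print(ranking(True))   # ['a', 'c', 'b']
```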

Note that E(v | a) need not be constant over all actions: it is only required that, for every pair of actions a and b that the agent could take at a particular decision point, E(v | a) = E(v | b). The expectation of v can perfectly well differ between moments, or depend on decisions made at other times.
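To illustrate that E(v | a) only needs to agree within a decision point, here is a hypothetical two-step example; the decision points, actions and numbers are invented for illustration. At the first decision point every action has E(v | a) = 5, at the second every action has E(v | a) = -2. The offset differs between decision points, yet each individual choice is the same under u and u+v.

```python
from typing import Dict, Tuple

# For each (hypothetical) decision point: action -> (E(u | action), E(v | action)).
decision_points: Dict[str, Dict[str, Tuple[float, float]]] = {
    "first": {"a": (2.0, 5.0), "b": (3.0, 5.0)},       # E(v) = 5 for both actions
    "second": {"c": (1.0, -2.0), "d": (0.5, -2.0)},    # E(v) = -2 for both actions
}

def choose(options: Dict[str, Tuple[float, float]], add_v: bool) -> str:
    """Pick the action maximising E(u), or E(u + v) if add_v is True."""
    return max(options, key=lambda a: options[a][0] + (options[a][1] if add_v else 0.0))

for name, options in decision_points.items():
    print(name, choose(options, add_v=False), choose(options, add_v=True))
# first b b
# second c c
```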

Finally, as long as v obeys the above properties, there is no reason for it to be a utility function in the classical sense - it could be constructed any way we want.


An example: suffer not from probability, nor benefit from it

The preceding seems rather abstract, but here is the motivating example. It's a correction term T that adds or subtracts utility as external evidence comes in (it's important that the evidence is external: the agent gets no correction from knowing what its own actions are or were). If the AI knows evidence e, and new (external) evidence f comes in, then its utility gets adjusted by T(e,f), which is defined as

T(e,f) = E(u | e) - E(u | e, f)

In other words, the agent's utility gets adjusted by the difference between the old expected utility and the new one, so the agent's expected utility is unchanged by new external evidence.
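A minimal sketch of the correction term, assuming a finite set of possible worlds with prior probabilities and a utility value in each. Evidence is modelled as the set of worlds consistent with it, and combining e with f means intersecting the sets. The world model, names and numbers are assumptions made for illustration. After the update, the new expected utility plus T(e,f) equals the old expected utility.

```python
from fractions import Fraction
from typing import Dict, FrozenSet, Tuple

# Hypothetical world model: world -> (prior probability, utility u in that world).
WORLDS: Dict[str, Tuple[Fraction, Fraction]] = {
    "w1": (Fraction(1, 4), Fraction(0)),
    "w2": (Fraction(1, 4), Fraction(40)),
    "w3": (Fraction(1, 2), Fraction(100)),
}

Evidence = FrozenSet[str]  # evidence = the set of worlds consistent with it

def expected_u(evidence: Evidence) -> Fraction:
    """E(u | evidence): restrict the prior to `evidence` (assumed non-empty) and renormalise."""
    mass = sum(WORLDS[w][0] for w in evidence)
    return sum(WORLDS[w][0] * WORLDS[w][1] for w in evidence) / mass

def correction(e: Evidence, f: Evidence) -> Fraction:
    """T(e, f) = E(u | e) - E(u | e, f), with e and f combined by intersection."""
    return expected_u(e) - expected_u(e & f)

e = frozenset(WORLDS)         # no evidence yet: all worlds possible
f = frozenset({"w1", "w2"})   # external evidence ruling out w3
print(expected_u(e), expected_u(e & f), correction(e, f))  # 60 20 40
print(expected_u(e & f) + correction(e, f))                # 60
```

Exact fractions are used so that the cancellation is exact rather than approximate.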

Consider, for instance, an agent with a utility u linear in money. It must choose between a 50-50 bet paying $0 on heads or $100 on tails, versus a sure $49. It correctly chooses the bet, which has an expected utility of $50; in other words, E(u | bet)=$50. But now imagine that the coin comes up heads. The utility u plunges to $0 (in other words, E(u | bet, heads)=$0). But the correction term cancels that out:

u(bet, heads) + T(bet, heads) = $0 + E(u | bet) - E(u | bet, heads) = $0 + $50 - $0 = $50.

A similar effect leaves utility unchanged if the coin comes up tails, cancelling the increase. In other words, adding the T correction term removes the impact of stochastic effects on utility.
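The coin example can be reproduced directly. This sketch assumes, as the post does, a fair coin and utility linear in money; the dictionary layout and the helper `corrected` are just one convenient, illustrative representation. For each coin outcome it returns E(u | bet, f), T(bet, f) and their sum.

```python
# Fair coin bet from the example: heads pays $0, tails pays $100; utility is linear in money.
p = {"heads": 0.5, "tails": 0.5}
payout = {"heads": 0.0, "tails": 100.0}

e_u_bet = sum(p[f] * payout[f] for f in p)  # E(u | bet) = 50

def corrected(f: str) -> tuple:
    """Return (E(u | bet, f), T(bet, f), their sum) for coin outcome f."""
    e_u_after = payout[f]        # once the coin is seen, the payout is known
    t = e_u_bet - e_u_after      # T(bet, f) = E(u | bet) - E(u | bet, f)
    return e_u_after, t, e_u_after + t

for f in p:
    print(f, *corrected(f))
# heads 0.0 50.0 50.0
# tails 100.0 -50.0 50.0
```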

But the agent will still make the same decisions. This is because, before seeing evidence f, it cannot predict its impact on E(u). In other words, summing over all possible pieces of evidence f:

E(u | e) = Σ p(f | e)E(u | e, f),

which is another way of phrasing "conservation of expected evidence". This implies that

E(T(e,-)) = Σ p(f | e)T(e,f)

= Σ p(f | e)(E(u | e) - E(u | e, f))

= E(u | e) - Σ p(f | e)E(u | e, f)     (since Σ p(f | e) = 1)

= 0,

and hence that adding the T term does not change the agent's decisions. All the various corrections add on to the utility as the agent continues making decisions, but none of them make the agent change what it does.
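A numerical check of the derivation, under the same assumptions as the coin example (fair coin, utility linear in money). The probability-weighted sum of the corrections is zero, so E(u + T | bet) = E(u | bet), and the bet is still preferred to the sure $49. The variable names are illustrative.

```python
from fractions import Fraction

# Fair coin; the external evidence f is the coin outcome. Exact fractions avoid rounding noise.
p_f = {"heads": Fraction(1, 2), "tails": Fraction(1, 2)}
u_given_f = {"heads": Fraction(0), "tails": Fraction(100)}   # E(u | bet, f)

e_u_bet = sum(p_f[f] * u_given_f[f] for f in p_f)            # E(u | bet) = 50
T = {f: e_u_bet - u_given_f[f] for f in p_f}                 # T(bet, f)
expected_T = sum(p_f[f] * T[f] for f in p_f)                 # Σ p(f | bet) T(bet, f)

# No bet: E(u | no bet, f) = 49 for either coin outcome, so its correction is 0.
sure_thing = Fraction(49)
with_T = e_u_bet + expected_T
print(expected_T, with_T)                          # 0 50
print(e_u_bet > sure_thing, with_T > sure_thing)   # True True
```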

The relevance of this will be explained in a subsequent post (unless someone finds an error here).


The ghost of Ilya Shpitser will glower at you for conditioning directly on bet, rather than on the counterfactual where you take the bet. But I'd like to see where you go before hammering on that point.

If I understand you, these are the expected values of the utility functions you outline.

Event       |    u |    T | u + T
no bet      |  $49 |   $0 |   $49
bet         |  $50 |   $0 |   $50
bet & heads |   $0 |  $50 |   $50
bet & tails | $100 | -$50 |   $50

Is that correct?

Yes, I was not looking at subtleties like that :-)

Your table is correct.

EDIT: your table is correct, but there are some extra unimportant subtleties, such as the fact that under "bet", you don't have a u, only an expected u.