LESSWRONG

The ghost of Ilya Shpitser will glower at you for conditioning directly on bet, rather than on the counterfactual where you take the bet. But I'd like to see where you go before hammering on that point.

If I understand you, these are the expected values of the utility functions you outline.

Event       |   u  |   T  | u + T
------------+------+------+------
no bet      |  $49 |   $0 |   $49
bet         |  $50 |   $0 |   $50
bet & heads |   $0 |  $50 |   $50
bet & tails | $100 | -$50 |   $50

Is that correct?

Stuart_Armstrong (12y)

Yes, I was not looking at subtleties like that :-)

Your table is correct.

EDIT: your table is correct, but there are some extra unimportant subtleties - like the fact that under "bet", you don't have a u, only an expected u.


An example: suffer not from probability, nor benefit from it

The preceding seems rather abstract, but here is the motivating example: a correction term T that adds or subtracts utility as external evidence comes in (it's important that the evidence is external: the agent gets no correction from knowing what its own actions are or were). If the AI knows evidence e, and new (external) evidence f comes in, then its utility gets adjusted by T(e,f), which is defined as

T(e,f) = E(u | e) - E(u | e, f)

In other words, the agent's utility gets adjusted by the old expected utility minus the new one, and hence the agent's expected utility is unchanged by new external evidence.

Consider for instance an agent with a utility u linear in money. It must choose between a bet that goes 50-50 on $0 (heads) or $100 (tails), versus a sure $49. It correctly chooses the bet, which has an expected utility of $50 - in other words, E(u | bet)=$50. But now imagine that the coin comes out heads. The utility u plunges to $0 (in other words E(u | bet, heads)=$0). But the correction term cancels that out:

u(bet, heads) + T(bet, heads) = $0 + E(u | bet) - E(u | bet, heads) = $0 + $50 - $0 = $50.

A similar effect leaves utility unchanged if the coin is tails: u rises to $100, but T(bet, tails) = $50 - $100 = -$50 cancels the increase. In other words, adding the T correction term removes the impact of stochastic effects on utility.
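The coin-flip arithmetic can be checked with a minimal Python sketch. The names `BET`, `expected_u` and `correction` are illustrative choices made here, not anything from the post, and the sketch assumes u is simply the dollar payoff.

```python
from fractions import Fraction

# Hypothetical encoding of the bet: outcome -> (probability, payoff in dollars).
BET = {"heads": (Fraction(1, 2), 0), "tails": (Fraction(1, 2), 100)}


def expected_u(lottery):
    """E(u | bet): probability-weighted payoff, with u linear in money."""
    return sum(p * payoff for p, payoff in lottery.values())


def correction(lottery, outcome):
    """T(bet, outcome) = E(u | bet) - E(u | bet, outcome)."""
    # Once the outcome is known, the expected utility is just its payoff.
    return expected_u(lottery) - lottery[outcome][1]


for outcome, (_, payoff) in BET.items():
    t = correction(BET, outcome)
    print(f"{outcome}: u = {payoff}, T = {t}, u + T = {payoff + t}")
```

For both heads and tails the printed value of u + T is 50, matching E(u | bet).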

But the agent will still make the same decisions. This is because, before seeing evidence f, it cannot predict which way f will move E(u). In other words, summing over all possible pieces of evidence f (with p(f) the probability the agent assigns to f given e):

E(u | e) = Σ p(f)E(u | e, f),

which is another way of phrasing "conservation of expected evidence". This implies that

E(T(e,-)) = Σ p(f)T(e,f)

= Σ p(f)(E(u | e) - E(u | e, f))

= E(u | e) - Σ p(f)E(u | e, f)

= 0,

and hence that adding the T term does not change the agent's decisions: the expected correction is zero whichever action it takes, so every action keeps the same expected utility. The various corrections accumulate in the utility as the agent continues making decisions, but none of them changes what the agent does.
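As a rough check of the derivation, the following Python sketch computes E(T(e,-)) for each action in the betting example and compares the agent's choice with and without the correction. The `ACTIONS` table and the helper names are assumptions made for illustration only, and p(f) is read as the probability of f given the action taken.

```python
from fractions import Fraction

# Hypothetical action set: each action maps outcomes to (probability, payoff).
ACTIONS = {
    "bet": {"heads": (Fraction(1, 2), 0), "tails": (Fraction(1, 2), 100)},
    "no bet": {"sure": (Fraction(1), 49)},
}


def expected_u(lottery):
    """E(u | e) for the chosen action, with u linear in money."""
    return sum(p * payoff for p, payoff in lottery.values())


def expected_correction(lottery):
    """E(T(e,-)) = sum over f of p(f) * (E(u | e) - E(u | e, f))."""
    prior = expected_u(lottery)
    return sum(p * (prior - payoff) for p, payoff in lottery.values())


def best_action(value):
    """Name of the action that maximises the given value function."""
    return max(ACTIONS, key=lambda name: value(ACTIONS[name]))


plain = best_action(expected_u)
corrected = best_action(lambda lot: expected_u(lot) + expected_correction(lot))
print(plain, corrected)
```

In this toy setting the expected correction is zero for both actions, so the agent ranks them identically and picks the bet either way.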

The relevance of this will be explained in a subsequent post (unless someone finds an error here).