Expected utility, unlosing agents, and Pascal's mugging

by Stuart_Armstrong · 5 min read · 28th Jul 2014 · 54 comments


Still very much a work in progress

EDIT: model/existence proof of unlosing agents can be found here.

Why do we bother about utility functions on Less Wrong? Well, because of the von Neumann-Morgenstern results, which showed that, essentially, if you make decisions, you'd better use something equivalent to expected utility maximisation. If you don't, you lose. Lose what? It doesn't matter - money, resources, whatever: the point is that any other system can be exploited by other agents or by the universe itself to force you into a pointless loss. A pointless loss is a loss that gives you no benefit or possibility of benefit - it's really bad.

The justifications for the axioms of expected utility are, roughly:

  1. (Completeness) "If you don't decide, you'll probably lose pointlessly."
  2. (Transitivity) "If your choices form loops, people can make you lose pointlessly."
  3. (Continuity/Archimedean) This axiom (and acceptable weaker versions of it) is much more subtle than it seems; "No choice is infinitely important" is what it seems to say, but " 'I could have been a contender' isn't good enough" is closer to what it does. Anyway, that's a discussion for another time.
  4. (Independence) "If your choices aren't independent, people can expect to make you lose pointlessly."


Equivalency is not identity

A lot of people believe a subtly different version of the result:

  • If you don't have a utility function, you'll lose pointlessly.

This is wrong. The correct result is:

  • If you don't lose pointlessly, then your decisions are equivalent to having a utility function.

What's the difference? I'll illustrate with Eliezer's paraphrase of Omohundro:

If you would rather be in Oakland than San Francisco, and you would rather be in San Jose than Oakland, and you would rather be in San Francisco than San Jose, you're going to spend an awful lot of money on taxi rides.

If you believed the first bullet point, then you would decide which of the three cities you preferred (and by how much), this would be your "utility function", and you'd then implement it and drive to the top city. But you could just as well start off in San Francisco, drive to Oakland, then to San Jose, be tempted to drive to San Francisco again, realise that that's stupid, and stay put. No intransitivity. Or, even better, notice the cycle, and choose to stay put in San Francisco.
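The stay-put behaviour can be sketched in a few lines of Python. The city names and pairwise preferences come from the quote above; the `prefers` table and the `settle` function are purely illustrative, assuming an agent that tracks where it has already been and refuses to re-enter a city it left.

```python
# prefers[a] == b means "would rather be in b than a", taken from the quote.
prefers = {
    "San Francisco": "Oakland",
    "Oakland": "San Jose",
    "San Jose": "San Francisco",
}

def settle(start):
    """Follow the pairwise preferences until a city repeats, then stay put.

    Returns the city the agent settles in, plus the path it drove.
    Re-entering a visited city would be paying for a pointless loop,
    so the agent stops the moment the next hop would close a cycle.
    """
    visited = [start]
    current = start
    while True:
        nxt = prefers[current]
        if nxt in visited:  # cycle detected: another taxi ride is a pointless loss
            return current, visited
        visited.append(nxt)
        current = nxt

city, path = settle("San Francisco")
```

Note that the city the agent settles in depends on where it starts: `settle("Oakland")` ends somewhere different from `settle("San Francisco")`. That is exactly the arbitrariness discussed next.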

Phrased that way, the alternative method seems ridiculous. Why should you let your choice of city be determined by the arbitrary accident of your starting point? But actually, the utility function approach is just as arbitrary. Humans are far from rational, and we're filled with cycles of intransitive preferences. We need to break these cycles by using something from outside our preference ordering (because those orderings are flawed). And the way we break these cycles can depend on considerations just as random and contingent as "where are we located right now" - our moods, the availability of different factors in the three cities, etc.


Unlosing agents

You could start with an agent that has a whole host of incomplete, intransitive and dependent preferences (such as those of a human) and program it with the meta rule "don't lose pointlessly". It would then proceed through life, paying attention to its past choices, and breaking intransitive cycles or dependencies as needed, every time it faced a decision which threatened to make it lose pointlessly.

This agent is every bit as good as an expected utility maximiser, in terms of avoiding loss. Indeed, the more (and the more varied) choices it faced, the more it would start to resemble an expected utility maximiser, and the more its preferences would resemble a utility function. Ultimately, it could become an expected utility maximiser, if it faced enough choices and decisions. In fact, an expected utility maximiser could be conceived of as simply an unlosing agent that had actually faced every single imaginable choice between every single possible lottery.
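A toy version of the meta-rule can be sketched as follows. This is not from the post: the class name, the alphabetical tiebreak (standing in for mood, location, and other contingent factors), and the transitive-closure step are all illustrative assumptions. The key point is that the agent starts with no complete ordering, commits to each pairwise answer as it goes, and closes its record under transitivity so no future answer can complete a money-pump cycle.

```python
class UnlosingAgent:
    """Starts with no complete preferences; builds them as choices arrive."""

    def __init__(self):
        self.better = set()  # (a, b) in better means "a preferred to b"

    def choose(self, a, b):
        # Past commitments are binding: consult the record first.
        if (a, b) in self.better:
            return a
        if (b, a) in self.better:
            return b
        # No commitment yet: decide by an arbitrary, contingent rule
        # (alphabetical order stands in for mood, starting city, etc.).
        winner, loser = (a, b) if a < b else (b, a)
        self._commit(winner, loser)
        return winner

    def _commit(self, winner, loser):
        self.better.add((winner, loser))
        # Transitive closure: if x > y and y > z, record x > z, so a
        # later choice can never close an intransitive cycle.
        changed = True
        while changed:
            changed = False
            for (x, y) in list(self.better):
                for (y2, z) in list(self.better):
                    if y == y2 and x != z and (x, z) not in self.better:
                        self.better.add((x, z))
                        changed = True

agent = UnlosingAgent()
agent.choose("Oakland", "San Francisco")  # first choice fixes a commitment
agent.choose("San Jose", "Oakland")       # record stays cycle-free
```

After a few decisions the record `better` is a partial, transitive ordering - the beginnings of the utility function the surrounding text describes.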

So we cannot say that an unlosing agent is better or worse than an expected utility maximiser. The difference between them has to be determined by practical considerations. I can see several relevant ones (there are certainly more):

  • Memory capacity and speed. An unlosing agent has to remember all its past decisions and may need to review them before making a new decision, while an expected utility maximiser could be much faster.
  • Predictability 1: the actions of an expected utility maximiser are more predictable, if the utility function is easy to understand.
  • Predictability 2: the actions of an unlosing agent are more predictable, if the utility function is hard to understand.
  • Predictability 3: over very long time scales, the expected utility maximiser is probably more predictable.
  • Graceful degradation: small changes to an unlosing agent should result in smaller differences in decisions than small changes to an expected utility maximiser.
  • Dealing with moral uncertainty: an unlosing agent is intrinsically better set up to deal with moral uncertainty, though its ultimate morality will be more contingent on circumstances than an expected utility maximiser's.

If we wanted to formalise human preferences, we could either front-load the effort (devise the perfect utility function) or back-load it (set up a collection of flawed preferences for the agent to update). The more complicated the perfect utility function would be, the more the back-loaded approach seems preferable.

In practice, I think the appeal of the expected utility maximiser is that it is more attractive to philosophers and mathematicians: it involves solving everything perfectly ahead of time, after which everything else is implementation. I can see the unlosing agent being more attractive to an engineer, though.

The other objection could be that the unlosing agent would have different preferences in different universes. But this is no different from an expected utility maximiser! Unless the Real True morality rises from hell to a chorus of angels, any perfect utility function is going to depend on choices made by its designer for contingent or biased reasons. Even on a more formal level, the objection depends on a particular coarse graining of the universe. Let A and B be universes, A' and B' the same universes with a particular contingent fact changed. Then an expected utility maximiser that prefers A to B would also prefer A' to B', while the unlosing agent could have its preferences reversed. But note that this depends on the differences being contingent, which is a human definition rather than an abstract true fact about universes A, B, A', and B'.

Another way in which an unlosing agent could seem less arbitrary is if it could adjust its values according to its expected future, not just its known past. Call that a forward-thinking unlosing agent. We'll see an example of this in the next section.



Unlosing agents can even provide some solutions to thorny decision theory problems. Take Pascal's mugging:

Now suppose someone comes to me and says, "Give me five dollars, or I'll use my magic powers from outside the Matrix to run a Turing machine that simulates and kills 3^^^^3 people."

Let's assume that the odds you assign to the person telling the truth are greater than 1/3^^^^3. One thing that is clear is that if you faced that decision 3^^^^3 times, each decision independent from the others... then you should pay each time. When you aggregate independent decisions, it narrows your total variance, forcing you closer to an expected utility maximiser (see this post).
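The variance-narrowing point can be illustrated numerically with modest numbers (nothing like 3^^^^3; the probability, loss size, and repeat count below are made up for illustration): a single gamble with a tiny chance of a huge loss has enormous variance, but the average outcome over many independent repetitions concentrates around the expectation, by the law of large numbers.

```python
import random

random.seed(0)  # deterministic for reproducibility

p = 0.001        # tiny probability of the bad outcome (illustrative)
L = 10_000.0     # size of the loss if it happens (illustrative)
n = 200_000      # number of independent repetitions of the decision
expected = p * L # expected loss per decision

# One sample per decision: the big loss with probability p, else nothing.
losses = [L if random.random() < p else 0.0 for _ in range(n)]
average = sum(losses) / n

# A single decision's outcome is almost always 0 or L - nowhere near
# the expectation. But the per-decision *average* over n repetitions
# sits close to p * L, which is why aggregation pushes an agent
# towards treating the gamble at its expected value.
```

The same narrowing does not happen when the agent only expects to face the gamble a handful of times, which is the gap the forward-thinking unlosing agent exploits below.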

But if you were sure that you'd face it only a few thousand times, what then? Take a forward-thinking unlosing agent. If it expected that it would get Pascal mugged only a few thousand times, it could perfectly well reject all of them without hesitation (and derive all the advantages of this). If it expected that there was a significant risk of getting Pascal mugged over and over and over again, it would decide to accept.

In more traditional terms, this would be an expected utility maximiser with a utility function that's unbounded in universes with a high risk of 3^^^^3 or more Pascal's muggings, and bounded in other universes.
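The two-mode rule just described can be written down as a trivial policy sketch. The function name and the threshold value are illustrative assumptions, not from the post; the point is only that the decision switches on the agent's forecast of how many muggings it will face.

```python
def forward_thinking_policy(expected_muggings: int,
                            aggregation_threshold: int = 1_000_000) -> str:
    """Refuse Pascal's muggings in universes where they are rare; pay only
    when the agent expects so many that refusing would, in aggregate,
    amount to a predictable pointless loss (the variance-narrowing regime).
    The threshold is an arbitrary stand-in for that regime boundary."""
    if expected_muggings >= aggregation_threshold:
        return "pay"
    return "refuse"
```

A few thousand expected muggings falls below the threshold, so the agent refuses without hesitation; an astronomical expectation flips it to paying.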

Similar unlosing agent designs can work in some cases of infinite ethics, or with the torture versus dust specks example. These arguments often share common features: the decision is clear and easy in some universes (eg many independent choices), but not in others. And it's then argued that expected utility arguments must push the decision from the clear and easy universes onto the others. But a forward-thinking unlosing agent is perfectly placed to break that link, deciding one way in the "clear and easy" universes and another way in the "others".

If you allow agents that are not perfectly unlosing (which is reasonable for a bounded agent), they could even move between decision modes depending on what universe they're in. A certain cost (in pointless loss) for a certain benefit in flexibility.

Anyway, there is certainly more to be said about unlosing agents and similar ideas, but I'll stop here for the moment and ask people what they think.