Why you must maximize expected utility

13th Dec 2012


This type of argument strikes me as analogous to using Arrow's theorem to argue that we must implement a dictatorship.

But the post is an argument for using cardinal utility (VNM utility)! And Arrow's "impossibility" theorem only applies when trying to aggregate ordinal utilities across voters. It is well known that voting systems which aggregate cardinal utility, such as Range Voting, can escape the impossibility theorem.

So Arrow is actually *another* reason for having a VNM utility function: it allows collectively rational decisions, as well as individually rational decisions.
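
To make the ordinal versus cardinal contrast concrete, here is a minimal sketch. The three voters and their 0-10 scores are invented for illustration and come from neither the post nor any real election. Pairwise majority voting over the implied rankings produces a cycle, while summing the scores always gives an ordinary numeric ordering. This illustrates the claim above; it is not a proof of it.

```python
from itertools import combinations

# Hypothetical ballots (an assumption for illustration): each voter scores
# the options A, B, C on a 0-10 scale.
scores = {
    "voter1": {"A": 10, "B": 6, "C": 0},   # ranks A > B > C
    "voter2": {"A": 0, "B": 10, "C": 7},   # ranks B > C > A
    "voter3": {"A": 5, "B": 0, "C": 10},   # ranks C > A > B
}

def majority_winner(x, y):
    """Option preferred by more voters when only the ordering is used."""
    votes_x = sum(1 for s in scores.values() if s[x] > s[y])
    votes_y = sum(1 for s in scores.values() if s[y] > s[x])
    return x if votes_x > votes_y else y

# Ordinal aggregation: compare every pair by majority vote.
pairwise = {(x, y): majority_winner(x, y) for x, y in combinations("ABC", 2)}

# Cardinal aggregation (range voting): add up the scores.
totals = {opt: sum(s[opt] for s in scores.values()) for opt in "ABC"}
range_ranking = sorted(totals, key=totals.get, reverse=True)

print(pairwise)       # A beats B, B beats C, yet C beats A
print(range_ranking)  # a single ordering from the summed scores
```

With these made-up ballots the majority relation is cyclic, so it names no coherent group ranking, whereas the score totals are just numbers and cannot cycle.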

You know, given Arrow's result, and given the observation commonly made around here that there are lots of little agents running around in our head, it is not so surprising that human beings exhibit "incoherent behavior." It's a consequence of our mind architecture.

I am not sure I am prepared to start culling my internal Congress just so I can have a coherent utility function that makes the survivors happy.

311y

Analogous in what way?

As the Theorem treats them, voters are already utility-maximizing agents who have a clear preference set which they act on in rational ways. The question: how to aggregate these?

It turns out that if you want certain superficially reasonable things out of a voting process from such agents - nothing gets chosen at random, it doesn't matter how you cut up choices or whatever, &c. - you're in for disappointment. There isn't actually a way to have a group that is itself rationally agentic in the precise way the Theorem postulates.

One bullet you could bite is having a dictator. Then none of the inconsistencies arise from having all these extra preference sets lying around because there's only one and it's perfectly coherent. This is very easily comparable to reducing all of your own preferences into a single coherent utility function.

211y

Both involve taking a mathematical result about the only way to do something in a way that satisfies certain intuitively appealing properties, and using it to argue that we therefore should do it that way.

011y

A dictatorship isn't the only resolution to Arrow's theorem. Anyway, this sounds like a rather weak argument against the position.

111y

It's an outside view argument.

-111y

Not really, because the argument isn't that you should do anything differently at all. It says that there's some utility function that represents your preferences, some expected-utility-maximizing genie that makes the same choices as you, but it doesn't tell you to have different preferences, or make different decisions under any circumstances.
In fact, I don't really know why this post is called "Why you must maximize expected utility" instead of "Why you already maximize expected utility." It seems that even if I have some algorithm that is on the surface not maximizing expected utility, such as being risk-averse in some way dealing with money, then I'm really just maximizing the expected value of a non-obvious utility function.

411y

No. Most humans do not maximize expected utility with respect to any utility function whatsoever because they have preferences which violate the hypotheses of the VNM theorem. For example, framing effects show that humans do not even consistently have the same preferences regarding fixed probability distributions over outcomes (but that their preferences change depending on whether the outcomes are described in terms of gains or losses).
Edit: in other words, the VNM theorem shows that "you must maximize expected utility" is equivalent to "your preferences should satisfy the hypotheses of the VNM theorem" (and not all of these hypotheses are encapsulated in the VNM axioms), and this is a statement with nontrivial content.

011y

Axioms? ("Hypotheses" doesn't seem to quite fit. One could have a hypothesis that humans have preferences that are in accord with the VNM axioms, and falsify said hypothesis, but the VNM theorem doesn't make that hypothesis itself.)

211y

In the nomenclature that I think is relatively standard among mathematicians, if a theorem states "if P1, P2, ... then Q" then P1, P2, ... are the hypotheses of the theorem and Q is the conclusion. One of the hypotheses of the VNM theorem, which isn't strictly speaking one of the von Neumann-Morgenstern axioms, is that you assign consistent preferences at all (that is, that the decision of whether you prefer A to B depends only on what A and B are). I'm not using "consistent" here in the same sense as the Wikipedia article does when talking about transitivity; I mean consistent over time. (Edit: Eliezer uses "incoherent"; maybe that's a better word.)

011y

Premises.

511y

Again, among mathematicians, I think "hypotheses" is more common. Exhibit A; Exhibit B. I would guess that "premises" is more common among philosophers...?

011y

I usually say “assumptions”, but I'm neither a mathematician nor a philosopher. I do say “hypotheses” if for some reason I'm wearing mathematician attire.

111y

Not all decision algorithms are utility-maximising algorithms. If this were not so, the axioms of the VNM theorem would not be necessary. But they are necessary: the conclusion requires the axioms, and when axioms are dropped, decision algorithms violating the conclusion exist.
For example, suppose that given a choice between A and B it chooses A; between B and C it chooses B; between C and A it chooses C. No utility function describes this decision algorithm. Suppose that given a choice between A and B it never makes a choice. No utility function describes this decision algorithm.
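
The cyclic example can be checked by brute force. The sketch below is an illustration under two assumptions: each observed choice is read as a strict preference, and a "utility function" over three options is just a strict ranking of them. The function name and the data layout are hypothetical.

```python
from itertools import permutations

def representable(choices, options):
    """Return True if some strict ranking of `options` reproduces every
    pairwise choice. `choices` maps frozenset({x, y}) to the option chosen."""
    for ranking in permutations(options):
        # Assign higher utility to options earlier in the ranking.
        utility = {opt: len(options) - i for i, opt in enumerate(ranking)}
        if all(max(pair, key=utility.get) == chosen
               for pair, chosen in choices.items()):
            return True
    return False

# A over B, B over C, C over A: the cyclic chooser described above.
cyclic = {frozenset("AB"): "A", frozenset("BC"): "B", frozenset("CA"): "C"}
# Same, except A is chosen over C: an ordinary transitive chooser.
transitive = {frozenset("AB"): "A", frozenset("BC"): "B", frozenset("CA"): "A"}

print(representable(cyclic, "ABC"))      # False
print(representable(transitive, "ABC"))  # True
```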
Another way that a decision algorithm can fail to have an associated utility function is by lying outside the ontology of the VNM theorem. The VNM theorem treats only of decisions over probability distributions of outcomes. Decisions can be made over many other things. And what is an "outcome"? Can it be anything less than the complete state of the agent's entire positive light-cone? If not, it is practically impossible to calculate with; but if it can be smaller, what counts as an outcome and what does not?
Here is another decision algorithm. It is the one implemented by a room thermostat. It has two possible actions: turn the heating on, or turn the heating off. It has two sensors: one for the actual temperature and one for the set-point temperature. Its decisions are given by this algorithm: if the temperature falls 0.5 degrees below the set point, turn the heating on; if it rises 0.5 degrees above the set-point, turn the heating off. Exercise: what relationship holds between this system, the VNM theorem, and utility functions?
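
For readers who want to experiment with the exercise, the thermostat rule can be written out as a short sketch. The description does not say what happens inside the 0.5-degree band; keeping the previous state there is an assumption, as are the function name and the sample readings.

```python
def thermostat_step(heating_on, temperature, set_point, band=0.5):
    """One decision of the thermostat; returns the new heating state."""
    if temperature <= set_point - band:
        return True       # too cold: turn the heating on
    if temperature >= set_point + band:
        return False      # too warm: turn the heating off
    return heating_on     # assumption: inside the band, leave things as they are

# Hypothetical temperature trace around a set point of 20 degrees.
readings = [20.0, 19.4, 20.0, 20.6, 20.0]
state = False
history = []
for temperature in readings:
    state = thermostat_step(state, temperature, set_point=20.0)
    history.append(state)

print(history)  # [False, True, True, False, False]
```

Note that the same reading of 20.0 is followed by both "on" and "off" in this trace, so the action taken depends on the thermostat's history as well as on the current sensor values.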

**Meditation:** So far, we've always pretended that you only face *one* choice, at *one* point in time. But not only is there a way to apply our theory to repeated interactions with the environment — there are two!

One way is to say that at each point in time, you should apply decision theory to the set of actions you can perform *at that point*. Now, the actual outcome depends of course not only on what you do now, but also on what you do later; but you know that you'll still use decision theory later, so you can *foresee* what you will do in any possible future situation...

311y

"Apply decision theory to the set of actions you can perform at that point" is underspecified — are you computing counterfactuals the way CDT does, or EDT, TDT, etc?
This question sounds like a fuzzier way of asking which decision theory to use, but maybe I've missed the point.

111y

I really like this trend of adding meditations to posts, asking people to figure something out not just on their own but here and out loud.

011y

Does it matter if your utility function is constant with respect to time, provided that the most preferred outcome changes rarely?

011y

There is no distinction between these. How do you construct this hypothetical lookup table? By applying decision theory to every possible future history. In other words, by applying option 1 to calculate out everything in advance. But why bother? Applying option 1 as events unfold will produce results identical to applying it to all possible futures now, and avoids the small problem of requiring vastly more computational resources than the universe is capable of holding, running extraordinarily faster than anything is capable of happening, and operating for gigantically longer than the universe will exist, before you can do anything.

411y

Calculating the locally optimal action without any reference to plans can sometimes get you different results - see the absentminded driver problem.

-111y

I'm not convinced that the absentminded driver problem has such implications. Its straightforward (to me) resolution is that the optimal p is 2/3 by the obvious analysis, and that the driver cannot use alpha as a probability, for reasons set out here.
But I'd rather not get into a discussion of self-referential decision theory, since it doesn't currently exist.

It is essential to both of these paradoxes that they deal with social situations. Rephrase them so that the agent is interacting with nature, and the paradoxes disappear.

For example, suppose that the parent is instead collecting shells on the beach. He has room in his bag for one more shell, and finds two on the ground that he has no preference between. Clearly, there's no reason he would rather flip a coin to decide between them than just pick one of them up, say, the one on the left.

What this tells me is that you have to be careful using decision theory ...

People can't order outcomes from best to worst. People exhibit circular preferences. I, myself, exhibit circular preferences. This is a problem for a utility-function based theory of what I want.

4[anonymous]11y

Interesting. Example of circular preferences?

811y

There's a whole literature on preference intransitivity, but really, it's not that hard to catch yourself doing it. Just pay attention to your pairwise comparisons when you're choosing among three or more options, and don't let your mind cover up its dirty little secret.

811y

Yup. Possible cause: motivations are caused by at least 3 totally different kinds of processes which often conflict.

411y

Can you give an example of circular preferences that aren't contextual and therefore only superficially circular (like Benja's Alice and coin-flipping examples are contextual and only superficially irrational), and that you endorse, rather than regarding as bugs that should be resolved somehow? I'm pretty sure that any time I feel like I have intransitive preferences, it's because of things like framing effects or loss aversion that I would rather not be subject to.

111y

That does happen to me from time to time, but when it does (and I notice that) I just think “hey, I've found a bug in my mindware” and try to fix that. (Usually it's a result of some ugh field.)

011y

This would mean, of course, that humans can be money-pumped. In other words, if this is really true, there is a lot of money out there "on the table" for anyone to grab by simply money-pumping arbitrary humans. But in real life, if you went and tried to money-pump people, you would not get very far. But I accept a weaker form of what you are saying, that in the normal course of events when people are not consciously thinking about it we can exhibit circular reasoning. But in a situation where we actually are sitting down and thinking and calculating about it, we are capable of “resolving” those apparently circular preferences.

No, not "of course". It only implies that if they're rational actors, which of course they are not. They are deal-averse and if they see you trying to pump them around in a circle they will take their ball and go home.

You can still profit by doing one step of the money pump, and people do. Lots of research goes into exploiting people's circular preferences on things like supermarket displays.
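
A minimal sketch of the money pump under discussion, with everything in it hypothetical: an agent that prefers A to B, B to C, and C to A, pays a fixed fee in cents for each swap to the item it likes better, and optionally walks away after a set number of trades, as a deal-averse human might.

```python
def run_money_pump(start_item, money, fee, max_trades, walks_away_after=None):
    """Simulate offering a cyclic-preference agent a chain of paid swaps.

    Returns (item held, money left, trades made). Money and fee are integers
    (for example cents) to avoid floating-point noise.
    """
    # Holding the key, the agent prefers the value and will pay to swap.
    swap_for = {"B": "A", "C": "B", "A": "C"}
    item, trades = start_item, 0
    while trades < max_trades and money >= fee:
        if walks_away_after is not None and trades >= walks_away_after:
            break  # the agent notices the pattern and stops dealing
        item = swap_for[item]
        money -= fee
        trades += 1
    return item, money, trades

print(run_money_pump("C", money=100, fee=1, max_trades=3))
# ('C', 97, 3): same item as at the start, three fees poorer
print(run_money_pump("C", money=100, fee=1, max_trades=3, walks_away_after=1))
# ('B', 99, 1): one step of the pump still earns the fee
```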

111y

I think you are taking my point as something stronger than what I said. As you pointed out, with humans you can often money pump them once, but not more than that. So it cannot truly be said that that preference is fully circular. It is something weaker, and perhaps you could call it a semi-circular preference. My point was that the thing that humans exhibit is not a “circular preference” in the fullest technical sense of the term.

211y

Well...

I do not understand the first part of the post. As far as I can tell, you are responding to concerns that have been raised elsewhere (possibly in your head while discussing the issue with yourself) but it is unclear to me what exactly these concerns are, so I'm lost. Specifically, I do not understand the following:

...Meditation: Alice is trying to decide how large a bonus each member of her team should get this year. She has just decided on giving Bob the same, already large, bonus as last year when she receives an e-mail from the head of a different divisi

211y

I am assuming that Alice, on reflection, decides that she wants to give Bob the higher bonus even if nobody else ever learned that she had the opportunity to recommend him for the project, the way I would not want to steal food from a starving person even if nobody ever found out about it.
The concern I'm replying to is that decision theory assumes your preferences can be described by a binary "is preferred to" relation, but humans might choose option A if the available options are A and B, and option B if the available options are A, B and C, so how do you model that as a binary relation? I actually don't recall seeing this raised in the context of VNM utility theory, but I believe I've seen it in discussions of Arrow's impossibility theorem, where the Independence of Irrelevant Alternatives axiom (confusingly, not the analog of VNM's Independence of Irrelevant Alternatives) says that adding option C must not change the decision from A to B.
I'm not particularly bothered for decision theory if you can do an experiment and have humans exhibit such behavior, because some human behavior is patently self-defeating and I don't think we should require decision theory to explain all our biases as "rational", but I want a decision theory that won't exclude the preferences that we would actually want to adopt on reflection, so I either want it to support Alice's preferences or I want to understand why Alice's preferences are in fact irrational.
It's like this: Caring about the set of options you were able to choose between seems like a bad idea to me; I'm skeptical that preferences like Alice's are what I would want to adopt, on reflection. I might be tempted to simply say, they're obviously irrational, no problem if decision theory doesn't cater to them. But caring about the algorithm your AI runs also seems like a bad idea, and by similar intuitions I might have been willing to accept a decision theory that would outlaw such preferences -- which, as it turns out, would

111y

Oh. I still do not think the example you gave illustrates this concern. One interpretation of the situation is that Alice gains new knowledge in the scenario. The existence of a new project suited to Bob's talents increases Alice's assessment of Bob's value. More generally, it's reasonable for an agent's preferences to change as its knowledge changes.
In response to this objection, I think you only need to assume that deciding between A and B and C is equivalent to deciding between A and (B and C) and also equivalent to deciding between (A and B) and C, together with the assumption that your agent is capable of consistently assigning preferences to "composite choices" like (A and B).
Are you claiming that these two situations are analogous or only claiming that they are two examples of caring about whether decision theory should allow certain kinds of preferences? That's one of the things I was confused about (because I can't see the analogy but your writing suggests that one exists). Also, where does your intuition that it is a bad idea to care about the algorithm your AI runs come from? It seems like an obviously good idea to care about the algorithm your AI runs to me.
I guess that depends on what "same" means. If you instantiate two AIs that are running identical algorithms but both AIs are explicitly trying to monopolize all of the resources on the planet, then they're playing a zero-sum game but there's a reasonable sense in which they are trying to steer the future in the "same" direction (namely that they are running identical algorithms).
If this isn't a reasonable notion of sameness because the algorithm involves reference to thisAgent and the referent of this pointer changes depending on who's instantiating the algorithm, then the preferences you've described are also not the same preferences because they also refer to thisAgent. If the preferences are modified to say "if an agent running thisAlgorithm has access to foo," then as far as I can tell the

011y

Thanks for the feedback!
It's possible that I'm just misreading your words to match my picture of the world, but it sounds to me as if we're not disagreeing too much, but I failed to get my point across in the post. Specifically:
I am saying that I think that a "direction for steering the future" should not depend on a global thisAgent variable. To make the earlier example even more blatant, I don't think it's useful to call "If thisAgent = Alice's AI, maximize paperclips; if thisAgent = Carol's AI, maximize staples" a coherent direction, I'd call it a function that returns a coherent direction. Whether or not the concept I'm trying to define is the best meaning for "same direction" is of course only a definitional debate and not that interesting, but I think it's a useful concept.
I agree that the most obvious formalization of Alice's preferences would depend on thisAgent. So I'm saying that there actually is a nontrivial restriction on her preferences: If she wants to keep something like her informal formulation, she will need to decide what they are supposed to mean in terms that do not refer to thisAgent. They may simply refer to "Alice", but then the AI is influenced only by what Alice was able to do, not by what the AI was able to do, and Alice will have to decide whether that is what she wants.
But how could you come up with a pair of situations such that in situation (i), the agent can choose options A and B, while in situation (ii), the agent can choose between A, B and C, and yet the agent has exactly the same information in situations (i) and (ii)? So under your rules, how could any example illustrate the concern?
I do agree that it's reasonable for Alice to choose a different option because the knowledge she has is different -- that's my resolution to the problem.
Sorry, I do not understand -- what do you mean by your composite choices? What does it mean to choose (A and B) when A and B are mutually exclusive options?
I'm claiming they are both ex

011y

Got it. I think.
In situation (i), Alice can choose between chocolate and vanilla ice cream. In situation (ii), Alice can choose between chocolate, vanilla, and strawberry ice cream. Having access to these options doesn't change Alice's knowledge about her preferences for ice cream flavors (under the assumption that access to flavors on a given day doesn't reflect some kind of global shortage of a flavor). In general it might help to have Alice's choices randomly determined, so that Alice's knowledge of her choices doesn't give her information about anything else.
Sorry, I should probably have used "or" instead of "and." If A and B are the primitive choices "chocolate ice cream" and "vanilla ice cream," then the composite choice (A or B) is "the opportunity to choose between chocolate and vanilla ice cream." The point is that once you allow a decision theory to assign preferences among composite choices, then composition of choices is associative, so preferences among an arbitrary number of primitive choices are determined by preferences among pairs of primitive choices.
Okay, but it still seems reasonable to have instrumental preferences about algorithms that AIs run, and I don't see why decision theory is not allowed to talk about instrumental preferences. (Admittedly I don't know very much about decision theory.)

Is the real-world imperative "you must maximize expected utility", given by the VNM theorem, stronger or weaker than the imperative "everyone must have the same beliefs" given by Aumann's agreement theorem? If only there were some way of comparing these things! One possible metric is how much money I'm losing by not following this or that imperative. Can anyone give an estimate?

[This comment is no longer endorsed by its author]

My local rationality group assigned this post as reading for our meetup this week, and it generated an interesting discussion.

I'm not an AI or decision theory expert. My only goal here is to argue that some of these claims are poor descriptions of actual human behavior. In particular, I don't think humans have consistent preferences about rare and negative events. I argue this by working backwards from the examples in the discussion on the Axiom of Continuity. I still think this post is valuable in other ways.

...Let's look at an example: If you prefer $50 i

It seems essential to the idea of "a coherent direction for steering the world" or "preferences" that the ordering between choices does not depend on what choices are actually available. But in standard cooperative multi-agent decision procedures, the ordering *does* depend on the set of choices available. How to make sense of this? Does it mean that a group of more than one agent can't be said to have a coherent direction for steering the world? What is it that they do have then? And if a human should be viewed as a group of sub-agents r...

311y

That's indeed my current intuition. Suppose that there is a paperclip maximizer and a staples maximizer, and the paperclip maximizer has sole control over all that happens in the universe, and the two have a common prior which assigns near-certainty to this being the case. Then I expect the universe to be filled with paperclips. But if Staples has control, I expect the universe to be tiled with staples.
On the other hand (stealing your example, but let's make it about a physical coinflip, to hopefully make it noncontroversial): If both priors assign 50% probability to "Clippy has control and the universe can support 10^10 paperclips or 10^20 staples" and 50% probability to "Staples has control and the universe can support 10^10 staples or 10^20 paperclips", and it turns out that in fact the first of these is true, then I expect Clippy to tile the universe with staples.
I disagree with Stuart's post arguing that this means that Nash's bargaining solution (NBS) can't be correct, because it is dynamically inconsistent, as it gives a different solution after Clippy updates on the information that it has sole control. I think this is simply a counterfactual mugging: Clippy's payoff in the possible world where Staples has control depends on Clippy's cooperation in the world where Clippy has control. The usual solution to counterfactual muggings is to simply optimize expected utility relative to your prior, so the obvious thing to do would be to apply NBS to your prior distribution, giving you dynamic consistency.
That said, I'm not saying that I'm sure NBS is in fact the right solution. My current intuition is that there should be some way to formalize the "bargaining power" of each agent, and when holding the bargaining powers fixed, a group of agents should be steering the world in a coherent direction. This suggests that the right formalization of "bargaining power" would give a nonnegative scaling factor to each member of the group, and the group will act to maximi

111y

Why so much emphasis on "responsibility"? In my mind, I have a responsibility to fulfill any promises I make to others and ... and that's about it. As for figuring out what my preferences are, or should be, I'm going to try any promising approaches I can find, and see if one of them works out. Thinking of myself as a bunch of sub-agents and using ideas from bargaining theory is one such approach. Trying to solve normative ethics using the methods of moral philosophers may be another. When you say "see it as their responsibility to actually choose one direction in which they want to steer the world", what does that mean, in terms of an approach I can explore?
ETA: I wrote a post that may help explain what I meant here.

011y

There is a justification for that intuition. Some have objected to the axiom that the aggregation must also be VNM-rational, but Nisan has proved a similar theorem that does not rely on the VNM-rationality of the collective as an axiom.

The "not wanting the AI to run conscious simulations of people" link under the "Outcomes" heading does not work.

011y

Fixed, thanks!

What happens if your preferences do not satisfy Continuity? Say, you want to save human lives, but you're not willing to incur any probability, no matter how small, of infinitely many people getting tortured infinitely long for this?

Then you basically have a two-step optimization; "find me the set of actions that have a minimal number of infinitely many people getting tortured infinitely long, and then of that set, find me the set of actions that save a maximal number of human lives." The trouble with that is that people like to *express* their ...
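
The two-step optimization can be sketched as a lexicographic comparison. The action names and numbers below are invented for illustration, and reading the first step as "minimize the probability of the catastrophic outcome" is an assumption about what the comment intends.

```python
# Hypothetical actions: (name, probability of the catastrophe, expected lives saved).
actions = [
    ("do_nothing", 0.0, 0),
    ("cautious_program", 0.0, 100),
    ("bold_program", 1e-12, 1_000_000),
]

def lexicographic_choice(candidates):
    """Minimize catastrophe probability first; break ties by lives saved."""
    # Tuples compare element by element, so the second entry only matters
    # among candidates whose first entry is equal.
    return min(candidates, key=lambda a: (a[1], -a[2]))

print(lexicographic_choice(actions)[0])  # cautious_program
```

No number of lives saved makes up for even a one-in-a-trillion chance of the catastrophe here, which is exactly the behavior that no single real-valued expected utility can represent once Continuity is dropped.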

New information is allowed to make a hypothesis more likely, but not predictably so; if all ways the experiment could come out make the hypothesis more likely, then you should already be finding it more likely than you do. The same thing is true even if only one result would make the hypothesis more likely, but the other would leave your probability estimate exactly unchanged.
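
The principle in the quoted passage can be checked numerically: the probability-weighted average of the possible posteriors equals the prior, so the posteriors cannot all lie on one side of it. The sketch below uses exact fractions; the particular prior and likelihoods are arbitrary illustrative numbers.

```python
from fractions import Fraction

def posteriors_and_average(prior, p_e_given_h, p_e_given_not_h):
    """Bayes update on a yes/no experiment.

    Returns (posterior if E, posterior if not E, expected posterior).
    Assumes 0 < P(E) < 1 so that both posteriors are defined.
    """
    p_e = prior * p_e_given_h + (1 - prior) * p_e_given_not_h
    post_if_e = prior * p_e_given_h / p_e
    post_if_not_e = prior * (1 - p_e_given_h) / (1 - p_e)
    average = p_e * post_if_e + (1 - p_e) * post_if_not_e
    return post_if_e, post_if_not_e, average

prior = Fraction(1, 2)
up, down, average = posteriors_and_average(prior, Fraction(3, 4), Fraction(1, 4))
print(up, down, average)  # 3/4 1/4 1/2
```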

One result might change my probability estimate by less than my current imprecision/uncertainty/rounding error in stating said estimate. If the coin comes up H,H,H,H,H,H,**T**,H,H,H,H,...

Thank you for this excellent post. I read this primarily because I would like to use formal theories to aid my own decision-making about big, abstract decisions like what to do with my life and what charities to donate to, where the numbers are much more available than the emotional responses. In a way this didn't help at all: it only says anything about a situation where you start with a preference ordering. But in another way it helped immensely, of course, since I need to understand these fundamental concepts. It was really valuable to me that you were so careful about what these "utilities" really mean.

Maximizing expected utility can be paradoxically shown to minimize actual utility, however. Consider a game in which you place an initial bet of $1 on a 6-sided die coming up anything but 1 (2-6), which pays even money if you win and costs you your bet if you lose. The twist, however, is that upon winning (i.e. you now have $2 in front of you) you must either bet the entire sum formed by your bet and its wins or leave the game permanently. Theoretically, since the odds are in your favor, you should always keep going. Always. But wait, this means you will e...

711y

You aren't analyzing this game correctly. At the beginning of the game, you're deciding between possible strategies for playing the game, and you should be evaluating the expected value of each of these strategies.
The strategy where you keep going until you lose has expected value -1. There is also a sequence of strategies depending on a positive integer n where you quit at the latest after the nth bet, and their expected values, (5/3)^n - 1 in dollars, grow geometrically with n. In other words, there isn't an optimal strategy for this game because there are infinitely many strategies and their expected values get arbitrarily high.
In addition, the sequence of strategies I described tends to the first strategy in the limit as n tends to infinity, in some sense, but their expected values don't respect this limit, which is what leads to the apparent paradox that you noted. In more mathematical language, what you're seeing here is a failure of the ability to exchange limits and integrals (where the integrals are expected values). Less mathematically, you can't evaluate the expected value of a sequence of infinitely many decisions by adding up the expected value of each individual decision. In practice, you will never be able to make infinitely many decisions, so this doesn't really matter.
This issue is closely related to the puzzle where the Devil gives you money and takes it away infinitely many times. I don't remember what it's called.
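
The expected values of the "quit after n wins" strategies can be computed exactly. This sketch assumes the reading of the game in which the $1 stake doubles with every win (so n wins leave $2^n on the table) and in which value is measured in dollars; the function name is hypothetical.

```python
from fractions import Fraction

P_WIN = Fraction(5, 6)  # the die shows 2-6

def expected_value_stop_after(n):
    """Expected net winnings, in dollars, of 'walk away after n wins'."""
    p_reach = P_WIN ** n          # chance of winning n times in a row
    net_if_reached = 2 ** n - 1   # table holds 2**n, minus the $1 put in
    return p_reach * net_if_reached + (1 - p_reach) * (-1)

for n in (1, 2, 5, 10):
    print(n, expected_value_stop_after(n), float(P_WIN ** n))
```

Each value equals (5/3)^n - 1, so the expected winnings grow without bound while the chance of actually walking away with anything, (5/6)^n, shrinks toward zero. The "never stop" strategy is the limit of these strategies but has expected value -1.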

0[anonymous]11y

Indeed they don't, but the point is that while stopping at N+1 always dominates stopping at N, this thinking leads one to keep continuing and lose. As such, the only winning move is to do exactly NOT this and decide some arbitrary prior point to stop at (or decide indeterministically such as by coin flip). Attempting to maximize expected utility is the only strategy that won't work. This game, prisoners' dilemma, and newcomblike problems are all cases where choosing in such a way that does better (than the alternative) in all cases can still do worse overall.

211y

The point isn't that the strategy that is supposed to maximize expected utility is a bad idea. The point is that you're computing its expected utility incorrectly because you're switching a limit and an integral that you can't switch. This is a completely different issue from the prisoner's dilemma; it is entirely an issue of infinities and has nothing to do with the practical issue of being a decision-maker with bounded resources making finitely many decisions.

-2[anonymous]11y

It isn't a matter of switching a limit and an integral, or any means of infinity really. You could just consider the 1 number you're currently on, your options are to continue or stop. To come out of the game with any money, one must at some point say "forget maximizing expected utility, I'm not risking losing what I've acquired". By stopping, you lose expected utility compared to continuing exactly 1 more time. My point being that it is not always the case that "you must maximize expected utility", for in some cases it may be wrong or impossible to do so.

411y

All you've shown is that maximizing expected utility infinitely many times does not maximize the expected utility you get at the end of the infinitely many decisions you've made. This is entirely a matter of switching a limit and an integral, and it is irrelevant to practical decision-making.

411y

1. This argument only works if the bet is denominated in utils rather than in dollars. Otherwise, someone who gets diminishing marginal utility from dollars for very large sums -- that would include most people -- will eventually decide to stop. (If I have utility = log(dollars) and initial assets of $1M then I will stop after 25 wins, if I did the calculations right.)
1a. It is not at all clear that a bet denominated in utils is even actually possible. Especially not one which, with high probability, ends up involving an astronomically large quantity of utility.
2. Even someone who doesn't generally get diminishing marginal utility from dollars -- say, an altruist who will use all those dollars for saving other people's lives, and who cares equally about all -- will find marginal utility decreasing for large enough sums, because (a) eventually the cheap problems are solved and saving the next life starts costing more, and (b) if you give me 10^15 dollars and I try to spend it all (on myself or others) then the resulting inflation will make them worth less.
3. Given that "you will eventually lose it all", a strategy of continuing to bet does not in fact maximize expected utility.
4. The expected utility from a given choice at a given stage in the game depends on what you'd then do with the remainder of the game. For instance, if I know that my future strategy after winning this roll is going to be "keep betting for ever" then I know that my expected utility if I keep playing is zero, so I'll choose not to do that.
5. So at most what we have (even if we assume we've dealt somehow with issues of diminishing marginal utility etc.) is a game where there's an infinite "increasing" sequence of strategies but no limiting strategy that's better than all of them. But that's no surprise. Here's another game with the same property: You name a positive integer N and Omega gives you $N. For any fixed N, it is best not to choose N because larger numbers are better. "Therefore" you

111y

This is the St. Petersburg paradox, discussed here from time to time.

-111y

It isn't really very much like the St. Petersburg paradox. The St. Petersburg game runs for a random length of time, you don't choose whether to continue; the only choice you make is at the beginning of the game where you decide how much to pay.
Or is it equivalent in some subtle way?

011y

Is it just me or is this essentially the same as the Lifespan Dilemma?
At the very least, in both cases, you find that you get high expected utilities by choosing very low probabilities of getting anything at all.
If your preferences can always be modelled with a utility function, does that mean that no matter how you make decisions, there's some adaptation of this paradox that will lead you to accept a near certainty of death?

0[anonymous]11y

It is essentially that, and it does show that trying to maximize expected utility can lead to such negative outcomes. Unfortunately, there doesn't seem to be a simple alternative to maximizing expected utility that doesn't lead to being a money pump. The Kelly criterion is an excellent example of a decision-making strategy that doesn't maximize expected utility but still wins compared to it, so at least it's known that it can be done.

I appreciate the hard work here, but all the math sidesteps the real problems, which are in the axioms, particularly the axiom of independence. See this sequence of comments on my post arguing that saying expectation maximization is correct is equivalent to saying that average utilitarianism is correct.

People object to average utilitarianism because of certain "repugnant" scenarios, such as the utility monster (a single individual who enjoys torturing everyone else so much that it's right to let him or her do so). Some of these scenarios can be...

Suppose the world has one billion people. Do you think it's better to give one billion and one utilons to one person than to give one utilon to everyone?

Yes. If you think this conclusion is repugnant, you have not comprehended the meaning of 1000000001 times as much utility. The only thing that utility value even *means* is that you'd accept such a deal.

You don't "give" people utilons though. That implies scarcity, which implies some real resource to be distributed, which we correctly recognize as having diminishing returns on one person, and less diminishing returns on lots of people. The better way to think of it is that you *extract* utility from people.

Would you rather get 1e9 utils from one person, or 1 util from each of 1e9 people? Who cares? 1e9 utils is 1e9 utils.

If so, why would you believe it's better to take an action that results in you having one billion and one utilons one-one-billionth of the time, and nothing all other times, than an action that reliably gives you one utilon?

Again, by construction, we take this deal.

VNM should not have called it "utility"; it drags in too many connotations. VNM utility is a very personal thing that describes what decisions *you* would make.

211y

It is permissible to prefer the outcome that has a constant probability distribution to the outcome that has the higher definite integral across the probability distribution.

2[anonymous]11y

What do you mean? Specifically, what is a "constant probability distribution"?
If you mean I can prefer $1M to a 1/1000 chance of $2B, then sure. Money is not utility.
On the other hand, I can't prefer 1M utils to 1/1000 chance of 2B utils.

-111y

A constant probability distribution is a flat distribution; i.e. a flat line.
And the outcomes can be ordered however one chooses. It is not necessary to provide additive numeric values.
Are you saying that utils are defined such that if one outcome is preferred over another, it has more expected utils?

2[anonymous]11y

Yes. That's exactly what I mean.
And I'm afraid I still don't know what you are getting at with this constant probability distribution thing.

011y

I mean an outcome where there is 1-epsilon chance of A.
It is permissible to assign utils arbitrarily, such that flipping a coin to decide between A and B has more utils than selecting A and more utils than selecting B. In that case, the outcome is "Flip a coin and allow the coin to decide", which has different utility from the sum of half of A and half of B.

1[anonymous]11y

Perhaps if you count "I flipped a coin and got A" > A.
You can always define some utility function such that it is rational to shoot yourself in the foot, but at that point, you are just doing a bunch of work to describe stupid behavior that you could just do anyways. You don't have to follow the VNM axioms either.
The point of VNM and such is to constrain your behavior. And if you input sensible things, it does. You don't have to let it constrain your behavior, but if you don't, it is doing no work for you.

011y

Right. If you think "I flipped a coin to decide" is more valuable than half of the difference between results of the coin flip (perhaps because those results are very close to equal, but you fear that systemic bias is a large negative, or perhaps because you demand that you are provably fair), then you flip a coin to decide.
The utility function, however, is not something to be defined. It is something to be determined and discovered- I already want things, and while what I want is time-variant, it isn't arbitrarily alterable.

0[anonymous]11y

Unless your utility assigns a positive utility to your utility function being altered, in which case you'd have to seek to optimize your meta-utility. Desire to change one's desires reflects an inconsistency, however, so one who desires to be consistent should desire not to desire to change one's desires. (my apologies if this sounds confusing)

411y

One level deeper: One who is not consistent but desires to be consistent desires to change their desires to desires that they will not then desire to change.
If you don't like not liking where you are, and you don't like where you are, move to somewhere where you will like where you are.

0[anonymous]11y

Ah, so true. Ultimately, I think that's exactly the point this article tries to make: if you don't want to do A, but you don't want to be the kind of person who doesn't want to do A (or you don't want to be the kind of person who doesn't do A), do A. If that doesn't work, change who you are.

411y

One possible response is that the former action is preferable, but the intuition pump yields a different result because our intuitions are informed by actual small and large rewards (e.g., money), and in the real world getting $1 every day for eight years with certainty does not have the same utility as getting $2922 with probability 1/2922 each day for the next eight years. If real-world examples like money -- which is almost always more valuable now than later, inflation aside; and which bears hidden and nonlinearly changing utilities like 'security' and 'versatility' and 'social status' and 'peace of mind' that we learn to reason with intuitively as though they could not be quantified in a single utility metric analogous to the currency measure itself -- are the only intuitive grasp we have on 'utilons,' then we may make systematic errors in trying to cash out how our values would, if we better understood our biases, be reflectively cashed out.

211y

von Neumann-Morgenstern decision theory only deals with instantaneous decision making.

211y

That thesis seems obviously wrong: the term "utilitarianism" refers not to maximising, but to maximising something pretty specific - namely: the happiness of all people.

This post explains von Neumann-Morgenstern (VNM) axioms for decision theory, and what follows from them: that if you have a consistent direction in which you are trying to steer the future, you must be an expected utility maximizer. I'm writing this post in preparation for a sequence on updateless anthropics, but I'm hoping that it will also be independently useful.

The theorems of decision theory say that if you follow certain axioms, then your behavior is described by a utility function. (If you don't know what that means, I'll explain below.) So you should have a utility function! Except, why should you want to follow these axioms in the first place?

A couple of years ago, Eliezer explained how violating one of them can turn you into a money pump: how, at time 11:59, you will *want* to pay a penny to get option B instead of option A, and then at 12:01, you will *want* to pay a penny to switch back. Either that, or the game will have ended and the option won't have made a difference.

When I read that post, I was suitably impressed, but not completely convinced: I would certainly not want to behave one way if behaving differently *always* gave better results. But couldn't you avoid the problem by violating the axiom only in situations where it doesn't give anyone an opportunity to money-pump you? I'm not saying that would be *elegant*, but is there a reason it would be *irrational*?

It took me a while, but I have since come around to the view that you really must have a utility function, and really must behave in a way that maximizes the expectation of this function, on pain of stupidity (or at least that there are strong arguments in this direction). But I don't know any source that comes close to explaining the reason, the way I see it; hence, this post.

I'll use the von Neumann-Morgenstern axioms, which assume probability theory as a foundation (unlike the Savage axioms, which actually *imply* that anyone following them has not only a utility function but also a probability distribution). I will assume that you already accept Bayesianism.

*

*Epistemic* rationality is about figuring out what's true; *instrumental* rationality is about steering the future where you want it to go. The way I see it, the axioms of decision theory tell you how to have a consistent *direction* in which you are trying to steer the future. If my choice at 12:01 depends on whether at 11:59 I had a chance to decide differently, then perhaps I won't ever be money-pumped; but if I want to save as many human lives as possible, and I must decide between different plans that have different probabilities of saving different numbers of people, then it starts to at least seem *doubtful* that which plan is better at 12:01 could *genuinely* depend on my opportunity to choose at 11:59.

So how do we formalize the notion of a coherent direction in which you can steer the future?

*

## Setting the stage

Decision theory asks what you would do if faced with choices between different sets of options, and then places restrictions on how you can act in one situation, depending on how you would act in others. This is another thing that has always bothered me: If we are talking about choices between different lotteries with small prizes, it makes some sense that we could invite you to the lab and run ten sessions with different choices, and you should probably act consistently across them. But if we're interested in the big questions, like how to save the world, then you're not going to face a series of independent, analogous scenarios. So what is the *content* of asking what you would do if you faced a set of choices different from the one you actually face?

The real point is that you have bounded computational resources, and you can't *actually* visualize the exact set of choices you might face in the future. A perfect Bayesian rationalist could just figure out what they *would* do in any conceivable situation and write it down in a giant lookup table, which means that they only face a single one-time choice between different possible tables. But *you* can't do that, and so you need to figure out general principles to follow. A perfect Bayesian is like a Carnot engine: it's what a theoretically perfect engine *would* look like, so even though you can at best approximate it, it still has something to teach you about how to build a real engine.

But decision theory is *about* what a perfect Bayesian would do, and it's annoying to have our practical concerns intrude into our ideal picture like that. So let's give our story some local color and say that *you* aren't a perfect Bayesian, but you have a genie (that is, a powerful optimization process; that is, an AI) which *is*. (That, too, is physically impossible: AIs, like humans, can only approximate perfect Bayesianism. But we *are* still idealizing.) Your *genie* is able to comprehend the set of possible giant lookup tables it must choose between; *you* must write down a formula, to be evaluated by the genie, that chooses the best table from this set, given the available information. (An unmodified human won't *actually* be able to write down an exact formula describing their preferences, but we might be able to write down one for a paperclip maximizer.)

The first constraint decision theory places on your formula is that it must order all options your genie *might* have to choose between from best to worst (though you might be indifferent between some of them), and then given any particular set of feasible options, it must choose the one that is least bad. In particular, if you prefer option A when options A and B are available, then you can't prefer option B when options A, B and C are available.

*Meditation:* Alice is trying to decide how large a bonus each member of her team should get this year. She has just decided on giving Bob the same, already large, bonus as last year when she receives an e-mail from the head of a different division, asking her if she can recommend anyone for a new project he is setting up. Alice immediately realizes that Bob would love to be on that project, and would fit the bill exactly. But she *needs* Bob on the contract he's currently working on; losing him would be a pretty bad blow for her team.

Alice decides there is no way that she can recommend Bob for the new project. But she still feels bad about it, and she decides to make up for it by giving Bob a larger bonus. On reflection, she finds that she genuinely feels that this is the *right* thing to do, simply because she *could* have recommended him but didn't. Does that mean that Alice's preferences are irrational? Or that something is wrong with decision theory?

*Meditation:* One kind of answer to the above and to many other criticisms of decision theory goes like this: Alice's decision isn't between giving Bob a larger bonus or not, it's between (give Bob a larger bonus unconditionally), (give Bob the same bonus unconditionally), (only give Bob a larger bonus if I could have recommended him), and so on. But if *that* sort of thing is allowed, is there *any* way left in which decision theory constrains Alice's behavior? If not, what good is it to Alice in figuring out what she should do?

...

...

...

*

## Outcomes

My short answer is that Alice can care about anything she damn well likes. But there are a lot of things that she *doesn't* care about, and decision theory has something to say about *those*.

In fact, deciding that some kinds of preferences should be outlawed as irrational can be dangerous: you might think that nobody in their right mind should ever care about the detailed planning algorithms their AI uses, as long as they work. But how certain are you that it's wrong to care about whether the AI has planned out your whole life in advance, in detail? (Worse: Depending on how strictly you interpret it, this injunction might even rule out not wanting the AI to run conscious simulations of people.)

But nevertheless, I believe the "anything she damn well likes" needs to be qualified. Imagine that Alice and Carol both have an AI, and fortuitously, both AIs have been programmed with the same preferences and the same Bayesian prior (and they talk, so they also have the same posterior, because Bayesians cannot agree to disagree). But Alice's AI has taken over the stock markets, while Carol's AI has seized the world's nuclear arsenals (and is protecting them well). So Alice's AI not only doesn't want to blow up Earth, it couldn't do so *even if it wanted to*; it couldn't even bribe Carol's AI, because Carol's AI really doesn't want the Earth blown up either. And so, if it makes a difference to the AIs' preference function whether they *could* blow up Earth if they wanted to, they have a conflict of interest.

The moral of this story is not simply that it would be *sad* if two AIs came into conflict even though they have the same preferences. The point is that we're asking what it means to have a consistent direction in which you are trying to steer the future, and it doesn't look like our AIs are on the same bearing. Surely, a direction for steering the world should only depend on features of the *world*, not on additional information about which agent is at the rudder.

You *can* want to not have your life planned out by an AI. But I think you should have to state your wish as a property of the world: you want *all* AIs to refrain from doing so, not just "whatever AI happens to be executing this". And Alice can want Bob to get a larger bonus if the company could have assigned him to the new project and decided not to, but she must figure out whether *this* is the correct way to translate her moral intuitions into preferences over properties of the world.

*

You may care about any feature of the world, but you don't in fact care about most of them. For example, there are many ways the atoms in the sun could be arranged that all add up to the same thing as far as you are concerned, and you don't have *terminal* preferences about which of these will be the actual one tomorrow. And though you might care about *some* properties of the algorithms your AI is running, mostly they *really* do not matter.

Let's define a function that takes a complete description of the world (past, present and future) and returns a data structure containing all information about the world that matters to your terminal values, and *only* that information. (Our imaginary perfect Bayesian doesn't know exactly which way the world will turn out, but it can work with "possible worlds", complete descriptions of ways the world *may* turn out.) We'll call this data structure an "outcome", and we require you to be indifferent between any two courses of action that will always produce the same outcome. Of course, any course of action is something that your AI would be executing in the actual world, and you are certainly allowed to care about the difference, but then the two courses of action do not lead to the same "outcome"!^{1}

With this definition, I think it is pretty reasonable to say that in order to have a consistent direction in which you want to steer the world, you must be able to order these outcomes from best to worst, and always want to pick the least bad you can get.
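As a rough illustration of this ordering requirement (a sketch only, not part of the original argument): suppose the outcomes are ranked by a made-up table `RANK`, where a lower number means a better outcome and equal numbers mean indifference. The names `RANK`, `choose` and `never_flips` are hypothetical. Picking the least bad outcome from such a ranking can never turn a strict preference for A over B into a choice of B just because a third option was added to the menu.

```python
from itertools import combinations

# Hypothetical ranking of outcomes: lower number = better; equal numbers = indifference.
RANK = {"A": 0, "B": 1, "C": 1, "D": 2}

def choose(feasible):
    """Return the set of best (least bad) outcomes among the feasible ones."""
    best = min(RANK[o] for o in feasible)
    return {o for o in feasible if RANK[o] == best}

def never_flips(outcomes):
    """Check that a strict preference for a over b survives adding any extra options."""
    outcomes = list(outcomes)
    for a in outcomes:
        for b in outcomes:
            if a == b or choose({a, b}) != {a}:
                continue  # only look at pairs where a is strictly preferred to b
            for k in range(len(outcomes) + 1):
                for extra in combinations(outcomes, k):
                    if b in choose({a, b, *extra}):
                        return False
    return True

print(choose({"A", "B"}))       # best of a two-option menu
print(choose({"B", "C", "D"}))  # a tie: indifferent between B and C
print(never_flips(RANK))
```

The check passes for any ranking of this kind; a choice rule that fails it cannot be described by a single best-to-worst ordering.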

*

## Preference relations

That won't be *sufficient*, though. Our genie doesn't *know* what outcome each action will produce, it only has probabilistic information about that, and that's a complication we very much do *not* want to idealize away (because we're trying to figure out the right way to *deal* with it). And so our decision theory amends the earlier requirement: You must not only be indifferent between actions that always produce the same outcome, but also between all actions that only yield *the same probability distribution* over outcomes.

This is not at all a mild assumption, though it's usually built so deeply into the definitions that it's not even called an "axiom". But we've assumed that all features of the world you care about are already encoded in the outcomes, so it does seem to me that the only reason left why you might prefer one action over another is that it gives you a better trade-off in terms of what outcomes it makes more or less likely; and I've assumed that you're already a Bayesian, so you agree that *how* likely it makes an outcome is correctly represented by the probability of that outcome, given the action. So it certainly *seems* that the probability distribution over outcomes should give you all the information about an action that you could *possibly* care about. And that you should be able to order these probability distributions from best to worst, and all that.

Formally, we represent a direction for steering the world as a set \(\mathcal{O}\) of possible outcomes and a binary relation \(\succcurlyeq\) on the probability distributions over \(\mathcal{O}\) (where \(x \succcurlyeq y\) is interpreted as "\(x\) is at least as good as \(y\)") which is a total preorder; that is, for all \(x\), \(y\) and \(z\):

- if \(x \succcurlyeq y\) and \(y \succcurlyeq z\), then \(x \succcurlyeq z\) (that is, \(\succcurlyeq\) is *transitive*); and
- \(x \succcurlyeq y\) or \(y \succcurlyeq x\), or both (that is, \(\succcurlyeq\) is *total*).

In this post, I'll assume that \(\mathcal{O}\) is finite. We write \(x \sim y\) (for "I'm indifferent between \(x\) and \(y\)") when both \(x \succcurlyeq y\) and \(y \succcurlyeq x\), and we write \(x \succ y\) ("\(x\) is strictly better than \(y\)") when \(x \succcurlyeq y\) but *not* \(y \succcurlyeq x\). Our genie will compute the set of all actions it could possibly take, and the probability distribution over possible outcomes that (according to the genie's Bayesian posterior) each of these actions leads to, and then it will choose to act in a way that maximizes \(\succcurlyeq\). I'll also assume that the set of possible actions will always be finite, so there is always at least one optimal action.

*Meditation:* Omega is in the neighbourhood and invites you to participate in one of its little games. Next Saturday, it plans to flip a fair coin; would you please indicate on the attached form whether you would like to bet that this coin will fall heads, or tails? If you correctly bet heads, you will win $10,000; if you correctly bet tails, you'll win $100. If you bet wrongly, you will still receive $1 for your participation.

We'll assume that you prefer a 50% chance of $10,000 and a 50% chance of $1 to a 50% chance of $100 and a 50% chance of $1. Thus, our theory would say that you should bet heads. But there is a twist: Given recent galactopolitical events, you estimate a 3% chance that after posting its letter, Omega has been called away on urgent business. In this case, the game will be cancelled and you won't get any money, though as a consolation, Omega will probably send you some book from its rare SF collection when it returns (market value: approximately $55–$70). Our theory so far tells you nothing about how you should bet in this case, but does Rationality have anything to say about it?

...

...

...

*

## The Axiom of Independence

So here's how I think about that problem: If you already *knew* that Omega is still in the neighbourhood (but not which way the coin is going to fall), you would prefer to bet heads, and if you *knew* it has been called away, you wouldn't care. (And what you bet has no influence on whether Omega has been called away.) So heads is either better or exactly the same; clearly, you should bet heads.

This type of reasoning is the content of the von Neumann-Morgenstern *Axiom of Independence*. Apparently, that's the most controversial of the theory's axioms.

You're already a Bayesian, so you already accept that if you perform an experiment to determine whether someone is a witch, and the experiment can come out two ways, then if one of these outcomes is evidence that the person is a witch, the other outcome must be evidence that they are *not*. New information is allowed to make a hypothesis more likely, but not *predictably* so; if *all* ways the experiment could come out make the hypothesis more likely, then you should *already* be finding it more likely than you do. The same thing is true even if only one result would make the hypothesis more likely, but the other would leave your probability estimate exactly unchanged.

The Axiom of Independence is equivalent to saying that if you're evaluating a possible course of action, and one experimental result would make it seem more attractive than it currently seems to you, while the other experimental result would at least make it seem no *less* attractive, then you should *already* be finding it more attractive than you do. This does *seem* rather solid to me.

*

So what does this axiom say formally?

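(Feel free to skip this section if you don't care.)

Suppose that your genie is considering two possible actions \(a\) and \(b\) (bet heads or tails), and an event \(E\) (Omega is called away). Each action gives rise to a probability distribution over possible outcomes: E.g., \(P(o \mid a)\) is the probability of outcome \(o\) if your genie chooses \(a\). But your genie can also compute a probability distribution *conditional on* \(E\), \(P(o \mid a, E)\). Suppose that conditional on \(E\), it doesn't matter which action you pick: \(P(o \mid a, E) = P(o \mid b, E)\) for all \(o\). And finally, suppose that the probability of \(E\) doesn't depend on which action you pick: \(P(E \mid a) = P(E \mid b) = p\), with \(0 < p < 1\). The Axiom of Independence says that in this situation, you should prefer the distribution \(P(\cdot \mid a)\) to the distribution \(P(\cdot \mid b)\), and therefore prefer \(a\) to \(b\), if and only if you prefer the distribution \(P(\cdot \mid a, \neg E)\) to the distribution \(P(\cdot \mid b, \neg E)\).

Let's write \(x\) for the distribution \(P(\cdot \mid a, \neg E)\), \(y\) for the distribution \(P(\cdot \mid b, \neg E)\), and \(z\) for the distribution \(P(\cdot \mid a, E) = P(\cdot \mid b, E)\). (Formally, we think of these as vectors in \(\mathbb{R}^{\mathcal{O}}\): e.g., \(x_o = P(o \mid a, \neg E)\).) For all \(o\), we have

\[ P(o \mid a) = P(o \mid a, \neg E)\,P(\neg E \mid a) + P(o \mid a, E)\,P(E \mid a) = (1-p)\,x_o + p\,z_o, \]

so \(P(\cdot \mid a) = (1-p)x + pz\), and similarly \(P(\cdot \mid b) = (1-p)y + pz\). Thus, we can state the Axiom of Independence as follows:

\[ (1-p)x + pz \succcurlyeq (1-p)y + pz \iff x \succcurlyeq y. \]

We'll assume that you can't ever rule out the possibility that your AI might face this type of situation for any given \(x\), \(y\), \(z\), and \(p\), so we require that this condition hold for all probability distributions \(x\), \(y\) and \(z\), and for all \(p\) with \(0 < p < 1\).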
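To make the condition concrete, here is a minimal sketch using the Omega game from the earlier meditation. It assumes an agent that ranks distributions by expected utility; the utility numbers in `u` and the helper names `mix` and `expected_utility` are invented for illustration. With probability `p` (Omega is called away) both bets lead to the same distribution, so mixing that distribution in cannot change which bet looks better.

```python
def mix(p, z, x):
    """Return the distribution p*z + (1-p)*x, with distributions as outcome -> probability dicts."""
    outcomes = set(x) | set(z)
    return {o: p * z.get(o, 0.0) + (1 - p) * x.get(o, 0.0) for o in outcomes}

def expected_utility(dist, u):
    """Probability-weighted sum of utilities."""
    return sum(prob * u[o] for o, prob in dist.items())

# Hypothetical utilities, chosen only for illustration.
u = {"$10,000": 10.0, "$100": 4.0, "$1": 1.0, "book": 3.0}

heads = {"$10,000": 0.5, "$1": 0.5}  # bet heads, given Omega is still around
tails = {"$100": 0.5, "$1": 0.5}     # bet tails, given Omega is still around
called_away = {"book": 1.0}          # Omega called away: the same whatever you bet

p = 0.03  # probability that Omega has been called away
bet_heads = mix(p, called_away, heads)
bet_tails = mix(p, called_away, tails)

prefers_conditionally = expected_utility(heads, u) > expected_utility(tails, u)
prefers_overall = expected_utility(bet_heads, u) > expected_utility(bet_tails, u)
print(prefers_conditionally, prefers_overall)
```

This only shows one direction: an expected utility maximizer automatically satisfies Independence. That the axioms force expected utility maximization is the content of the theorem, which the code does not demonstrate.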

*

Here's a common criticism of Independence. Suppose a parent has two children, and one old car that they can give to one of these children. Can't they be indifferent between giving the car to their older child or their younger child, but strictly prefer throwing a coin? But let \(x\) mean that the younger child gets the gift, and \(y\) that the older child gets it, and \(p = \tfrac{1}{2}\); then by Independence, if \(x \sim y\), then \(\tfrac{1}{2}x + \tfrac{1}{2}y \sim \tfrac{1}{2}y + \tfrac{1}{2}y = y\), so it would seem that the parent can *not* strictly prefer the coin throw.

In fairness, the people who find this criticism persuasive may not be Bayesians. But if *you* think this is a good criticism: Do you think that the parent must be indifferent between throwing a coin and asking the children's crazy old kindergarten teacher which of them was better-behaved, as long as they assign 50% probability to either answer? Because if not, shouldn't you already have protested when we decided that decisions must only depend on the probabilities of different outcomes?

My own resolution is that this is another case of terminal values intruding where they don't belong. *All* that is relevant to the parent's terminal values must *already* be described in the outcome; the parent is allowed to prefer "I threw a coin and my younger child got the car" to "I decided that my younger child would get the car" or "I asked the kindergarten teacher and they thought my younger child was better-behaved", but if so, then these must already be different *outcomes*. The thing to remember is that it isn't a property of the *world* that either child had a 50% probability of getting the car, and you can't steer the future in the direction of having this mythical property. It *is* a property of the world that *the parent assigned a 50% probability* to each child getting the car, and that *is* a direction you can steer in, though the example with the kindergarten teacher shows that this is probably not quite the direction you actually wanted.

The preference relation is *only* supposed to be about *trade-offs* between probability distributions; if you're tempted to say that you want to steer the world towards one probability distribution or another, rather than one outcome or other, something has gone terribly wrong.

*
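The resolution for the car example can be sketched in code, under the assumption that the parent's preferences are represented by expected utility. All utility numbers and outcome labels below are invented for illustration. When outcomes record only who gets the car, the coin throw cannot come out strictly ahead; when outcomes also record how the decision was made, it can.

```python
def expected_utility(dist, u):
    """Probability-weighted sum of utilities; dist maps outcome -> probability."""
    return sum(prob * u[o] for o, prob in dist.items())

# Version 1: an outcome records only who gets the car.
u_plain = {"younger": 1.0, "older": 1.0}
coin_plain = {"younger": 0.5, "older": 0.5}
decide_plain = {"younger": 1.0}

# Version 2: an outcome also records how the decision was made.
# The extra 0.5 for "coin" is a hypothetical value placed on the fair procedure.
u_rich = {
    ("younger", "decided"): 1.0,
    ("older", "decided"): 1.0,
    ("younger", "coin"): 1.5,
    ("older", "coin"): 1.5,
}
coin_rich = {("younger", "coin"): 0.5, ("older", "coin"): 0.5}
decide_rich = {("younger", "decided"): 1.0}

print(expected_utility(coin_plain, u_plain), expected_utility(decide_plain, u_plain))
print(expected_utility(coin_rich, u_rich), expected_utility(decide_rich, u_rich))
```

In the second version the parent is not preferring a probability distribution as such; the preference is carried by the outcomes themselves, which is what Independence permits.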

## The Axiom of Continuity

And… that's it. These are all the axioms that I'll ask you to accept in this post.

There is, however, one more axiom in the von Neumann-Morgenstern theory, the Axiom of Continuity. I do *not* think this axiom is a necessary requirement on any coherent plan for steering the world; I think the best argument for it is that it doesn't make a practical difference whether you adopt it, so you might as well. But there is also a good argument to be made that if we're talking about anything *short* of steering the entire future of humanity, your preferences *do* in fact obey this axiom, and it makes things easier technically if we adopt it, so I'll do that at least for now.

Let's look at an example: If you prefer $50 in your pocket to $40, the axiom says that there must be *some* small \(\epsilon > 0\) such that you prefer a probability of \(1-\epsilon\) of $50 and a probability of \(\epsilon\) of dying today to a certainty of $40. Some critics seem to see this as the ultimate *reductio ad absurdum* for the VNM theory; they seem to think that no sane human would accept that deal.

Eliezer was surely not the first to observe that this preference is exhibited each time someone drives an extra mile to save $10.

Continuity says that if you strictly prefer \(x\) to \(y\), then there is *no* \(z\) so terrible that you wouldn't be willing to incur a small probability of it in order to (probably) get \(x\) rather than \(y\), and *no* \(z\) so wonderful that you'd be willing to (probably) get \(y\) instead of \(x\) if this gives you some arbitrarily small probability of getting \(z\). Formally, for all \(x\), \(y\) and \(z\),

\[ x \succ y \implies \exists\,\epsilon > 0:\ (1-\epsilon)x + \epsilon z \succ y \ \text{ and } \ x \succ (1-\epsilon)y + \epsilon z. \]

I think if we're talking about everyday life, we can pretty much rule out that there are things so terrible that for *arbitrarily* small \(\epsilon\), you'd be willing to die with probability \(1-\epsilon\) to avoid a probability of \(\epsilon\) of the terrible thing. And if you feel that it's not worth the expense to call a doctor every time you sneeze, you're willing to incur a *slightly* higher probability of death in order to save some mere money. And it seems unlikely that there is *no* \(\epsilon\) at which you'd prefer a certainty of $1 to a chance of \(\epsilon\) of $100. And if you have some preference that is so slight that you wouldn't be willing to accept *any* chance of losing $1 in order to indulge it, it can't be a very strong preference. So I think for most practical purposes, we might as well accept Continuity.

*
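For a feel of what Continuity asks in the $50 versus $40 example, here is a small sketch. It presupposes that preferences are already given by expected utility (which is what the theorem below concludes, so this is an illustration and not a justification), and the utility numbers and the names `risk_threshold` and `gamble` are made up. Any finite utility for dying, however negative, leaves some positive probability of death that is still worth accepting.

```python
def expected_utility(dist, u):
    """Probability-weighted sum of utilities; dist maps outcome -> probability."""
    return sum(prob * u[o] for o, prob in dist.items())

def risk_threshold(u, good, sure, bad):
    """Bound on eps: the gamble (1-eps)*good + eps*bad beats `sure` exactly when eps is below it."""
    return (u[good] - u[sure]) / (u[good] - u[bad])

def gamble(eps):
    """Get $50 with probability 1-eps, die with probability eps."""
    return {"$50": 1 - eps, "death": eps}

# Hypothetical utilities: death is a million times worse than the $50 is good.
u = {"$50": 1.0, "$40": 0.9, "death": -1_000_000.0}

eps_max = risk_threshold(u, "$50", "$40", "death")
print(eps_max)
print(expected_utility(gamble(eps_max / 2), u) > u["$40"])  # small enough risk: gamble wins
print(expected_utility(gamble(eps_max * 2), u) > u["$40"])  # too much risk: sure $40 wins
```

A violation of Continuity would correspond to the threshold being zero, which no finite utility for the bad outcome can produce.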

## The VNM theorem

If your preferences are described by a transitive and complete relation on the probability distributions over some set of "outcomes", and this relation satisfies Independence and Continuity, then you have a utility function, and your genie will be maximizing expected utility.

Here's what that means. A utility function is a function $u : \mathcal{O} \to \mathbb{R}$ which assigns a numerical "utility" to every outcome. Given a probability distribution $\mathbf{x}$ over $\mathcal{O}$, we can compute the expected value of $u$ under $\mathbf{x}$, $\mathbb{E}_{\mathbf{x}}[u] = \sum_{o \in \mathcal{O}} \mathbf{x}(o)\,u(o)$; this is called the *expected utility*. We can prove that there is some utility function $u$ such that for all $\mathbf{x}$ and $\mathbf{y}$, we have $\mathbf{x} \succ \mathbf{y}$ if and only if the expected utility under $\mathbf{x}$ is greater than the expected utility under $\mathbf{y}$.

In other words: $\succ$ is *completely* described by $u$; if you know $u$, you know $\succ$. Instead of programming your genie with a function that takes two outcomes and says which one is better, you might as well program it with a function that takes one outcome and returns its utility. Any coherent direction for steering the world which happens to satisfy Continuity can be reduced to a function that takes outcomes and assigns them numerical ratings.

In fact, it turns out that the $u$ for a given $\succ$ is "almost" unique: Given two utility functions $u$ and $v$ that describe the same $\succ$, there are numbers $a > 0$ and $b$ such that for all $o \in \mathcal{O}$, $v(o) = a\,u(o) + b$; this is called an "affine transformation". On the other hand, it's not hard to see that for any such $a$ and $b$,

$$\mathbb{E}_{\mathbf{x}}[u] > \mathbb{E}_{\mathbf{y}}[u] \iff \mathbb{E}_{\mathbf{x}}[a\,u + b] > \mathbb{E}_{\mathbf{y}}[a\,u + b],$$

so two utility functions represent the same preference relation if and only if they are related in this way.
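To make the affine-invariance claim concrete, here is a minimal sketch in Python. The outcome names, the utility values, and the constants `a` and `b` are all made up for illustration; the only thing taken from the text is the definition of expected utility and the transformation $v(o) = a\,u(o) + b$ with $a > 0$.

```python
from fractions import Fraction as F

def expected_utility(dist, u):
    """Expected utility of a lottery: sum over outcomes of P(o) * u(o)."""
    return sum(p * u[o] for o, p in dist.items())

# Hypothetical outcomes and utilities, chosen only for illustration.
u = {"apple": F(0), "banana": F(1), "cherry": F(4)}

x = {"apple": F(1, 2), "cherry": F(1, 2)}  # 50/50 between apple and cherry
y = {"banana": F(1)}                        # banana for certain

# Positive affine transformation v(o) = a*u(o) + b, with a > 0 (assumed values).
a, b = F(3), F(-10)
v = {o: a * val + b for o, val in u.items()}

eu_x, eu_y = expected_utility(x, u), expected_utility(y, u)
ev_x, ev_y = expected_utility(x, v), expected_utility(y, v)

# The two lotteries are ranked the same way under u and under v.
same_ranking = (eu_x > eu_y) == (ev_x > ev_y)
print(eu_x, eu_y, ev_x, ev_y, same_ranking)
```

Because expectation is linear, `ev_x` is just `a * eu_x + b`, and likewise for `y`, which is why multiplying by a positive `a` and shifting by `b` cannot flip the comparison.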

*

You shouldn't read *too* much into this conception of utility. For example, it doesn't make sense to see a fundamental distinction between outcomes with "positive" and with "negative" von Neumann-Morgenstern utility, because adding the right $b$ can make any negative utility positive and any positive utility negative, without changing the underlying preference relation. The numbers that have real meaning are ratios between differences between utilities, $\frac{u(o_1) - u(o_2)}{u(o_3) - u(o_4)}$, because these don't change under affine transformations (the $b$'s cancel when you take the difference, and the $a$'s cancel when you take the ratio). Academian's post has more about misunderstandings of VNM utility.

In my view, what VNM utilities represent is not *necessarily* how *good* each outcome is; what they represent is what trade-offs between probability distributions you are willing to accept. Now, if you strongly felt that the difference between $o_1$ and $o_2$ was about the same as the difference between $o_3$ and $o_4$, then you should have *a very good reason* before you make your $\frac{u(o_1) - u(o_2)}{u(o_3) - u(o_4)}$ a huge number. But on the other hand, I think it's ultimately your responsibility to decide what trade-offs you are willing to make; I don't think you can get away with "stating how much you value different outcomes" and outsourcing the rest of the job to decision theory, without ever *considering* what these valuations should mean in terms of probabilistic trade-offs.

*
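For readers who want to see the cancellation spelled out, here is the one-line computation, writing $v(o) = a\,u(o) + b$ with $a > 0$ and using $o_1, \dots, o_4$ for four arbitrary outcomes with $u(o_3) \neq u(o_4)$ (these labels are only for illustration):

$$\frac{v(o_1) - v(o_2)}{v(o_3) - v(o_4)} = \frac{\bigl(a\,u(o_1) + b\bigr) - \bigl(a\,u(o_2) + b\bigr)}{\bigl(a\,u(o_3) + b\bigr) - \bigl(a\,u(o_4) + b\bigr)} = \frac{a\,\bigl(u(o_1) - u(o_2)\bigr)}{a\,\bigl(u(o_3) - u(o_4)\bigr)} = \frac{u(o_1) - u(o_2)}{u(o_3) - u(o_4)}.$$

By contrast, neither the sign of $v(o)$ nor a plain ratio $v(o_1)/v(o_2)$ survives the transformation, since $b$ does not cancel there.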

## Doing without Continuity

What happens if your preferences do *not* satisfy Continuity? Say, you want to save human lives, but you're not willing to incur *any* probability, no matter how small, of *infinitely* many people getting tortured infinitely long for this?

I do not see a good argument that this couldn't add up to a coherent direction for steering the world. I do, however, see an argument that in this case you care so little about finite numbers of human lives that in practice, you can probably neglect this concern entirely. (As a result, I doubt that your reflective equilibrium would want to adopt such preferences. But I don't think they're *incoherent*.)

I'll assume that your morality can still distinguish only a finite number of outcomes, and you can choose only between a finite number of decisions. It's not obvious that these assumptions are justified if we want to take into account the *possibility* that the true laws of physics might turn out to allow for infinite computations, but even in this case *you* and any AI *you* build will probably still be finite (though *it* might build a successor that isn't), so I do in fact think there is a good chance that results derived under this assumption have relevance in the real world.

In this case, it turns out that you *still* have a utility function, in a certain sense. (Proofs for non-standard results can be found in the math appendix to this post. I did the work myself, but I don't expect these results to be new.) This utility function describes only the concern most important to you: in our example, only the probability of infinite torture makes a difference to expected utility; any change in the probability of saving a finite number of lives leaves expected utility unchanged.

Let's define a relation $\mathbf{x} \gg \mathbf{y}$, read "$\mathbf{x}$ is *much better* than $\mathbf{y}$", which says that there is nothing you wouldn't give up a little probability of in order to get $\mathbf{x}$ instead of $\mathbf{y}$. In our example: $\mathbf{x}$ doesn't merely save lives compared to $\mathbf{y}$, it makes infinite torture less likely. Formally, we define $\mathbf{x} \gg \mathbf{y}$ to mean that $\mathbf{x}' \succ \mathbf{y}'$ for all $\mathbf{x}'$ and $\mathbf{y}'$ "close enough" to $\mathbf{x}$ and $\mathbf{y}$ respectively; more precisely: $\mathbf{x} \gg \mathbf{y}$ if there is an $\epsilon > 0$ such that $\mathbf{x}' \succ \mathbf{y}'$ for all $\mathbf{x}'$ and $\mathbf{y}'$ with

$$|\mathbf{x} - \mathbf{x}'| < \epsilon \quad\text{and}\quad |\mathbf{y} - \mathbf{y}'| < \epsilon.$$

(Or equivalently: if there are open sets $U$ and $V$ around $\mathbf{x}$ and $\mathbf{y}$, respectively, such that $\mathbf{x}' \succ \mathbf{y}'$ for all $\mathbf{x}' \in U$ and $\mathbf{y}' \in V$.)

It turns out that if $\succ$ is a preference relation satisfying Independence, then $\gg$ is a preference relation satisfying Independence and Continuity, and there is a utility function $u$ such that $\mathbf{x} \gg \mathbf{y}$ iff the expected utility under $\mathbf{x}$ is larger than the expected utility under $\mathbf{y}$. Obviously, $\mathbf{x} \gg \mathbf{y}$ implies $\mathbf{x} \succ \mathbf{y}$, so whenever two options have different expected utilities, you prefer the one with the larger expected utility. Your genie is *still* an expected utility maximizer.

Furthermore, unless $\mathbf{x} \sim \mathbf{y}$ for *all* $\mathbf{x}$ and $\mathbf{y}$, $u$ isn't constant; that is, there are *some* $\mathbf{x}$ and $\mathbf{y}$ with $\mathbf{x} \gg \mathbf{y}$. (If this weren't the case, the result above obviously wouldn't tell us very much about $\succ$!) Being indifferent between all possible actions doesn't make for a particularly interesting direction for steering the world, if it can be called one at all, so from now on let's assume that you are not.

*
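As a rough illustration of how the preference relation, the "much better" relation and the utility function fit together, here is a small Python sketch of the torture-versus-lives example. Everything concrete in it is an assumption made for the sketch: the three outcomes, the choice of expected lives saved as the tie-breaking concern, and the utility of $-1$ for the torture outcome and $0$ otherwise. It is one lexicographic preference of the kind described above, not the only possible one.

```python
from fractions import Fraction as F

# Hypothetical outcomes: name -> (infinite_torture_happens, lives_saved).
OUTCOMES = {
    "hell": (True, 0),
    "nothing": (False, 0),
    "save_ten": (False, 10),
}

def p_torture(dist):
    """Probability that the lottery ends in infinite torture."""
    return sum(p for o, p in dist.items() if OUTCOMES[o][0])

def expected_lives(dist):
    """Expected number of lives saved (the less important concern)."""
    return sum(p * OUTCOMES[o][1] for o, p in dist.items())

def prefers(x, y):
    """Preference: lexicographic, lower torture probability first, then more lives."""
    return (-p_torture(x), expected_lives(x)) > (-p_torture(y), expected_lives(y))

def much_better(x, y):
    """'Much better': x stays preferred under small perturbations of x and y,
    which for this preference means strictly lower torture probability."""
    return p_torture(x) < p_torture(y)

def expected_utility(dist):
    """Assumed utility: -1 for torture, 0 otherwise, so E[u] = -P(torture)."""
    return -p_torture(dist)

x = {"save_ten": F(1)}
y = {"nothing": F(1)}
z = {"hell": F(1, 10**6), "save_ten": F(999_999, 10**6)}

# y is much better than z: the utility function sees the difference.
print(much_better(y, z), expected_utility(y) > expected_utility(z))
# x is preferred to y, but not much better: expected utilities tie.
print(prefers(x, y), much_better(x, y), expected_utility(x) == expected_utility(y))
```

In this sketch, `much_better(a, b)` holds exactly when `expected_utility(a) > expected_utility(b)`, while `prefers` can still separate two lotteries whose expected utilities are equal.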

It *can* happen that there are two distributions $\mathbf{x}$ and $\mathbf{y}$ with the same expected utility, but $\mathbf{x} \succ \mathbf{y}$. ($\mathbf{x}$ saves more lives, but the probability of eternal torture is the same.) Thus, if your genie happens to face a choice between two actions that lead to the *same* expected utility, it must do more work to figure out which of the actions it should take. But there is some reason to expect that such situations should be *rare*.

If there are $n$ possible outcomes, then the set of probability distributions over $\mathcal{O}$ is $(n-1)$-dimensional (because the probabilities must add up to 1, so if you know $n-1$ of them, you can figure out the last one). For example, if there are three outcomes, this set is a triangle, and if there are four outcomes, it's a tetrahedron. On the other hand, it turns out that for any $r$, the set of all $\mathbf{x}$ for which the expected utility equals $r$ has dimension $n-2$ or smaller: if $n = 3$, it's a line (or a point or the empty set); if $n = 4$, it's a plane (or a line or a point or the empty set).
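Here is the three-outcome case worked out, as a sketch. Write $p_1, p_2, p_3$ for the probabilities of the three outcomes and $u_1, u_2, u_3$ for their utilities (this indexing is just for the example). Since $p_3 = 1 - p_1 - p_2$, a distribution is a point $(p_1, p_2)$ in the triangle $p_1 \ge 0$, $p_2 \ge 0$, $p_1 + p_2 \le 1$, and the condition that the expected utility equals $r$ becomes

$$p_1 u_1 + p_2 u_2 + (1 - p_1 - p_2)\,u_3 = r \quad\Longleftrightarrow\quad p_1\,(u_1 - u_3) + p_2\,(u_2 - u_3) = r - u_3.$$

This is a single linear equation in two unknowns. As long as $u$ isn't constant, at least one of the coefficients $u_1 - u_3$ and $u_2 - u_3$ is nonzero, so the solutions form a line in the $(p_1, p_2)$-plane, and that line meets the triangle in a segment, a single point, or not at all.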

Thus, in order to have the same expected utility, $\mathbf{x}$ and $\mathbf{y}$ must lie on the same hyperplane: not just on a plane *very close by*, but on *exactly* the same plane. That's not just a small target to hit, that's an infinitely small target. If you use, say, a Solomonoff prior, then it seems *very* unlikely that two of your finitely many options just *happen* to lead to probability distributions which yield the same expected utility.

But we are bounded rationalists, not perfect Bayesians with uncomputable Solomonoff priors. We assign heads and tails exactly the same probability, not because there is no information that would make one or the other more likely (we could try to arrive at a best guess about which side is a little heavier than the other?), but because the problem is so complicated that we simply give up on it. What if it turns out that because of this, all the *difficult* decisions we need to make turn out to be between actions that happen to have the same expected utility?

If you do your imperfect calculation and find that two of your options seem to yield exactly the same probability of eternal hell for infinitely many people, you *could* then try to figure out which of them is more likely to save a finite number of lives. But it seems to me that this is *not* the best approximation of an ideal Bayesian with your stated preferences. Shouldn't you spend those computational resources on doing a *better* calculation of which option is more likely to lead to eternal hell?

For you *might* arrive at a new estimate under which the probabilities of hell are at least slightly different. Even if you *suspect* that the new calculation will again come out with the probabilities exactly equal, you don't *know* that. And therefore, can you truly in good conscience argue that doing the new calculation does not improve the odds of avoiding hell, *at least a teeny tiny incredibly super-small for all ordinary intents and purposes completely irrelevant bit*?

Even if it *should* be the case that to a *perfect* Bayesian, the expected utilities under a Solomonoff prior were exactly the same, *you* don't know that, so how can you possibly justify stopping the calculation and saving a mere finite number of lives?

*

So there you have it. In order to have a coherent direction in which you want to steer the world, you must have a set of outcomes and a preference relation over the probability distributions over these outcomes, and this relation must satisfy Independence — or so it seems to me, anyway. And if you do, then you have a utility function, and a perfect Bayesian maximizing your preferences will always maximize expected utility.

It *could* happen that two options have exactly the same expected utility, and in this case the utility function doesn't tell you which of these is better, under your preferences; but as a bounded rationalist, you can never *know* this, so if you have any computational resources left that you could spend on figuring out what your true preferences have to say, you should spend them on a better calculation of the expected utilities instead.

Given this, we might as well just talk about $\gg$, which satisfies Continuity as well as Independence, instead of $\succ$; and you might as well program your genie with your utility function, which only reflects $\gg$, instead of with your true preferences.

(Note: I am not literally saying that you should not try to understand the whole topic better than this if you are *actually* going to program a Friendly AI. This is still meant as a metaphor. I *am*, however, saying that expected utility theory, even with boring old real numbers as utilities, *is not to be discarded lightly*.)

*

## Next post: Dealing with time

So far, we've always pretended that you only face *one* choice, at *one* point in time. But not only is there a way to apply our theory to repeated interactions with the environment: there are two!

One way is to say that at each point in time, you should apply decision theory to the set of actions you can perform *at that point*. Now, the actual outcome depends of course not only on what you do now, but also on what you do later; but you know that you'll still use decision theory later, so you can *foresee* what you will do in any possible future situation, and take it into account when computing what action you should choose now.

The second way is to make a choice only once, not between the actions you can take at that point in time, but between complete *plans* (giant lookup tables) which specify how you *will* behave in any situation you might possibly face. Thus, you simply do your expected utility calculation *once*, and then stick with the plan you have decided on.

Meditation: Which of these is the *right* thing to do, if you have a perfect Bayesian genie and you want to steer the future in some particular direction? (Does it even make a difference which one you use?)

» To the mathematical appendix

Notes

¹ The accounts of decision theory I've read use the term "outcome", or "consequence", but leave it mostly undefined; in a lottery, it's the prize you get at the end, but clearly nobody is saying decision theory should *only* apply to lotteries. I'm not changing its role in the mathematics, and I think my explanation of it is what the term always *wanted* to mean; I expect that other people have explained it in similar ways, though I'm not sure how similar precisely.