Thoughts on the process of writing this post:

- It took a lot of effort to write, something like 3 days of my time. Distillation is hard.
- Most of this effort was not in understanding the original post (it took me 2-3 hours to understand the math).
- I sent drafts to johnwentworth several times and had several conversations with him to refine this piece. This probably took ~2 hours of his time.
- I'm not satisfied with the final result. It seems like the point the original post made was fairly obvious and I used way too many words to explain it properly. Maybe John thought the interpretation of the math was fairly deep and I thought it wasn't very deep?
- I think that since John is a good and prolific writer already compared to most alignment researchers, there is higher value in distilling ideas of other researchers. It's hard to produce a lot of value from content already on LW.
- Paul Christiano's blog posts are somewhat famously opaque; distillations of these have worked in the past and still seem pretty valuable. The highest-relevance academic papers might be better. But many of the highest-value distillations probably involve talking to researchers to get things they're too busy to write down at all.

This is a distillation of this post by John Wentworth.

## Introduction

Suppose you're playing a poker game. You're an excellent poker player (though you've never studied probability), and your goal is to maximize your winnings.

Your opponent is about to raise, call, or fold, and you start thinking ahead.

Let's break down your thinking in the case where your opponent raises. Your thought process is something like this:

Step 2 is the important one here. Let's unpack it further.

This sounds suspiciously like you're maximizing the Bayesian conditional expectation of your winnings: the expected value given some partial information about the world. This can be precisely defined as $E[u(A,X) \mid \text{opponent raises}] = \sum_{X \text{ s.t. opponent raises}} P[X \mid \text{opponent raises}]\,u(A,X)$, where $u$ is your winnings, $A$ is your action, and $P[X \mid \text{opponent raises}]$ is the probability of world $X$ given that your opponent raises. But you don't know any probability, so you don't know how to assign probability to worlds, much less what conditioning and expectation are! How could you possibly be maximizing a "conditional expectation"?

Luckily, your opponent folds and you win the hand. You resolve to (a) study coherence theorems and probability so you know the Law behind optimal poker strategy, and (b) figure out why you have a voice in your head telling you about "conditional expectations" and reading equations at you.

It turns out your behavior at the poker table can be derived from one particular property of your poker strategy: you *never* make a decision that is worse than another possible decision in all possible worlds. (An economist would say you're being Pareto-efficient about maximizing your winnings in different possible worlds.)

## Summary

An agent which has some goal, has uncertainty over which world it's in, and is Pareto-efficient in the amount of goal achieved in different possible worlds, can be modeled as using conditional probability. We show this result in two steps:

There's also a third, more speculative step:

- When the agent makes *many* distributed decisions based on different pieces of limited information, it's more efficient / simpler for the agent to "think about" different underlying worlds rather than just the received information, so it is behaving as if it applies conditional expected value within a world-model.

This result is essentially a very weak selection theorem.

## Pareto efficiency over possible worlds implies EUM

Suppose that an agent is in some world $X \in \mathcal{X}$ and has uncertainty over which world it's in. The agent has a goal $u$ and is Pareto-efficient with respect to maximizing the amount of goal achieved in each world. A well-known result in economics says that Pareto efficiency implies the existence of some function $P[X]$ such that the agent chooses its actions $A$ to maximize the weighted sum $\sum_X P[X]\,u(A,X)$. (Without loss of generality, we can let $P$ sum to 1.) If we interpret $P[X]$ as the probability of world $X$, the agent maximizes $E_X[u(A,X)]$, i.e. expected utility.

Note that we have not determined anything about $P$ other than that it sums to 1. Some properties we *don't* know or derive in this setup:^{[1]}

VNM

The following example assumes that we have an expected utility maximizer in the sense of being Pareto efficient over multiple worlds, and shows that it behaves as if it uses conditional probabilities.
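As an aside on the weighted-sum claim in this section, here is a minimal Python sketch of the easy direction only: any action that maximizes a strictly positive weighted sum of per-world utilities is Pareto-efficient. The payoff table, action names, and weight grid are made up for illustration. The converse direction (every Pareto-efficient choice maximizes some weighted sum) typically needs a convexity assumption, such as allowing randomized actions, and is not demonstrated here.

```python
# Hypothetical payoff table: utilities[action] = (u in world 0, u in world 1).
utilities = {
    "raise": (5.0, -3.0),
    "call": (2.0, 1.0),
    "fold": (0.0, 0.0),
    "bluff_badly": (1.0, -4.0),
}


def dominates(a, b):
    """True if a is at least as good as b in every world and strictly better in one."""
    ua, ub = utilities[a], utilities[b]
    return all(x >= y for x, y in zip(ua, ub)) and any(x > y for x, y in zip(ua, ub))


def pareto_efficient(a):
    """True if no other action in the table dominates a."""
    return not any(dominates(b, a) for b in utilities if b != a)


def best_under_weights(P):
    """Action maximizing sum_X P[X] * u(A, X) for the weights P."""
    return max(utilities, key=lambda a: sum(p * x for p, x in zip(P, utilities[a])))


# Strictly positive weights over the two worlds, each pair summing to 1.
weight_grid = [(k / 10, 1 - k / 10) for k in range(1, 10)]
chosen = {best_under_weights(P) for P in weight_grid}

for action in sorted(chosen):
    print(action, "is Pareto-efficient:", pareto_efficient(action))
```

Different weights select different actions, which is the sense in which $P$ is left undetermined: Pareto efficiency alone says the agent acts as if it had *some* weights, not which ones.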

## EUM implies conditional expected value

Another example, but we actually walk through the math this time.

You live in Berkeley, CA, like Korean food, and have utility function u = "subjective quality of food you eat". Suppose you are deciding where to eat based only on names and Yelp reviews of restaurants. You are uncertain about X, a random variable representing the quality of all restaurants under your preferences, and Yelp reviews give you partial information about this. Your decision-making is some function A(f(X)) of the information f(X) in the Yelp reviews, and you choose A to maximize your expected utility between worlds: maybe the optimal A is to compare the average star ratings, give Korean restaurants a 0.2 star bonus, and pick the restaurant with the best adjusted average rating.

Here, we assume you behave like an "expected utility maximizer" in the weak sense above. I claim we can model you as maximizing conditional expected value.

Suppose you're constructing a lookup table for the best action A given each possible observation of reviews. Your lookup table looks something like

You always calculate the action $A$ that maximizes $E_X[u(A(f(X)),X)] = \sum_X P[X]\,u(A(f(X)),X)$.

Suppose that in a given row we have $f(X)=o$, where $o$ is some observation. Then we are finding $\operatorname{argmax}_{A(o)} E_X[u(A(f(X)),X)] = \operatorname{argmax}_{A(o)} \sum_X P[X]\,u(A(f(X)),X)$. We can make a series of simplifications:
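One way these simplifications can go is sketched below. This is a reconstruction that assumes $P[f(X)=o] > 0$, not a quotation of the original steps.

$$
\begin{aligned}
&\operatorname*{argmax}_{A(o)} \sum_X P[X]\,u(A(f(X)),X) \\
&\quad= \operatorname*{argmax}_{A(o)} \Big[ \sum_{X:\,f(X)=o} P[X]\,u(A(o),X) + \sum_{X:\,f(X)\neq o} P[X]\,u(A(f(X)),X) \Big] \\
&\quad= \operatorname*{argmax}_{A(o)} \sum_{X:\,f(X)=o} P[X]\,u(A(o),X) \\
&\quad= \operatorname*{argmax}_{A(o)} \sum_{X:\,f(X)=o} \frac{P[X]}{P[f(X)=o]}\,u(A(o),X) \\
&\quad= \operatorname*{argmax}_{A(o)} \sum_{X} P[X \mid f(X)=o]\,u(A(o),X) \\
&\quad= \operatorname*{argmax}_{A(o)} E[u(A(o),X) \mid f(X)=o].
\end{aligned}
$$

The second sum is dropped because it does not involve the row $A(o)$, and dividing by the positive constant $P[f(X)=o]$ leaves the maximizer unchanged.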

Thus, we can model you as using conditional expected value.
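As a sanity check on this argument, here is a minimal Python sketch with a made-up two-restaurant world. The worlds, the uniform weights `P`, the coarse review function `f`, and the utility `u` are all illustrative assumptions rather than anything from the original post. It builds the lookup table two ways: by brute-force search over whole tables to maximize the unconditional sum, and by filling each row with the action that maximizes conditional expected utility.

```python
from itertools import product

# Hypothetical toy setup (all numbers are made up for illustration).
# A world X is a pair (quality of restaurant 0, quality of restaurant 1).
worlds = [(q0, q1) for q0 in (1, 2, 3) for q1 in (1, 2, 3)]
P = {x: 1 / len(worlds) for x in worlds}  # weights P[X]; uniform by assumption

actions = (0, 1)  # which restaurant to eat at


def u(a, x):
    """Utility: the true quality of the restaurant you picked."""
    return x[a]


def f(x):
    """Observation: a coarse 'review' saying whether each restaurant is decent (>= 2)."""
    return (x[0] >= 2, x[1] >= 2)


observations = sorted({f(x) for x in worlds})


def expected_utility(policy):
    """Unconditional objective: sum_X P[X] * u(A(f(X)), X)."""
    return sum(P[x] * u(policy[f(x)], x) for x in worlds)


def conditional_eu(a, o):
    """E[u(a, X) | f(X) = o], renormalizing P over the worlds consistent with o."""
    matching = [x for x in worlds if f(x) == o]
    p_o = sum(P[x] for x in matching)
    return sum(P[x] * u(a, x) for x in matching) / p_o


# Route 1: brute-force search over every possible lookup table.
all_policies = [dict(zip(observations, choice))
                for choice in product(actions, repeat=len(observations))]
best_global = max(all_policies, key=expected_utility)

# Route 2: fill in each row independently with the conditional-EU maximizer.
row_by_row = {o: max(actions, key=lambda a: conditional_eu(a, o)) for o in observations}

print(expected_utility(best_global), expected_utility(row_by_row))
```

The two tables can differ on rows where actions tie, so the comparison that matters is that they achieve the same expected utility.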

## Multiple decisions might imply conditional EV is meaningful

This section is a distillation of, and expansion upon, this comment thread.

Suppose now that you're making multiple decisions $A=(A_i)_{1\le i\le n}$ in a distributed fashion to maximize the same utility function, where there is no information flow between the decisions. For example, 10 copies of you (with the same preferences and same choice of restaurants) are dropped into Berkeley, but they all have slightly different observation processes $f_i$: Google Maps reviews, Grubhub reviews, personal anecdotes, etc.
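To make this setup concrete, here is a toy Python sketch with two copies instead of ten. The worlds, the two observation functions in `observe`, and the shared utility `u` (including its collision penalty) are hypothetical choices made for illustration. It brute-forces the jointly optimal pair of lookup tables, then checks that every row of each copy's table maximizes expected utility conditional on that copy's own observation, with the other copy's table held fixed.

```python
from itertools import product

# Hypothetical toy model: X = (quality of restaurant 0, quality of restaurant 1).
worlds = [(q0, q1) for q0 in (1, 2, 3) for q1 in (1, 2, 3)]
P = {x: 1 / len(worlds) for x in worlds}  # uniform weights, by assumption
actions = (0, 1)

# Two copies with different observation processes f_i (both made up).
observe = [
    lambda x: x[0],       # copy 0 sees only restaurant 0's quality
    lambda x: x[1] >= 2,  # copy 1 sees only whether restaurant 1 is decent
]


def u(joint, x):
    """Shared utility: total quality eaten, minus a penalty if both copies collide."""
    a0, a1 = joint
    return x[a0] + x[a1] - (1.5 if a0 == a1 else 0.0)


def obs_values(i):
    return sorted({observe[i](x) for x in worlds})


def all_policies(i):
    vals = obs_values(i)
    return [dict(zip(vals, c)) for c in product(actions, repeat=len(vals))]


def joint_action(policies, x):
    return tuple(policies[i][observe[i](x)] for i in range(2))


def total_eu(policies):
    """sum_X P[X] * u(A, X), where A is the tuple of both copies' actions."""
    return sum(P[x] * u(joint_action(policies, x), x) for x in worlds)


def conditional_eu(i, a, o, policies):
    """E[u(A, X) | f_i(X) = o] if copy i plays a and the other copy follows its table."""
    matching = [x for x in worlds if observe[i](x) == o]
    p_o = sum(P[x] for x in matching)
    total = 0.0
    for x in matching:
        acts = list(joint_action(policies, x))
        acts[i] = a
        total += P[x] * u(tuple(acts), x)
    return total / p_o


def rows_are_conditionally_optimal(policies):
    """Check every row of every copy's table against its conditional-EU maximum."""
    return all(
        conditional_eu(i, policies[i][o], o, policies)
        >= max(conditional_eu(i, a, o, policies) for a in actions) - 1e-9
        for i in range(2) for o in obs_values(i)
    )


best = max(product(all_policies(0), all_policies(1)), key=total_eu)
print(best, rows_are_conditionally_optimal(best))
```

Note that in this sketch the conditional expectation for copy $i$ averages over what the other copy does in each world, since the utility depends on the whole tuple $A$.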

Now, when constructing a lookup table for $A_i$, each copy of you will still condition each row's output on its input. When making decision $A_i$ from input $f_i(X)$, you don't have the other information $f_j(X)$ for $j \neq i$, so you consider each decision separately, still maximizing $E[u(A,X) \mid f_i(X)=o_i]$. Here, the information $f_i$ does not depend on other decisions, but this is not necessary for the core point.^{[2]}

In the setup with one decision, we showed that a Pareto-efficient agent can be modeled as maximizing conditional EU over possible worlds $X$, i.e. maximizing $E[u(A,X) \mid f(X)=o]$. But because one can construct a utility function of type observation→action consistent with *any* agent's behavior, the agent can *also* be modeled as maximizing a utility function directly over possible observations $o$: $u'(A,o) = E[u(A,X) \mid f(X)=o]$. In the single-decision case, there is no compelling reason to model the agent as caring about worlds rather than observations, especially because storing and processing observations should be simpler than storing and processing distributions of worlds.

When the agent makes multiple decisions based on different observations $o_1,\dots,o_n$, there are two possible "trivial" ways to model it: either as maximizing a utility function $u'(A,o_1,o_2,\dots,o_n)$, or as maximizing separate utility functions $u'_1(A_1,o_1),\dots,u'_n(A_n,o_n)$. However, with sufficiently many decisions, neither of these trivial representations is as "nice" as conditional EU over possible worlds:

- In the single-decision case, *any* utility function $u'(A,o)$ can be explained as maximizing $E[u(A,X) \mid f(X)=o]$ for some $u$: perhaps if you always pick restaurants with the lowest star rating, you just like low-quality food. But this is not true in the multi-decision case: with enough decisions, not every tuple of utility functions $u'_1(A_1,o_1),\dots,u'_n(A_n,o_n)$ corresponds to a utility function over worlds $X$.

Suppose when given Grubhub ratings, an agent picks the *highest*-rated restaurants, but when given Yelp ratings, it picks the *lowest*-rated restaurants. The agent is now being suspiciously inconsistent, though maybe it values eating at restaurants that have good delivery food but terrible service, or something. With enough inconsistent-looking decisions, there could actually be *no* property of the restaurants that it is maximizing, and so *no* utility function $u(A,X)$ that explains its behavior.^{[3]}

So in the multi-decision case, saying the agent is maximizing $E[u(A,X) \mid f_i(X)=o_i]$ actually narrows down its behavior.

^{[1]} John made the following comment:

^{[2]} When $f$ depends on past decisions, the agent just maximizes $E[u(A,X) \mid f_i(A_{<i},X)=o_i]$. To see the math for the multi-decision case, read the original post by John Wentworth.

^{[3]} If the world has $b_X$ bits of state, and the observations reveal $b_o$ bits of information each, the pigeonhole principle says this surely happens when there are $b_X/b_o$ observations. Our universe has about $10^{125}$ bits of state, so this won't happen unless our agent can operate coherently in ~$10^{125}$ different decisions; this number can maybe be reduced if we suppose that our agent can only actually observe, say, $10^{10}$ bits of state.