An explanation of decision theories

metachirality

[Epistemic status: I may have gotten some things wrong so please point out any errors. I also papered over a lot of technical details in favor of presenting my intuitions. A lot of this is me thinking out loud. If you already know a lot about decision theory or want the gory details, I've listed a bunch of resources on decision theory at the bottom of this post.]

Rationality is sometimes considered to be "maximizing expected utility". This seems pretty unambiguous but it turns out that what this means is pretty tricky to define, and different decision theories have different definitions of it. In this post, I will summarize causal decision theory, evidential decision theory, and updateless/functional/logical decision theory.

For a while, there were two dominant decision theories, causal decision theory (CDT), and evidential decision theory (EDT):

Causal decision theory: Take actions that physically cause you to get higher expected utility.
Evidential decision theory: Take actions that would be evidence of you getting higher expected utility.

To illustrate their differences and point out their problems, I'll introduce two scenarios:

Newcomb's problem

Omega is an entity who can predict your actions with 99% certainty. Importantly, Omega simulates you to predict your actions. They always put $1,000 in a transparent box, and they put $1,000,000 in an opaque box if and only if they predict that you would open only the opaque box. Since picking only the transparent box doesn't maximize expected utility in either scenario, we are left with two options: picking only the opaque box (one-boxing) or picking both the transparent and opaque boxes (two-boxing). Do you one-box or two-box?

A CDT agent two-boxes because Omega has already either put the money in the opaque box or not, and the agent can't physically cause the past.

An EDT agent one-boxes, because that would be evidence of them having received $1,000,000 instead of $1,000.
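
To make the two lines of reasoning concrete, here is a minimal Python sketch of the arithmetic. It is an illustration rather than a formal definition of either theory: the function names are made up for this example, Omega's 99% accuracy is assumed to hold whichever action you take, and utility is assumed to be linear in dollars. The EDT-style calculation conditions Omega's prediction on the action, while the CDT-style calculation holds the probability that the opaque box is full fixed.

```python
from fractions import Fraction

# Numbers from the scenario; exact fractions avoid floating-point rounding.
ACCURACY = Fraction(99, 100)   # how often Omega predicts correctly
SMALL = 1_000                  # transparent box
BIG = 1_000_000                # opaque box, if Omega predicted one-boxing


def payoff(action, predicted_one_box):
    """Dollars received for an action, given what Omega predicted."""
    opaque = BIG if predicted_one_box else 0
    transparent = SMALL if action == "two-box" else 0
    return opaque + transparent


def edt_value(action):
    """Expected payoff when the action is treated as evidence about the prediction."""
    p_predicted_one_box = ACCURACY if action == "one-box" else 1 - ACCURACY
    return (p_predicted_one_box * payoff(action, True)
            + (1 - p_predicted_one_box) * payoff(action, False))


def cdt_value(action, p_full):
    """Expected payoff when the box contents are fixed with probability p_full."""
    return p_full * payoff(action, True) + (1 - p_full) * payoff(action, False)


if __name__ == "__main__":
    for action in ("one-box", "two-box"):
        print(action, "EDT:", edt_value(action), "CDT (p_full=1/2):",
              cdt_value(action, Fraction(1, 2)))
```

Under these assumptions, conditioning on the action gives $990,000 for one-boxing against $11,000 for two-boxing, while holding the box contents fixed makes two-boxing exactly $1,000 better no matter which probability is plugged in.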

EDT wins in Newcomb's problem, so does that make it better than CDT on all fronts? Not quite.

Smoking lesion problem

Let's suppose that instead of smoking causing lung cancer, smoking and lung cancer both stem from a common cause: a genetic lesion that makes people smoke and, 99% of the time, also gives them lung cancer. If you don't have the lesion, you have a mere 1% chance of getting lung cancer. You value smoking at $1,000 but value not having lung cancer at $1,000,000. Should you smoke?

Of course, if there really was a lesion like this, it wouldn't be causing you to smoke directly, but causing an urge to smoke which may or may not be acted upon, but in this scenario it directly causes you to smoke.

A CDT agent smokes since they already do or do not have the lesion, and because they can't cause the past, they might as well smoke.

An EDT agent doesn't smoke because that would be evidence of having the lesion, which would be evidence of them having lung cancer.

CDT agents win here instead of EDT because smoking might be evidence of them having lung cancer, but even if they didn't smoke, they would still have as high of a chance of having lung cancer anyways.
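
The same kind of arithmetic can be sketched for the smoking lesion, again as a rough illustration with made-up function names. One assumption goes beyond the text: in the EDT-style calculation, smoking is treated as perfect evidence of the lesion and not smoking as perfect evidence of its absence, which follows the simplification that the lesion directly causes smoking. The CDT-style calculation keeps the probability of having the lesion fixed regardless of the action.

```python
from fractions import Fraction

# Numbers from the scenario; exact fractions avoid floating-point rounding.
P_CANCER_GIVEN_LESION = Fraction(99, 100)
P_CANCER_NO_LESION = Fraction(1, 100)
SMOKING_VALUE = 1_000
HEALTH_VALUE = 1_000_000   # value of not having lung cancer


def p_cancer(p_lesion):
    """Probability of lung cancer given a probability of having the lesion."""
    return (p_lesion * P_CANCER_GIVEN_LESION
            + (1 - p_lesion) * P_CANCER_NO_LESION)


def utility(smokes, cancer_probability):
    """Expected utility: enjoyment of smoking plus expected value of health."""
    enjoyment = SMOKING_VALUE if smokes else 0
    return enjoyment + (1 - cancer_probability) * HEALTH_VALUE


def edt_value(smokes):
    """Assumption: the action is perfect evidence about the lesion."""
    p_lesion = Fraction(1) if smokes else Fraction(0)
    return utility(smokes, p_cancer(p_lesion))


def cdt_value(smokes, p_lesion):
    """The lesion is already fixed; the action cannot change p_lesion."""
    return utility(smokes, p_cancer(p_lesion))


if __name__ == "__main__":
    for smokes in (True, False):
        print("smokes" if smokes else "abstains",
              "EDT:", edt_value(smokes),
              "CDT (p_lesion=1/2):", cdt_value(smokes, Fraction(1, 2)))
```

The numbers mirror Newcomb's problem: the evidential calculation rates smoking at $11,000 and abstaining at $990,000, while the causal calculation rates smoking exactly $1,000 higher for any fixed probability of having the lesion.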

There seems to be a sense in which you can "cause" Omega to put $1,000,000 in the box but where you can't "cause" yourself to have a smoking lesion or not. It's like you have control over Omega's simulation but not over whether you have the lesion or not.

Furthermore, both a CDT agent and an EDT agent would want to modify their own brain/code to one-box but also to smoke, provided that Omega hasn't started simulating their brain yet and they haven't learned whether they have the lesion. This suggests there is some important difference between Newcomb's problem and the smoking lesion problem. The CDT agent self-modifies because modifying its brain would cause Omega to make a different prediction, where not modifying it wouldn't. The EDT agent self-modifies because wanting to not smoke if you get the lesion would not be evidence that you won't get the lesion, at least before it has been decided for sure whether you have it. (This is another problem with CDT and EDT: if one of them really is the One True decision theory, there shouldn't be any superior decision theory it would want to self-modify into!)

This brings us to updateless/functional/logical decision theory, actually a class of similar decision theories.

Updateless decision theory

Updateless decision theory is a decision theory that acts like it can determine the output of whatever process/algorithm it uses to make decisions.

In Newcomb's problem, a UDT agent one-boxes because Omega simulates its brain, including the process it is using to decide whether to one-box or two-box. Specifically, it goes "Well whatever cognitive process I'm using to determine whether to one-box or two-box is being replicated by Omega, so whatever decision I make will be reflected in Omega's simulation, so I ought to one-box to make the simulation one-box, thereby winning me $1,000,000 while losing out on only $1,000." This seems weird and circular but it works because of something called Löb's theorem. It also implies that, in a sense, you can control the past. These weird correlations that don't involve any physical interaction, merely just instantiating the same algorithm or process at different times or places, are called logical correlations.

In the smoking lesion problem, a UDT agent smokes because there's no Omega predicting its actions and giving it a lesion based on that; it's purely a matter of chance with no logical correlation.

However, the weirdness of UDT only really shines when we look at scenarios where UDT gives a different result than both CDT *and* EDT.

Parfit's hitchhiker

You are trapped in the desert next to a long road that leads into the city. Omega drives by and says they'll drive you to the city, saving your life (which you value at $1,000,000), but only if they predict you'll give them $1,000 from an ATM once you're there, to compensate for the effort spent driving you to the city. Do you give Omega $1,000 once you're in the city?

If a CDT agent was in the city, they would not give Omega $1,000 because they're already in the city; their action won't affect the past and make them *more* in the city. Omega predicts this would happen and leaves the CDT agent in the desert to die.

If an EDT agent was in the city, they would not give Omega $1,000 because they already know they're in the city; paying the $1,000 wouldn't be any further evidence that they're in the city. Similarly, Omega leaves the EDT agent to die.

If a UDT agent was in the city, they would give Omega $1,000 because they would reason that whatever decision they make would be reflected in Omega's simulation, which would cause Omega to drive them into the city, saving them. Omega does this and the UDT agent is saved.
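
The gap between the two ways of evaluating the payment can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a formalization of any of the three theories: Omega's prediction is assumed to be perfect, dying in the desert is assigned a utility of 0, and the function names are hypothetical.

```python
LIFE = 1_000_000   # value placed on being rescued
FARE = 1_000       # payment Omega asks for in the city


def outcome(pays_in_city: bool) -> int:
    """Utility of being the kind of agent who pays (or refuses) in the city.

    Assumption: Omega predicts perfectly, so the agent is rescued
    if and only if it would actually pay.
    """
    rescued = pays_in_city
    if not rescued:
        return 0            # left in the desert
    return LIFE - FARE      # rescued, then pays the fare


def in_city_value(pays: bool) -> int:
    """Utility as seen from inside the city, treating the rescue as settled."""
    return LIFE - (FARE if pays else 0)


if __name__ == "__main__":
    for pays in (True, False):
        print("pays" if pays else "refuses",
              "| whole-scenario outcome:", outcome(pays),
              "| value seen from the city:", in_city_value(pays))
```

Seen from inside the city, refusing looks $1,000 better, yet the agent whose decision procedure refuses never reaches the city at all in this model.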

This scenario, equivalent to Newcomb's problem with both boxes transparent, is even weirder than the previous two scenarios. Even though the UDT agent already knows what happens in the past for sure, they give the $1,000 anyways because there are pretty much no other logically consistent alternatives. Furthermore, Omega is possibly making the decision based on what the agent does in an almost *logically impossible* scenario. Similarly to Newcomb's problem, both CDT and EDT agents also self-modify to give the $1,000 as long as they aren't in the city yet.

It gets weirder.

Counterfactual mugging

Omega comes up to you and flips a coin. If it lands on heads, Omega will give you $1,000,000 if and only if they predict you'll give them $1,000 when it lands on tails. It lands on tails. Do you give Omega $1,000?

CDT and EDT agents don't give $1,000. I'll leave the reason why as an exercise to the reader.

UDT agents do give the $1,000 because, before the coin is flipped, not giving $1,000 when the coin is tails eliminates the possibility of getting $1,000,000 if it lands on heads.
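
As a rough check on this reasoning, the comparison can be written out in Python. The sketch assumes a fair coin and a perfect prediction by Omega, neither of which is stated explicitly in the scenario, and the function names are made up for illustration.

```python
from fractions import Fraction

P_HEADS = Fraction(1, 2)   # assumption: the coin is fair
PRIZE = 1_000_000
COST = 1_000


def value_before_flip(pays_on_tails: bool):
    """Expected value of a disposition, judged before the coin is flipped.

    Assumption: Omega's prediction of the disposition is perfect.
    """
    heads_payoff = PRIZE if pays_on_tails else 0
    tails_payoff = -COST if pays_on_tails else 0
    return P_HEADS * heads_payoff + (1 - P_HEADS) * tails_payoff


def value_after_tails(pays: bool) -> int:
    """Value of the action once the agent already knows the coin landed tails."""
    return -COST if pays else 0


if __name__ == "__main__":
    for pays in (True, False):
        print("pays" if pays else "refuses",
              "| before the flip:", value_before_flip(pays),
              "| after seeing tails:", value_after_tails(pays))
```

Judged before the flip, the paying disposition is worth $499,500 against $0 for refusing; judged after seeing tails, paying is simply a $1,000 loss.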

Even though the UDT agent never sees or will see the benefit of giving $1,000, they do so anyways. Yet again, CDT and EDT agents self-modify to give the $1,000 as long as the coin hasn't been flipped yet.

You might have noticed that CDT and EDT agents do self-modify to use something like UDT, even though they aren't UDT agents. This suggests another characterization of UDT: The decision theory you would self-modify to use before the beginning of time, not knowing anything about the world and before you actually get spawned into the world. This is also, I think, the origin of the name "updateless decision theory".

The kinds of updateless decision theory

Before, I mentioned that updateless decision theory was actually a class of decision theories, and not just one decision theory. Here I list the different kinds of updateless decision theory:

Timeless decision theory: Not actually an updateless decision theory but still a predecessor to updateless decision theory. One-boxes, smokes, and gives the $1,000 in Parfit's hitchhiker but not in the counterfactual mugging.
UDT1: Looks over and selects individual actions and their results rather than whole policies of actions for different scenarios and their results.
UDT1.1: Looks over and selects whole policies of actions for different scenarios and their results.
UDT2: Looks over and selects policies along with the algorithms used to calculate which decisions to make, accounting for implementation details such as time and memory used.
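
To give a feel for what "selecting whole policies" might look like, here is a toy Python sketch that enumerates every mapping from observations to actions in the counterfactual mugging and picks the one with the highest expected utility from the prior point of view. It is only loosely inspired by the UDT1.1 description above and is not an implementation of it: the fair coin, the perfect predictor, and all names in the code are assumptions made for the example.

```python
from fractions import Fraction
from itertools import product

OBSERVATIONS = ("heads", "tails")   # what the agent might see
ACTIONS = ("pay", "refuse")         # what it might do in response
P_HEADS = Fraction(1, 2)            # assumption: fair coin
PRIZE = 1_000_000
COST = 1_000


def all_policies():
    """Yield every mapping from observation to action."""
    for choice in product(ACTIONS, repeat=len(OBSERVATIONS)):
        yield dict(zip(OBSERVATIONS, choice))


def prior_utility(policy):
    """Expected utility of a whole policy, evaluated before the coin flip.

    Assumption: Omega predicts perfectly what the policy does on tails.
    The action chosen on heads does not affect the payoff in this toy world.
    """
    pays_on_tails = policy["tails"] == "pay"
    heads_payoff = PRIZE if pays_on_tails else 0
    tails_payoff = -COST if pays_on_tails else 0
    return P_HEADS * heads_payoff + (1 - P_HEADS) * tails_payoff


# Ties are broken by enumeration order, which is arbitrary here.
best_policy = max(all_policies(), key=prior_utility)

if __name__ == "__main__":
    for policy in all_policies():
        print(policy, prior_utility(policy))
    print("selected:", best_policy)
```

The point of the sketch is only that the search ranges over complete observation-to-action mappings, so the choice on tails is scored by its effect on the heads branch as well.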
