Comparison of decision theories (with a focus on logical-counterfactual decision theories)

by riceissa · 16th Mar 2019

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Introduction

Summary

This post is a comparison of various existing decision theories, with a focus on decision theories that use logical counterfactuals (a.k.a. the kind of decision theories most discussed on LessWrong). The post compares the decision theories along three dimensions: outermost iteration (action vs policy vs algorithm), updatelessness (updateless or updateful), and the type of counterfactual used (causal, conditional, or logical). It then explains the decision theories in more detail, in particular giving an expected utility formula for each. The post then gives examples of specific existing decision problems where the decision theories give different answers.

Value-added

There are some other comparisons of decision theories (see the “Other comparisons” section), but they either (1) don’t focus on logical-counterfactual decision theories; or (2) are outdated (written before the new functional/logical decision theory terminology came about).

To give a more personal motivation, after reading through a bunch of papers and posts about these decision theories, and feeling like I understood the basic ideas, I remained highly confused about basic things like “How is UDT different from FDT?”, “Why was TDT deprecated?”, and “If TDT performs worse than FDT, then what’s one decision problem where they give different outputs?” This post hopes to clarify these and other questions.

None of the decision theory material in this post is novel. I am still learning the basics myself, and I would appreciate any corrections (even about subtle/nitpicky stuff).

Audience

This post is intended for people who are similarly confused about the differences between TDT, UDT, FDT, and LDT. In terms of reader background assumed, it would be good to know the statements of some standard decision theory problems (Newcomb’s problem, smoking lesion, Parfit’s hitchhiker, transparent box Newcomb’s problem, counterfactual mugging (a.k.a. curious benefactor; see page 56, footnote 89)) and the “correct” answers to them, and to have enough background in math to understand the expected utility formulas.

If you don’t have the background, I would recommend reading chapters 5 and 6 of Gary Drescher’s Good and Real (explains well the idea of subjunctive means–end relations), the FDT paper (explains well how FDT’s action selection variant works, and how FDT differs from CDT and EDT), “Cheating Death in Damascus”, and “Toward Idealized Decision Theory” (explains the difference between policy selection and logical counterfactuals well), and understanding what Wei Dai calls “decision theoretic thinking” (see comments: 1, 2, 3). I think a lot of (especially old) content on decision theory is confusingly written or unfriendly to beginners, and would recommend skipping around to find explanations that “click”.

Comparison dimensions

My main motivation is to try to distinguish between TDT, UDT, and FDT, so I focus on three dimensions for comparison that I think best display the differences between these decision theories.

Outermost iteration

All of the decision theories in this post iterate through some set of “options” (intentionally vague) at the outermost layer of execution to find the best “option”. However, the nature (type) of these “options” differs among the various theories. Most decision theories iterate through either actions or policies. When a decision theory iterates through actions (to find the best action), it is doing “action selection”, and the decision theory outputs a single action. When a decision theory iterates through policies (to find the best policy), it is doing “policy selection”, and outputs a single policy, which is an observation-to-action mapping. To get an action out of a decision theory that does policy selection (because what we really care about is knowing which action to take), we must call the policy on the actual observation.
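To make the distinction concrete, here is a minimal Python sketch (my own illustration, not code from any of the papers discussed here); `actions`, `observations`, and the expected-utility functions are hypothetical placeholders:

```python
from itertools import product

def action_selection(actions, expected_utility):
    """Outermost iteration is over actions: return the single best action."""
    return max(actions, key=expected_utility)

def policy_selection(actions, observations, expected_utility_of_policy):
    """Outermost iteration is over policies (observation-to-action mappings),
    represented here as dicts from observation to action."""
    all_policies = [dict(zip(observations, assignment))
                    for assignment in product(actions, repeat=len(observations))]
    return max(all_policies, key=expected_utility_of_policy)

# A policy-selection agent still has to act, so we call the chosen policy on
# the actual observation:
#   policy = policy_selection(actions, observations, eu_of_policy)
#   action = policy[actual_observation]
```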

Using the notation of the FDT paper, an action has type $\mathcal{A}$ while a policy has type $\mathcal{X} \to \mathcal{A}$, where $\mathcal{X}$ is the set of observations. So given a policy $\pi$ and observation $x$, we get the action by calling $\pi$ on $x$, i.e. $\pi(x)$.

From the expected utility formula of the decision theory, you can tell action selection from policy selection by seeing what variable comes beneath the $\operatorname{argmax}$ operator (the $\operatorname{argmax}$ operator is what does the outermost iteration); if it is $a$ (or similar) then it is iterating over actions, and if it is $\pi$ (or similar), then it is iterating over policies.
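Schematically, and with placeholder symbols rather than the exact formula of any one decision theory in this post, the two shapes look like this (here $o_j$ ranges over outcomes, $U$ is the utility function, and the conditioning is left abstract):

```latex
% Schematic only: the conditioning "..." depends on the decision theory.
\[
  \underset{a \in \mathcal{A}}{\operatorname{argmax}} \sum_j P(o_j \mid \ldots)\, U(o_j)
  \qquad \text{vs.} \qquad
  \underset{\pi \colon \mathcal{X} \to \mathcal{A}}{\operatorname{argmax}} \sum_j P(o_j \mid \ldots)\, U(o_j)
\]
```

The left form returns an action directly; the right form returns a policy $\pi$, and the action actually taken is $\pi(x)$ for the actual observation $x$.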

One exception to the above is UDT2, which seems to iterate over algorithms.

Updatelessness

In some decision problems, the agent makes an observation, and has the choice of updating on this observation before acting. Two examples: in counterfactual mugging (a.k.a. curious benefactor), the agent observes that the coin has come up tails; in the transparent box Newcomb’s problem, the agent sees whether the big box is full or empty.

If the decision algorithm updates on the observation, it is updateful (a.k.a. “not updateless”). If it doesn’t update on the observation, it is updateless.

This idea is similar to Rawls’s “veil of ignorance”: you must pick your moral principles, societal policies, etc. before you find out who you are in the world, or as if you didn’t know who you are in the world.

How can you tell if a decision theory is updateless? In its expected utility formula, if it conditions on the observation, it is updateful. In this case the probability factor looks like $P(\,\cdot \mid x, \ldots)$, where $x$ is the observation (sometimes the observation is called “sense data”). If a decision theory is updateless, the conditioning on “$x$” is absent. Updatelessness only makes a difference in decision problems that have observations.
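As a schematic contrast (again with placeholder symbols, not any specific theory’s formula), the only difference is whether the observation $x$ appears behind the conditioning bar:

```latex
% Schematic only: x is the observation, o_j ranges over outcomes, and the
% counterfactual structure of the conditioning is left abstract.
\[
  \text{updateful:} \quad
    \underset{a}{\operatorname{argmax}} \sum_j P(o_j \mid x, \ldots)\, U(o_j)
  \qquad\qquad
  \text{updateless:} \quad
    \underset{a}{\operatorname{argmax}} \sum_j P(o_j \mid \ldots)\, U(o_j)
\]
```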

There seem to be different meanings of “updateless” in use. In this post I will use the above meaning. (I will try to post a question on LessWrong soon about these different meanings.)

Type of counterfactual

In the course of reasoning about a decision problem, the agent can construct counterfactuals or hypotheticals like “if I do this, then that happens”. There are several different kinds of counterfactuals, and decision theories are divided among them.

The three types of counterfactuals that will concern us are: causal, conditional/evidential, and logical/subjunctive. The distinctions between these are explained clearly in the FDT paper so I recommend reading that (and I won’t explain them here).

In the expected utility formula, if the probability factor looks like $P(\,\cdot \mid a)$ then it is evidential, and if it looks like $P(\,\cdot \mid \operatorname{do}(a))$ then it is causal. I have seen the logical counterfactual written in many ways:

  • e.g. in the FDT paper, p. 14