**Summary:** *If you've been wondering why people keep going on about decision theory on Less Wrong, I wrote you this post as an answer. I explain what decision theories are, show how Causal Decision Theory works and where it seems to give the wrong answers, introduce (very briefly) some candidates for a more advanced decision theory, and touch on the (possible) connection between decision theory and ethics.*

## What is a decision theory?

This is going to sound silly, but a decision theory is an algorithm for making decisions.^{0} The inputs are an agent's knowledge of the world, and the agent's goals and values; the output is a particular action (or plan of actions). Actually, in many cases the goals and values are implicit in the algorithm rather than given as input, but it's worth keeping them distinct in theory.

For example, we can think of a chess program as a simple decision theory. If you feed it the current state of the board, it returns a move, which advances the implicit goal of winning. The actual details of the decision theory include things like writing out the tree of possible moves and countermoves, and evaluating which possibilities bring it closer to winning.
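That tree-of-moves idea can be sketched in a few lines of Python. This is a toy two-move game, not real chess, and the tree, moves, and payoffs are invented purely for illustration:

```python
def minimax(node, maximizing=True):
    """Value of a position under best play by both sides.

    `node` is either a terminal payoff (a number) or a dict mapping
    moves to successor positions -- a toy stand-in for a chess
    engine's tree of moves and countermoves.
    """
    if not isinstance(node, dict):       # terminal position: its value
        return node
    values = [minimax(child, not maximizing) for child in node.values()]
    return max(values) if maximizing else min(values)

# A tiny hypothetical game: we pick a move, then the opponent replies.
tree = {"a": {"x": 3, "y": 5},   # opponent will steer this branch to 3
        "b": {"x": 4, "y": 6}}   # opponent will steer this branch to 4
best_move = max(tree, key=lambda m: minimax(tree[m], maximizing=False))
```

The program "evaluates which possibilities bring it closer to winning" by assuming the opponent plays to minimize its payoff, so it picks move "b" here: the guaranteed 4 beats the guaranteed 3.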

Another example is an *E. coli* bacterium. It has two basic options at every moment: it can use its flagella to swim forward in a straight line, or to change direction by randomly tumbling. It can sense whether the concentration of food or toxin is increasing or decreasing over time, and so it executes a simple algorithm that randomly changes direction more often when things are "getting worse". This is enough control for bacteria to rapidly seek out food and flee from toxins, without needing any sort of advanced information processing.
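The whole "decision theory" of the bacterium fits in a couple of lines; a sketch, where the 0.5 and 0.05 tumble rates are made-up numbers for illustration, not measured values:

```python
import random

def tumble_probability(conc_now, conc_before):
    """The E. coli heuristic: tumble (pick a random new direction)
    often when the sensed concentration is getting worse, rarely
    when it's improving."""
    return 0.5 if conc_now < conc_before else 0.05

def step(conc_now, conc_before, rng=random.random):
    """One decision: keep running straight, or tumble."""
    worse = tumble_probability(conc_now, conc_before)
    return "tumble" if rng() < worse else "run"
```

Runs up a food gradient get extended and runs down it get cut short, which is enough to bias the random walk toward food with no model of the world at all.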

A human being is a much more complicated example which combines some aspects of the two simpler examples; we mentally model consequences in order to make many decisions, and we also follow heuristics that have evolved to work well without explicitly modeling the world.^{1} We can't model anything quite like the complicated way that human beings make decisions, but we can study simple decision theories on simple problems; and the results of this analysis have often been more effective than the raw intuitions of human beings (who evolved to succeed in small savannah tribes, not negotiate a nuclear arms race). But the standard model used for this analysis, Causal Decision Theory, has a serious drawback of its own, and the suggested replacements are important for a number of things that Less Wrong readers might care about.

## What is Causal Decision Theory?

Causal decision theory (CDT to all the cool kids) is a particular class of decision theories with some convenient properties: it's straightforward to state, has some nice mathematical features, can be adapted to any utility function, and gives good answers on many problems. We'll describe how it works in a fairly simple but general setup.

Let **X** be an agent who shares a world with some other agents (**Y**_{1} through **Y**_{n}). All these agents are going to privately choose actions and then perform them simultaneously, and the actions will have consequences. (For instance, they could be playing a round of Diplomacy.)

We'll assume that **X** has goals and values represented by a utility function: for every consequence **C**, there's a number **U(C)** representing just how much **X** prefers that outcome, and **X** views equal *expected* utilities with indifference: a 50% chance of utility 0 and 50% chance of utility 10 is no better or worse than 100% chance of utility 5, for instance. (If these assumptions sound artificial, remember that we're trying to make this as mathematically simple as we can in order to analyze it. I don't think it's as artificial as it seems, but that's a different topic.)

**X** wants to maximize its expected utility. If there were no other agents, this would be simple: model the world, estimate how likely each consequence is to happen if it does this action or that, calculate the expected utility of each action, then perform the action that results in the highest expected utility. But if there are other agents around, the outcomes depend on their actions as well as on **X**'s action, and if **X** treats *that* uncertainty like normal uncertainty, then there might be an opportunity for the **Y**s to exploit **X**.
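In the one-player case, that recipe is short enough to write down directly. Here's a minimal sketch; the actions, consequences, and numbers are invented for illustration:

```python
def expected_utility(lottery, utility):
    """Expected utility of a lottery given as (probability, consequence) pairs."""
    return sum(p * utility(c) for p, c in lottery)

def best_action(outcomes, utility):
    """One-player decision-making as described above: model each action's
    possible consequences, then perform the action with the highest
    expected utility."""
    return max(outcomes, key=lambda a: expected_utility(outcomes[a], utility))

# Hypothetical model of the world: each action maps to X's estimate of
# what might happen. Note the text's 50/50 gamble between utilities
# 0 and 10 would exactly tie a certain 5.
utility = {"nothing": 0, "ok": 5, "great": 10}.get
outcomes = {"gamble": [(0.6, "great"), (0.4, "nothing")],  # EU = 6
            "safe":   [(1.0, "ok")]}                       # EU = 5
```

With these numbers `best_action` picks the gamble; shift the 0.6 down to 0.5 and the agent is exactly indifferent, as the utility-function assumption requires.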

This is a Difficult Problem in general; a full discussion would involve Nash equilibria, but even that doesn't fully settle the matter: there can be more than one equilibrium! Also, **X** *can* sometimes treat another agent as predictable (like a fixed outcome or an ordinary random variable) and get away with it.

CDT is a *class* of decision theories, not a specific decision theory, so it's impossible to specify with full generality how **X** will decide if **X** is a causal decision theorist. But there is one key property that distinguishes CDT from the decision theories we'll talk about later: a CDT agent assumes that **X**'s decision is *independent* of the simultaneous decisions of the **Y**s; that is, **X** could decide one way or another and everyone else's decisions would stay the same.

Therefore, there is at least one case where we can say what a CDT agent will do in a multi-player game: some strategies are dominated by others. For example, if **X** and **Y** are both deciding whether to walk to the zoo, and **X** will be happiest if **X** and **Y** both go, but **X** would still be happier at the zoo than at home even if **Y** doesn't come along, then **X** should go to the zoo regardless of what **Y** does. (Presuming that **X**'s utility function is focused on being happy that afternoon.) This criterion is enough to "solve" many problems for a CDT agent, and in zero-sum two-player games the solution can be shown to be optimal for **X**.
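The dominance criterion is easy to state as code. A sketch with invented utilities for the zoo trip (only the ordering of the numbers matters, not their values):

```python
# Hypothetical utilities for X: keys are (X's action, Y's action).
# X likes the zoo best with company, but prefers it to home either way.
payoff = {("zoo",  "zoo"): 10, ("zoo",  "home"): 6,
          ("home", "zoo"): 2,  ("home", "home"): 3}

def dominates(a, b, payoff, their_actions):
    """True if action `a` serves X at least as well as `b` no matter
    what the other agent does, and strictly better in at least one case."""
    diffs = [payoff[(a, y)] - payoff[(b, y)] for y in their_actions]
    return all(d >= 0 for d in diffs) and any(d > 0 for d in diffs)
```

Here `dominates("zoo", "home", ...)` holds, so a CDT agent goes to the zoo without needing any prediction of **Y** at all.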

## What's the problem with Causal Decision Theory?

There are many simplifications and abstractions involved in CDT, but that assumption of independence turns out to be key. In practice, people put a lot of effort into predicting what other people might decide, sometimes with impressive accuracy, and then base their own decisions on that prediction. This wrecks the independence of decisions, and so it turns out that in a non-zero-sum game, it's possible to "beat" the outcome that CDT gets.

The classical thought experiment in this context is called Newcomb's Problem. **X** meets with a very smart and honest alien, Omega, that has the power to accurately predict what **X** would do in various hypothetical situations. Omega presents **X** with two boxes, a clear one containing $1,000 and an opaque one containing either $1,000,000 or nothing. Omega explains that **X** can either take the opaque box (this is called *one-boxing*) or both boxes (*two-boxing*), but there's a trick: Omega predicted in advance what **X** would do, and put $1,000,000 into the opaque box only if **X** was predicted to one-box. (This is a little devious, so take some time to ponder it if you haven't seen Newcomb's Problem before, or read here for a fuller explanation.)

If **X** is a causal decision theorist, the choice is clear: whatever Omega decided, it decided already, and whether the opaque box is full or empty, **X** is better off taking both. (That is, two-boxing is a dominant strategy over one-boxing.) So **X** two-boxes, and walks away with $1,000 (since Omega easily predicted that this would happen). Meanwhile, **X**'s cousin **Z** (not a CDT agent) decides to one-box, finds the box full, and walks away with $1,000,000. So it certainly seems that one could do better than CDT in this case.
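Both halves of the puzzle show up if you just tabulate the payouts from the text; a sketch, with the function name my own:

```python
def newcomb_payoff(action, prediction):
    """X's payout given X's action and Omega's earlier prediction."""
    opaque = 1_000_000 if prediction == "one-box" else 0
    return (opaque + 1_000) if action == "two-box" else opaque

# Holding Omega's prediction fixed, two-boxing dominates -- this is
# exactly CDT's reasoning:
cdt_view = all(newcomb_payoff("two-box", p) > newcomb_payoff("one-box", p)
               for p in ("one-box", "two-box"))

# But if Omega's prediction always matches the actual action, the
# one-boxer ends up far richer:
one_boxer = newcomb_payoff("one-box", "one-box")   # $1,000,000
two_boxer = newcomb_payoff("two-box", "two-box")   # $1,000
```

The dominance argument is valid only if the prediction is independent of the action; Omega's accuracy is precisely what breaks that independence.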

But is this a fair problem? After all, we can always come up with problems that trick the rational agent into making the wrong choice, while a dumber agent lucks into the right one. Having a very powerful predictor around might seem artificial, although the problem might look much the same if Omega had a 90% success rate rather than 100%. One reason that this is a fair problem is that the outcome depends only on what action **X** is simulated to take, not on what process produced the decision.

Besides, we can see the same behavior in another famous game theory problem: the Prisoner's Dilemma. **X** and **Y** are collaborating on a project, but they have different goals for it, and either one has the opportunity to achieve their goal a little better at the cost of significantly impeding their partner's goal. (The options are called *cooperation* and *defection*.) If they both cooperate, they get a utility of +50 each; if **X** cooperates and **Y** defects, then **X** winds up at +10 but **Y** gets +70, and vice versa; but if they both defect, then both wind up at +30 each.^{2}
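Writing out that payoff table makes the dilemma's structure explicit:

```python
# The payoff table from the text: (X's move, Y's move) -> (X's utility,
# Y's utility), with "C" for cooperate and "D" for defect.
PD = {("C", "C"): (50, 50), ("C", "D"): (10, 70),
      ("D", "C"): (70, 10), ("D", "D"): (30, 30)}

# Whatever Y does, X scores higher by defecting...
defection_dominates = all(PD[("D", y)][0] > PD[("C", y)][0] for y in "CD")

# ...yet mutual defection (+30 each) leaves both worse off than mutual
# cooperation (+50 each) -- that tension is the whole dilemma.
mutual_defection_worse = PD[("D", "D")][0] < PD[("C", "C")][0]
```

Both checks come out true at once, which is what makes the game a genuine dilemma rather than a simple optimization problem.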

If **X** is a CDT agent, then defecting dominates cooperating as a strategy, so **X** will always defect in the Prisoner's Dilemma (as long as there are no further ramifications; the Iterated Prisoner's Dilemma can be different, because **X**'s *current* decision can influence **Y**'s *future* decisions). Even if you knowingly pair up **X** with a copy of itself (with a different goal but the same decision theory), it will defect even though it could prove that the two decisions will be identical.

Meanwhile, its cousin **Z** also plays the Prisoner's Dilemma: **Z** cooperates when it's facing an agent that has the same decision theory, and defects otherwise. **Z** thus does at least as well as **X** against every partner, and strictly better when paired with a copy of itself. (**Z** isn't optimal, though; I'm just showing that you can find a strict improvement on **X**.)^{3}
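Z's rule can be sketched in a few lines, under the simplifying (and unrealistic) assumption that agents can directly read off which decision theory their partner runs; the function names are mine:

```python
def cdt_move(partner_theory):
    """CDT defects unconditionally in the one-shot Prisoner's Dilemma."""
    return "D"

def z_move(partner_theory):
    """Z's rule: cooperate exactly when the partner runs the same
    decision theory, defect against everyone else."""
    return "C" if partner_theory == "Z" else "D"

# Payoffs from the text: (X's move, Y's move) -> (X's, Y's utility).
PD = {("C", "C"): (50, 50), ("C", "D"): (10, 70),
      ("D", "C"): (70, 10), ("D", "D"): (30, 30)}

cdt_vs_cdt = PD[(cdt_move("CDT"), cdt_move("CDT"))]  # mutual defection
z_vs_z     = PD[(z_move("Z"), z_move("Z"))]          # mutual cooperation
z_vs_cdt   = PD[(z_move("CDT"), cdt_move("Z"))]      # Z defects too
```

Z is never exploited (it defects against defectors), matches CDT's +30 in mixed pairings, and collects +50 instead of +30 against its own kind.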

## What decision theories are better than CDT?

I realize this post is pretty long already, but it's way too short to outline the advanced decision theories that have been proposed and developed recently by a number of people (including Eliezer, Gary Drescher, Wei Dai, Vladimir Nesov and Vladimir Slepnev). Instead, I'll list the features that we would want an advanced decision theory to have:

- The decision theory should be formalizable at least as well as CDT is.
- The decision theory should give answers that are at least as good as CDT's answers. In particular, it should always get the right answer in 1-player games and find a Nash equilibrium in zero-sum two-player games (when the other player is also able to do so).
- The decision theory should strictly outperform CDT on the Prisoner's Dilemma: it should elicit mutual cooperation in the Prisoner's Dilemma from some agents that CDT elicits mutual defection from, it shouldn't cooperate when its partner defects, and (arguably) it should defect if its partner would cooperate regardless.
- The decision theory should one-box on Newcomb's Problem.
- The decision theory should be reasonably simple, and not include a bunch of ad-hoc rules. We want to solve problems involving prediction of actions in general, not just the special cases.

There are now a couple of candidate decision theories (Timeless Decision Theory, Updateless Decision Theory, and Ambient Decision Theory) which seem to meet these criteria. Interestingly, formalizing any of these tends to deeply involve the mathematics of self-reference (Gödel's Theorem and Löb's Theorem) in order to avoid the infinite regress inherent in simulating an agent that's simulating you.

But for the time being, we can massively oversimplify and outline them. TDT considers your ultimate decision as the cause of both your action and other agents' valid predictions of your action, and tries to pick the decision that works best under that model. ADT uses a kind of diagonalization to predict the effects of different decisions without having the final decision throw off the prediction. And UDT considers the decision that would be the best policy for all possible versions of you to employ, on average.

## Why are advanced decision theories important for Less Wrong?

There are a few reasons. Firstly, there are those who think that advanced decision theories are a natural base on which to build AI. One reason for this is something I briefly mentioned: even CDT allows for the idea that **X**'s current decisions can affect **Y**'s future decisions, and self-modification counts as a decision. If **X** can self-modify, and if **X** expects to deal with situations where an advanced decision theory would out-perform its current self, then **X** will change itself into an advanced decision theory (with some weird caveats: for example, if **X** started out as CDT, its modification will only care about other agents' decisions made after **X** self-modified).

More relevantly to rationalists, the bad choices that CDT makes are often held up as examples of why you shouldn't try to be rational, or why rationalists can't cooperate. But instrumental rationality doesn't need to be synonymous with causal decision theory: if there are other decision theories that do strictly better, we should adopt those rather than CDT! So figuring out advanced decision theories, even if we can't implement them on real-world problems, helps us see that the ideal of rationality isn't going to fall flat on its face.

Finally, advanced decision theory could be relevant to morality. If, as many of us suspect, there's no basis for human morality apart from what goes on in human brains, then why do we feel there's still a distinction between what-we-want and what-is-right? One answer is that if we feed what-we-want into an advanced decision theory, then just as cooperation emerges in the Prisoner's Dilemma, many kinds of patterns that we take as basic moral rules emerge as the equilibrium behavior. The idea is developed more substantially in Gary Drescher's *Good and Real*, and (before there was a candidate for an advanced decision theory) in Douglas Hofstadter's concept of superrationality.

It's still at the speculative stage, because it's difficult to work out what interactions between agents with advanced decision theories would look like (in particular, we don't know whether bargaining would end in a fair split or in a Xanatos Gambit Pileup of chicken threats, though we think and hope it's the former). But it's at least a promising approach to the slippery question of what 'right' could actually mean.

And if you want to understand this on a slightly more technical level... well, I've started a sequence.

**Next:** A Semi-Formal Analysis, Part I (The Problem with Naive Decision Theory)

### Notes:

**0.** Rather confusingly, decision theory is the name for the study of decision theories.

**1.** Both patterns appear in our conscious reasoning as well as our subconscious thinking: we care about consequences we can directly foresee and also about moral rules that don't seem attached to any particular consequence. However, just as the simple "program" for the bacterium was constructed by evolution, our moral rules are there for evolutionary reasons as well, perhaps even for reasons that have to do with advanced decision theory...

Also, it's worth noting that we're not consciously aware of all of our values and goals, though at least we have a better idea of them than *E. coli* does. This is a problem for the idea of representing our usual decisions in terms of decision theory, though we can still hope that our approximations are good enough (e.g. that our real values regarding the Cold War roughly corresponded to our estimates of how bad a nuclear war or a Soviet world takeover would be).

**2.** Eliezer once pointed out that our intuitions on most formulations of the Prisoner's Dilemma are skewed by our notions of fairness, and a more outlandish example might serve better to illustrate how a genuine PD really feels. For an example where people are notorious for not caring about each other's goals, let's consider aesthetics: people who love one form of music often really feel that another popular form is a waste of time. One might feel that if the works of Artist Q suddenly disappeared from the world, it would objectively be a tragedy; while if the same happened to the works of Artist R, then it's no big deal and R's fans should be glad to be freed from that dreck.

We can use this aesthetic intolerance to construct a more genuine Prisoner's Dilemma without inviting aliens or anything like that. Say **X** is a writer and **Y** is an illustrator, and they have very different preferences for how a certain scene should come across, so they've worked out a compromise. Now, both of them could cooperate and get a scene that both are OK with, or **X** could secretly change the dialogue in hopes of getting his idea to come across, or **Y** could draw the scene differently in order to get her idea of the scene across. But if they both "defect" from the compromise, then the scene gets confusing to readers. If both **X** and **Y** prefer their own idea to the compromise, prefer the compromise to the muddle, and prefer the muddle to their partner's idea, then this is a genuine Prisoner's Dilemma.

**3.** I've avoided mentioning Evidential Decision Theory, the "usual" counterpart to CDT; it's worth noting that EDT one-boxes on Newcomb's Problem but gives the wrong answer on a classical one-player problem (The Smoking Lesion) which the advanced decision theories handle correctly. It's also far less amenable to formalization than the others.

## Comments

This is a good post, but it would be super valuable if you could explain the more advanced decision theories and the current problems people are working on as clearly as you explained the basics here.

It's pretty easy to explain the main innovation in TDT/UDT/ADT: they all differ from EDT/CDT in how they answer "What is it that you're deciding when you make a decision?" and "What are the consequences of a decision?", and in roughly the same way. They answer the former by "You're deciding the logical fact that the program-that-is-you makes a certain output," and the latter by "The consequences are the logical consequences of that logical fact." UDT differs from ADT in that UDT uses an unspecified "math intuition module" to form a probability distribution over possible logical consequences, whereas ADT uses logical deduction and only considers consequences that it can prove. (TDT also makes use of Pearl's theory of causality, which I admittedly do not understand.)

This post isn't really correct about what distinguishes CDT from EDT or TDT. The distinction has nothing to do with the presence of other agents and can be seen in the absence of such (e.g. Smoking Lesion). Indeed neither decision theory contains a notion of "other agents"; both simply regard things that we might classify as "other agents" simply as features of the environment.

Fundamentally, the following paragraph is wrong:

…

A few minor points: …

Oooh, between your description of CDT, and the recent post on how experts will defend their theory even against counterfactuals, I finally understand how someone can possibly justify two-boxing in Newcomb's Problem. As an added bonus, I'm also seeing how being "half rational" can be intensely dangerous, since it leads to things like two-boxing in Newcomb's Problem :)

Thank you so much for this. I really don't feel like I understand decision theories very well, so this was helpful. But I'd like to ask a couple questions that I've had for a while that this didn't really answer.

Why does evidential decision theory necessarily fail the Smoking Lesion problem? That link is technically Solomon's Problem, not the Smoking Lesion problem, but it's related. If p(Cancer | smoking, lesion) = p(Cancer | not smoking, lesion), why is Evidential Decision Theory forbidden from using these probabilities? Evidential decision theory makes a lot of …

I'm guessing what matters is not so much time as the causal dependence of those decisions made by other agents on the physical event of the decision of **X** to self-modify. So the improved **X** still won't care about its influence on future decisions made by other agents for reasons other than **X** having self-modified. For example, take the (future) decisions of other agents that are space-like separated from **X**'s …

It might also help to consider examples in which "cooperation" doesn't give warm fuzzy feelings and "defection" the opposite. Businessmen forming a cartel are also in a PD situation. Do we want businessmen to gang up against their customers?

This may be culturally specific, though. It's interesting that in the standard PD, we're supposed to be rooting for the prisoners to go free. Is that how it's viewed in other countries?

Thanks for writing this. I would object to calling a decision theory an "algorithm", though, since it doesn't actually specify how to make the computation, and in practice the implied computations from most decision theories are completely infeasible (for instance, the chess decision theory requires a full search of the game tree).

Of course, it would be much more satisfying and useful if decision theories actually were algorithms, and I would be very interested to see any that achieve this or move in that direction.

Does anyone have a decent idea of the differences between UDT, TDT and ADT? (Not in terms of the concepts they're based on; in terms of problems to which they give different answers.)

There are a couple of things I find odd about this. First, it seems to be taken for granted that one-boxing is obviously better than two-boxing, but I'm not sure that's right. J.M. Joyce has an argument (in his *Foundations of Causal Decision Theory*) that is supposed to convince you that two-boxing is *the right* solution. Importantly, he accepts that you might still wish you weren't a CDT (so that Omega predicted you would one-box). But, he says, in either case, once the boxes are in front of you, whether you are a CDT or an EDT, you should two-box! The domin…

Thanks for the recap. It still doesn't answer my question, though:

This appears to be incorrect if the CDT knows that Omega always makes correct predictions

And this appears to be incorrect in all cases. The right decision depends on the exact nature of the noise. If Omega makes the decision by analyzing the agent's psychological tests taken in childhood, then the agent should …

This might not satisfactorily answer your confusion but: CDT is defined by the fact that it has incorrect causal graphs. If it has correct causal graphs then it's not CDT. Why bother talking about a "decision theory" that is arbitrarily limited to incorrect causal graphs? Because that's the decision theory that academic decision theorists like to talk about and treat as default. Why did academic decision theorists never realize that their causal graphs were wrong? No one has a very good model of that, but check out Wei Dai's related speculation here. Note that if you define causality in a technical Markovian way and use Bayes nets then there is no difference between CDT and TDT.

I used to get annoyed because CDT with a good enough world model should clearly one-box yet people stipulated that it wouldn't; only later did I realize that it's mostly a rhetorical thing and no one thinks that if you actually implemented an AGI with "CDT" that it'd be as dumb as academia/LessWrong's version of CDT.

If I'm wrong about any of the above then someone please correct me as this is relevant to FAI strategy.

I can't be alone in thinking this; where does one acquire the probability values to make any of this theory useful? Past data? Can it be used on an individual or only organizational level?

I liked this post. It would be good if you put it somewhere in the sequences so that people new to Less Wrong can find this basic intro earlier.

Excellent. And I love your aesthetic intolerance dilemma. That's a great spin on it; has it appeared anywhere before?

As it happens, I do know of a real-world case of this kind of problem, where the parties involved chose... defection. From an interview with former anime studio Gainax president Toshio Okada:

The decision theories need somewhat specific models of the world to operate correctly. In the Smoking Lesion, for example, the lesion has to somehow lead to you smoking. E.g. the lesion could make you follow CDT while absence of the lesion makes you follow EDT. It's definitely worse to have CDT if it comes at the expense of having the lesion.

The issue here is selection. If you find you opt to smoke, your prior for having the lesion goes up, of course, and so you need to be more concerned about the cancer; if you can't check for the lesion you have to perhaps …

I spent the first several seconds trying to figure out the tree diagram at the top. What does it represent?

Are you sure that you need an advanced decision theory to handle the one-box/two-box problem, or the PD-with-mental-clone problem? You write that

Well, that's a common situation analyzed in game theory, but it's not essential to CDT. Consider playing a game of chess: your choice clearly affects the choice of your opponent. Or consider the decision of whether to punch …

Personally I don't see the examples given as flaws in Causal Decision Theory at all. The flaw is in the problem statements, not CDT.

In the alien predictor example, the key question is "when does the agent set its strategy?". If the agent's strategy is set before the prediction is made, then CDT works fine. The agent decides *in advance* to commit itself to opening one box, the alien realises that, the agent follows through with it and gets $1,000,000. Which is exactly how humans win that game as well. If on the other hand the agent's strategy is not …

Great post!

"X should go to the zoo whatever Y does"

Should be "whenever."

ETA: Oops. See Gust's post.

This is nit picking, but in the specific case of Newcomb's problem, it's intentionally *unclear* if your decision affects Omega's.

That doesn't seem to follow.

It is scientifically conventional to have the past causing the …

I don't understand the need for this "advanced" decision theory. The situations you mention -- Omega and the boxes, PD with a mental clone -- are highly artificial; no human being has ever encountered such a situation. So what relevance do these "advanced" decision theories have to decisions of real people in the real world?

Upvoted; very interesting.