Mar 13, 2012


**Summary:** *If you've been wondering why people keep going on about decision theory on Less Wrong, I wrote you this post as an answer. I explain what decision theories are, show how Causal Decision Theory works and where it seems to give the wrong answers, introduce (very briefly) some candidates for a more advanced decision theory, and touch on the (possible) connection between decision theory and ethics.*

This is going to sound silly, but a decision theory is an algorithm for making decisions.^{0} The inputs are an agent's knowledge of the world, and the agent's goals and values; the output is a particular action (or plan of actions). Actually, in many cases the goals and values are implicit in the algorithm rather than given as input, but it's worth keeping them distinct in theory.

For example, we can think of a chess program as a simple decision theory. If you feed it the current state of the board, it returns a move, which advances the implicit goal of winning. The actual details of the decision theory include things like writing out the tree of possible moves and countermoves, and evaluating which possibilities bring it closer to winning.
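The tree-of-moves idea can be sketched as a toy minimax search in Python. Everything here is made up for illustration (a two-ply game given as a nested dict of evaluation numbers); a real chess engine adds move generation, pruning, and a far more sophisticated evaluation function:

```python
def minimax(tree, maximizing=True):
    """Toy game-tree search: leaves are position evaluations for us,
    internal nodes map a move to the resulting subtree."""
    if not isinstance(tree, dict):
        return tree, None
    choose = max if maximizing else min
    value, move = choose(
        (minimax(subtree, not maximizing)[0], m) for m, subtree in tree.items()
    )
    return value, move

# A two-ply toy game: our move ("a" or "b"), then the opponent's reply.
game = {
    "a": {"x": 3, "y": 5},   # a rational opponent replies "x", leaving us 3
    "b": {"x": 4, "y": 6},   # a rational opponent replies "x", leaving us 4
}
value, move = minimax(game)  # best guaranteed value: 4, by playing "b"
```

Note that the algorithm assumes the opponent also picks its best move; that assumption will come up again when we discuss multi-agent problems below.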

Another example is an *E. coli* bacterium. It has two basic options at every moment: it can use its flagella to swim forward in a straight line, or it can change direction by tumbling randomly. It can sense whether the concentration of food or toxin is increasing or decreasing over time, and so it executes a simple algorithm that randomly changes direction more often when things are "getting worse". This is enough control for bacteria to rapidly seek out food and flee from toxins, without needing any sort of advanced information processing.
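The bacterium's algorithm is simple enough to sketch directly. Here's a minimal run-and-tumble simulation in a hypothetical one-dimensional world where the food concentration simply equals the position, so ending up at a higher position means the algorithm is working:

```python
import random

def chemotaxis(steps=2000, seed=0):
    """Run-and-tumble sketch: swim straight, but tumble (pick a new
    random heading) more often when the food concentration is falling."""
    rng = random.Random(seed)
    position, heading = 0.0, 1.0   # toy world: food concentration = position
    last_concentration = position
    for _ in range(steps):
        position += 0.1 * heading
        getting_worse = position < last_concentration
        last_concentration = position
        # Tumble with high probability when things are getting worse.
        if rng.random() < (0.9 if getting_worse else 0.1):
            heading = rng.choice([-1.0, 1.0])
    return position

chemotaxis()  # ends up well into positive territory, toward the food
```

Even though each tumble points the bacterium in a random direction, the bias in *when* it tumbles produces a reliable drift toward higher concentrations.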

A human being is a much more complicated example that combines aspects of the two simpler ones: we mentally model consequences in order to make many decisions, and we also follow heuristics that evolved to work well without explicitly modeling the world.^{1} We can't model anything quite as complicated as the way human beings actually make decisions, but we can study simple decision theories on simple problems; and the results of this analysis have often been more effective than the raw intuitions of human beings (who evolved to succeed in small savannah tribes, not to negotiate a nuclear arms race). But the standard model used for this analysis, Causal Decision Theory, has a serious drawback of its own, and the suggested replacements are important for a number of things that Less Wrong readers might care about.

Causal decision theory (CDT to all the cool kids) is a particular class of decision theories with some attractive properties: it's straightforward to state, has nice mathematical features, can be adapted to any utility function, and gives good answers on many problems. We'll describe how it works in a fairly simple but general setup.

Let **X** be an agent who shares a world with some other agents (**Y**_{1} through **Y**_{n}); all of them will decide simultaneously on an action, and the consequence each agent receives depends on the full set of actions taken.

We'll assume that **X** has goals and values represented by a utility function: for every consequence **C**, there's a number **U(C)** representing just how much **X** prefers that outcome, and **X** views equal *expected* utilities with indifference: a 50% chance of utility 0 and 50% chance of utility 10 is no better or worse than 100% chance of utility 5, for instance. (If these assumptions sound artificial, remember that we're trying to make this as mathematically simple as we can in order to analyze it. I don't think it's as artificial as it seems, but that's a different topic.)

**X** wants to maximize its expected utility. If there were no other agents, this would be simple: model the world, estimate how likely each consequence is to happen if it does this action or that, calculate the expected utility of each action, then perform the action that results in the highest expected utility. But if there are other agents around, the outcomes depend on their actions as well as on **X**'s action, and if **X** treats *that* uncertainty like normal uncertainty, then there might be an opportunity for the **Y**s to exploit **X**.
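In the one-agent case, the whole procedure fits in a few lines. Here's a minimal sketch, with hypothetical actions, outcomes, and utilities:

```python
def best_action(model, utility):
    """Pick the action with the highest expected utility.

    model: maps each action to a list of (probability, outcome) pairs.
    utility: maps an outcome to a number.
    """
    def expected_utility(action):
        return sum(p * utility(outcome) for p, outcome in model[action])
    return max(model, key=expected_utility)

# Hypothetical one-agent problem: a 50/50 gamble vs. a sure thing.
model = {
    "gamble": [(0.5, "nothing"), (0.5, "jackpot")],
    "sure":   [(1.0, "modest prize")],
}
utility = {"nothing": 0, "jackpot": 12, "modest prize": 5}.get
best_action(model, utility)  # "gamble": expected utility 6 beats the sure 5
```

(With a jackpot utility of 10 instead of 12, both actions would have expected utility 5, and the agent would be exactly indifferent between them, as in the earlier example.)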

This is a Difficult Problem in general; a full discussion would involve Nash equilibria, but even that doesn't fully settle the matter- there can be more than one equilibrium! Also, **X** *can* sometimes treat another agent as predictable (like a fixed outcome or an ordinary random variable) and get away with it.

CDT is a *class* of decision theories, not a specific decision theory, so it's impossible to specify with full generality how **X** will decide if **X** is a causal decision theorist. But there is one key property that distinguishes CDT from the decision theories we'll talk about later: a CDT agent assumes that **X**'s decision is *independent* of the simultaneous decisions of the **Y**s- that is, **X** could decide one way or another and everyone else's decisions would stay the same.

Therefore, there is at least one case where we can say what a CDT agent will do in a multi-player game: some strategies are dominated by others. For example, if **X** and **Y** are both deciding whether to walk to the zoo, and **X** will be happiest if **X** and **Y** both go, but **X** would still be happier at the zoo than at home even if **Y** doesn't come along, then **X** should go to the zoo regardless of what **Y** does. (Presuming that **X**'s utility function is focused on being happy that afternoon.) This criterion is enough to "solve" many problems for a CDT agent, and in zero-sum two-player games the solution can be shown to be optimal for **X**.
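The dominance criterion is easy to state in code. Here's a minimal check, with made-up utility numbers for the zoo example (only **X**'s own payoffs matter for dominance):

```python
def dominates(payoff, a, b):
    """True if action a strictly dominates action b for this player:
    a does strictly better than b no matter what the other agent does.

    payoff: maps (my_action, their_action) -> my utility.
    """
    their_actions = {them for _, them in payoff}
    return all(payoff[(a, t)] > payoff[(b, t)] for t in their_actions)

# Hypothetical utilities for X in the zoo example:
zoo = {
    ("go", "go"): 10, ("go", "stay"): 6,
    ("stay", "go"): 2, ("stay", "stay"): 3,
}
dominates(zoo, "go", "stay")   # True: going is better whatever Y does
dominates(zoo, "stay", "go")   # False
```

A CDT agent can safely discard any dominated strategy, which is enough to settle many games outright.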

There are many simplifications and abstractions involved in CDT, but that assumption of independence turns out to be key. In practice, people put a lot of effort into predicting what other people might decide, sometimes with impressive accuracy, and then base their own decisions on that prediction. This wrecks the independence of decisions, and so it turns out that in a non-zero-sum game, it's possible to "beat" the outcome that CDT gets.

The classical thought experiment in this context is called Newcomb's Problem. **X** meets with a very smart and honest alien, Omega, that has the power to accurately predict what **X** would do in various hypothetical situations. Omega presents **X** with two boxes, a clear one containing $1,000 and an opaque one containing either $1,000,000 or nothing. Omega explains that **X** can either take the opaque box (this is called *one-boxing*) or both boxes (*two-boxing*), but there's a trick: Omega predicted in advance what **X** would do, and put $1,000,000 into the opaque box only if **X** was predicted to one-box. (This is a little devious, so take some time to ponder it if you haven't seen Newcomb's Problem before- or read here for a fuller explanation.)

If **X** is a causal decision theorist, the choice is clear: whatever Omega decided, it decided already, and whether the opaque box is full or empty, **X** is better off taking both. (That is, two-boxing is a dominant strategy over one-boxing.) So **X** two-boxes, and walks away with $1,000 (since Omega easily predicted that this would happen). Meanwhile, **X**'s cousin **Z** (not a CDT agent) decides to one-box, and finds $1,000,000 in the opaque box. So it certainly seems that one could do better than CDT in this case.

But is this a fair problem? After all, we can always come up with problems that trick the rational agent into making the wrong choice, while a dumber agent lucks into the right one. Having a very powerful predictor around might seem artificial, although the problem might look much the same if Omega had a 90% success rate rather than 100%. One reason that this is a fair problem is that the outcome depends only on what action **X** is simulated to take, not on what process produced the decision.

Besides, we can see the same behavior in another famous game theory problem: the Prisoner's Dilemma. **X** and **Y** are collaborating on a project, but they have different goals for it, and either one has the opportunity to achieve their goal a little better at the cost of significantly impeding their partner's goal. (The options are called *cooperation* and *defection*.) If they both cooperate, they get a utility of +50 each; if **X** cooperates and **Y** defects, then **X** winds up at +10 but **Y** gets +70, and vice versa; but if they both defect, they each wind up at +30.^{2}

If **X** is a CDT agent, then defecting dominates cooperating as a strategy, so **X** will always defect in the Prisoner's Dilemma (as long as there are no further ramifications; the Iterated Prisoner's Dilemma can be different, because **X**'s *current* decision can influence **Y**'s *future* decisions). Even if you knowingly pair up **X** with a copy of itself (with a different goal but the same decision theory), it will defect even though it could prove that the two decisions will be identical.

Meanwhile, its cousin **Z** also plays the Prisoner's Dilemma: **Z** cooperates when it's facing an agent that has the same decision theory, and defects otherwise. When paired with a copy of itself, **Z** thus gets a strictly better outcome than **X** gets against a copy of **X**. (**Z** isn't optimal, though; I'm just showing that you can find a strict improvement on **X**.)^{3}
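The difference between **X** and **Z** shows up in a toy simulation where each agent gets to inspect its opponent's decision procedure (the payoff numbers are the ones from the text):

```python
# Payoffs from the text: (my utility, their utility) for each pair of moves.
PAYOFF = {("C", "C"): (50, 50), ("C", "D"): (10, 70),
          ("D", "C"): (70, 10), ("D", "D"): (30, 30)}

def cdt(opponent_theory):
    return "D"   # defection dominates, so CDT defects regardless of opponent

def z(opponent_theory):
    # Z cooperates exactly when it can see the opponent shares its theory.
    return "C" if opponent_theory is z else "D"

def play(agent_a, agent_b):
    move_a, move_b = agent_a(agent_b), agent_b(agent_a)
    return PAYOFF[(move_a, move_b)]

play(cdt, cdt)  # two CDT agents: (30, 30)
play(z, z)      # two Z agents: (50, 50)
play(z, cdt)    # Z defects against CDT, so neither is exploited: (30, 30)
```

Z never does worse than CDT in any pairing here, and does strictly better against its own kind; that's the sense in which it's a strict improvement.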

I realize this post is pretty long already, but it's way too short to outline the advanced decision theories that have been proposed and developed recently by a number of people (including Eliezer, Gary Drescher, Wei Dai, Vladimir Nesov and Vladimir Slepnev). Instead, I'll list the features that we would want an advanced decision theory to have:

- The decision theory should be formalizable at least as well as CDT is.
- The decision theory should give answers that are at least as good as CDT's answers. In particular, it should always get the right answer in 1-player games and find a Nash equilibrium in zero-sum two-player games (when the other player is also able to do so).
- The decision theory should strictly outperform CDT on the Prisoner's Dilemma- it should elicit mutual cooperation in the Prisoner's Dilemma from some agents that CDT elicits mutual defection from, it shouldn't cooperate when its partner defects, and (arguably) it should defect if its partner would cooperate regardless.
- The decision theory should one-box on Newcomb's Problem.
- The decision theory should be reasonably simple, and not include a bunch of ad-hoc rules. We want to solve problems involving prediction of actions in general, not just the special cases.

There are now a couple of candidate decision theories (Timeless Decision Theory, Updateless Decision Theory, and Ambient Decision Theory) which seem to meet these criteria. Interestingly, formalizing any of these tends to deeply involve the mathematics of self-reference (Gödel's Theorem and Löb's Theorem) in order to avoid the infinite regress inherent in simulating an agent that's simulating you.

But for the time being, we can massively oversimplify and outline them. TDT considers your ultimate decision as the cause of both your action and other agents' valid predictions of your action, and tries to pick the decision that works best under that model. ADT uses a kind of diagonalization to predict the effects of different decisions without having the final decision throw off the prediction. And UDT considers the decision that would be the best policy for all possible versions of you to employ, on average.

So why should we care about advanced decision theories? There are a few reasons. Firstly, there are those who think that advanced decision theories are a natural base on which to build AI. One reason for this is something I briefly mentioned: even CDT allows for the idea that **X**'s current decisions can affect **Y**'s future decisions, and self-modification counts as a decision. If **X** can self-modify, and if **X** expects to deal with situations where an advanced decision theory would outperform its current self, then **X** will change itself into an advanced decision theory (with some weird caveats: for example, if **X** started out as CDT, its modification will only care about other agents' decisions made after **X** self-modified).

More relevantly to rationalists, the bad choices that CDT makes are often held up as examples of why you shouldn't try to be rational, or why rationalists can't cooperate. But instrumental rationality doesn't need to be synonymous with causal decision theory: if there are other decision theories that do strictly better, we should adopt those rather than CDT! So figuring out advanced decision theories, even if we can't implement them on real-world problems, helps us see that the ideal of rationality isn't going to fall flat on its face.

Finally, advanced decision theory could be relevant to morality. If, as many of us suspect, there's no basis for human morality apart from what goes on in human brains, then why do we feel there's still a distinction between what-we-want and what-is-right? One answer is that if we feed what-we-want into an advanced decision theory, then just as cooperation emerges in the Prisoner's Dilemma, many kinds of patterns that we take as basic moral rules emerge as the equilibrium behavior. The idea is developed more substantially in Gary Drescher's *Good and Real*, and (before there was a candidate for an advanced decision theory) in Douglas Hofstadter's concept of superrationality.

It's still at the speculative stage, because it's difficult to work out what interactions between agents with advanced decision theories would look like (in particular, we don't know whether bargaining would end in a fair split or in a Xanatos Gambit Pileup of chicken threats, though we think and hope it's the former). But it's at least a promising approach to the slippery question of what 'right' could actually mean.

And if you want to understand this on a slightly more technical level... well, I've started a sequence.

**Next:** A Semi-Formal Analysis, Part I (The Problem with Naive Decision Theory)

**0.** Rather confusingly, decision theory is the name for the study of decision theories.

**1.** Both patterns appear in our conscious reasoning as well as our subconscious thinking- we care about consequences we can directly foresee and also about moral rules that don't seem attached to any particular consequence. However, just as the simple "program" for the bacterium was constructed by evolution, our moral rules are there for evolutionary reasons as well- perhaps even for reasons that have to do with advanced decision theory...

Also, it's worth noting that we're not consciously aware of all of our values and goals, though at least we have a better idea of them than *E. coli* does. This is a problem for the idea of representing our usual decisions in terms of decision theory, though we can still hope that our approximations are good enough (e.g. that our real values regarding the Cold War roughly corresponded to our estimates of how bad a nuclear war or a Soviet world takeover would be).

**2.** Eliezer once pointed out that our intuitions on most formulations of the Prisoner's Dilemma are skewed by our notions of fairness, and a more outlandish example might serve better to illustrate how a genuine PD really feels. For an example where people are notorious for not caring about each other's goals, let's consider aesthetics: people who love one form of music often really feel that another popular form is a waste of time. One might feel that if the works of Artist Q suddenly disappeared from the world, it would objectively be a tragedy; while if the same happened to the works of Artist R, then it's no big deal and R's fans should be glad to be freed from that dreck.

We can use this aesthetic intolerance to construct a more genuine Prisoner's Dilemma without inviting aliens or anything like that. Say **X** is a writer and **Y** is an illustrator, and they have very different preferences for how a certain scene should come across, so they've worked out a compromise. Now, both of them could cooperate and get a scene that both are OK with, or **X** could secretly change the dialogue in hopes of getting his idea to come across, or **Y** could draw the scene differently in order to get her idea of the scene across. But if they both "defect" from the compromise, then the scene gets confusing to readers. If both **X** and **Y** prefer their own idea to the compromise, prefer the compromise to the muddle, and prefer the muddle to their partner's idea, then this is a genuine Prisoner's Dilemma.

**3.** I've avoided mentioning Evidential Decision Theory, the "usual" counterpart to CDT; it's worth noting that EDT one-boxes on Newcomb's Problem but gives the wrong answer on a classical one-player problem (The Smoking Lesion) which the advanced decision theories handle correctly. It's also far less amenable to formalization than the others.