Decision Theories: A Semi-Formal Analysis, Part II



You forget that substitution can also be applied to instances of the source code of X inside the agent Y, permitting you, e.g., to one-box on Newcomb's Problem: if you know Omega has your source code, substitute a value for your code's output, and then maximize the payoff in the equation involving the substituted-in variable by solving for that variable. That approach is not so self-fulfilling, because you don't assume that the particular value you try equals the output of your source code (an assumption which easily creates a contradiction).
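A toy sketch of this substitute-and-maximize procedure on Newcomb's Problem, with Omega's prediction modeled as the substituted variable itself (the payoff numbers are the standard ones, not from this comment):

```python
# Substitute a variable `a` for our code's output inside Omega's model,
# then maximize over `a` instead of assuming a = (our actual output).
def newcomb_payoff(a):
    # a = 1: take only the opaque box; a = 2: take both boxes.
    opaque = 1_000_000 if a == 1 else 0  # Omega fills it iff it predicts a == 1
    transparent = 1_000 if a == 2 else 0
    return opaque + transparent

best_output = max((1, 2), key=newcomb_payoff)  # one-boxing maximizes
```

Since the variable is free, no self-fulfilling contradiction arises: we solve for the best value and only then identify it with the program's output.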

edit: never mind, I see you are leaving it for the next post. Looking forward to it. It'd be interesting to see what those two enormous gaping flaws are in a method used for pretty much any sort of engineering; a good example would be engineering an AI that aims a gun.

> It'd be interesting to see what those two enormous gaping flaws are in a method used for pretty much any sort of engineering; a good example would be engineering an AI that aims a gun.

They're not going to be flaws in most contexts, just as an automated theorem-prover doesn't need to worry about spurious deductions; it's only when you have the circularity of a program whose output might depend on its output that you need to beware this kind of thing.

Also, you're probably capable of working out the first attempt at TDT and its flaws in this context, if you want to try your hand at this sort of problem. I'm splitting the post here not because it's an especially difficult cliffhanger, but because readers' eyes glaze over when a post starts getting too long.

> it's only when you have the circularity of a program whose output might depend on its output that you need to beware this kind of thing.

Well, the substitutions are specifically intended to turn a circularity into a case of having x on both sides of some equation. We might be talking about different things. The failure mode is benign; you arrive at x = x.

edit: ahh, another thing. If you have a source of randomness, you need to consider the solution both with and without the substitution, since you can make the substitution invalid by employing the random number generator. The substitution of the nonrandom part of the strategy can still be useful, though. Maybe that's what you had in mind?

> If you have a source of randomness, you need to consider the solution both with and without the substitution, since you can make the substitution invalid by employing the random number generator.

Err, I'm not sure what you mean here. In the CDT algorithm, if it deduces that Y employs a particular mixed strategy, then it can calculate the expected value of each action against that mixed strategy.

(For complete simplicity, though, starting next post I'm going to assume that there's at least one pure Nash equilibrium option in G. If it doesn't start with one, we can treat a mixed equilibrium as x_{n+1} and y_{m+1}, and fill in the new row and column of the matrix with the right expected values.)

I mean this sort of circularity:

> The calculator computes "What is 2 + 3?", not "What does this calculator compute as the result of 2 + 3?" The answer to the former question is 5, but if the calculator were to ask the latter question instead, the result could self-consistently be anything at all! If the calculator returned 42, then indeed, "What does this calculator compute as the result of 2 + 3?" would in fact be 42.
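A toy contrast between the two questions, with the self-referential answer made an explicit parameter (the function names are illustrative):

```python
def plain_calculator():
    # "What is 2 + 3?" -- the answer is pinned down by arithmetic.
    return 2 + 3

def self_referential_calculator(predicted_output):
    # "What does this calculator compute as the result of 2 + 3?"
    # Echoing any prediction makes that prediction come true, so the
    # answer is not pinned down at all: 42 is as self-consistent as 5.
    return predicted_output
```

Whatever value is fed in as the prediction, the second calculator confirms it, which is exactly the fixed-point trouble being described.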

I agree that some forms are benign. The Naive Decision Theory post and cousin_it's follow-up illustrate a malignant form.

That's why you don't let your calculator be sentient. An FAI might give the number that makes you most happy, which might well be 42 if you are not relying on this number for anything useful. (E.g., it might say 42 as a joke, knowing that you know what 2 + 3 is.)

Edit: you can, however, impose some requirements on the calculator's output, and then there will be a number that satisfies those criteria; the x-substitution will work to solve for this value, and in principle even to solve for protective measures to take against cosmic rays, and so on.

edit: and as for NDT, it doesn't one-way substitute at the start. It assumes equivalence.

> FAI might give a number that makes you most happy

Sure, if it happened to be in a situation where the most valuable thing to do by my standards was make me happy. Agreed.

You seem to be implying that I should prefer to avoid this result... do you in fact believe that?

If so, can you clarify why?

A somewhat analogous real-world situation: one of Siri's possible responses to "open the pod bay doors" as a command is "We're never going to live that down, are we?" This delights me enormously, and costs nothing of consequence. Should I prefer that this result be eliminated?

Actually, I misunderstood his point about the calculator. He was speaking of NDT, with issues resulting from equivalence; I thought he was speaking of issues resulting from substitution. I did not mean to imply that you should avoid this result, simply that if you want your calculator to work by the decision theory I thought he was referring to, it has to have some utilities associated with outputs. And that doesn't really help make a calculating device.

Who said anything about sentience? NDT, as described, is a perfectly comprehensible program that (in certain games that you or I would regard as fair tests) generates spurious counterfactuals and thus makes terrible decisions, thanks to a particular kind of circularity.

In this sequence, I'm not talking about FAI or anything beyond my current understanding, and I'm not intentionally drawing metaphors. I'm simply outlining programs which (if I could write a good automated theorem-prover) I could write myself, and comparing how they do in a straightforward tournament setting, with the twist of allowing read-access to source codes. We should be able to agree on that base level.

I just stumbled upon this and noticed that a real-world mechanism for international climate policy cooperation that I recently suggested in this paper can be interpreted as a special case of your (**G**,**X**,**Y**) framework.

Assume a fixed game **G** where

- each player's action space is the nonnegative reals,
- **U**(*x*,*y*) is weakly decreasing in *x* and weakly increasing in *y*,
- **V**(*x*,*y*) is weakly decreasing in *y* and weakly increasing in *x*.

(Many public goods games, such as the Prisoners' Dilemma, have such a structure.)

Let's call an object a *Conditional Commitment Function (CCF)* iff it is a bounded, continuous, and weakly increasing function from the nonnegative reals into the nonnegative reals. (Intended interpretation of a CCF *C*: if my opponent agrees to do *y*, I agree to do any *x* with *x* <= *C*(*y*).)

Now consider programs of the following kind:

```
C = <some CCF>
if code(opponent) equals code(myself) except that C is replaced by some CCF D:
output the largest x >= 0 for which there is a y <= D(x) with x <= C(y)
else:
output 0
```

Let's denote this program **Z**(*C*), where *C* is the CCF occurring in line 1 of the program. Finally, let's consider the meta-game where two programmers **A** and **B**, knowing **G**, each simultaneously choose a *C* and submit the program **Z**(*C*); the two programs are executed once to determine actions (*x*,*y*); **A** gets **U**(*x*,*y*) and **B** gets **V**(*x*,*y*).
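A numeric sketch of **Z**(*C*)'s cooperative branch; the example CCFs and the grid search are my own illustrative choices, not from the paper:

```python
# Find the largest x >= 0 with some y <= D(x) satisfying x <= C(y).
# Since C is weakly increasing, it suffices to check y = D(x), so the
# condition reduces to x <= C(D(x)). A grid search approximates the answer.
def largest_mutual_commitment(C, D, upper=10.0, steps=10001):
    best = 0.0
    for i in range(steps):
        x = upper * i / (steps - 1)
        if x <= C(D(x)):
            best = x
    return best

# Example CCFs: "match 80% of the opponent's action plus 1, capped at 5".
# Both are bounded, continuous, and weakly increasing, as required.
C = lambda y: min(0.8 * y + 1.0, 5.0)
D = lambda x: min(0.8 * x + 1.0, 5.0)
```

With these two CCFs the condition x <= C(D(x)) holds exactly up to x = 5, so both commitments climb to their common cap.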

(In the real world, the "programmers" could be the two parliaments of two countries that pass two binding laws (the "programs"), and the actions could be domestic levels of greenhouse gas emissions reductions.)

In our paper, we prove that the outcomes that will result from the *strong* Nash equilibria of this meta-game are exactly the Pareto-optimal outcomes (*x*,*y*) that both programmers prefer to the outcome (0,0).

(In an *N*-player (instead of 2-player) context, the outcomes of strong Nash equilibria are exactly the ones from a certain version of the underlying base game's *core*, a subset of the Pareto frontier that might, however, be empty.)

I'd be interested in learning whether you think this is an interesting application context to explore the theories you discuss.

It seems the point of all this is to study the properties of programs that "cooperate" on one-shot games with known payoffs. So there's no point dwelling on objections as to the practicality of any of this (how to prove that an embodied computer actually implements given source code, etc.; see Ken Thompson's C compiler/login trojan).

I'd appreciate more precision on "If G is a prisoner's dilemma".

Also, I suppose there's nothing prohibiting mixed strategies (choosing a probability distribution over your likely moves, from which the actual move is drawn) - e.g. the machine the programs run on has access to random bits.

Since I didn't actually follow your definitions carefully enough to answer it myself, can you tell me what would be played by two identical rational CDT, Cliquish-DT, or TDT agents (or same-clique agents) on the game:

```
(2,-1) (0,0)
(0,0) (0,0)
```

(with (row player, column player) payoffs)

Row player will always choose row 1 (if he wants to maximize the sum of payoffs, or his own payoff). What will column player choose? Take the -1 for the good of the row player?

That is, I'm asking whether programs which self-sacrifice when faced with another member of their clique are supposed to be 'rational' under one of these theories.

If it weren't a one-shot game, I'd wonder about history - if an external force assigns a particular instance only to the sacrificial role, does it keep on sacrificing? I can only ask this informally, unfortunately.

> I'd appreciate more precision on "If G is a prisoner's dilemma".

We can read off the payoff matrix. G is a Prisoner's Dilemma if, for each player, utility(I defect, they cooperate) > utility(we both cooperate) > utility(we both defect) > utility(I cooperate, they defect).
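This condition can be checked mechanically. A sketch, assuming G is represented as a 2x2 matrix of (row payoff, column payoff) pairs with action 0 = Cooperate and 1 = Defect:

```python
def is_prisoners_dilemma(G):
    # G[i][j] = (row payoff, column payoff) when row plays i, column plays j.
    (CC, CD), (DC, DD) = G
    row_ok = DC[0] > CC[0] > DD[0] > CD[0]  # defection tempts the row player
    col_ok = CD[1] > CC[1] > DD[1] > DC[1]  # and the column player alike
    return row_ok and col_ok

# The classic Prisoner's Dilemma payoffs satisfy both chains of inequalities.
classic_pd = [[(3, 3), (0, 5)],
              [(5, 0), (1, 1)]]
```

Note that the (2,-1)/(0,0) game from the grandparent comment fails this test, so the Cliquish rule wouldn't even apply to it.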

I'd like to cite this article (or related published work) in a research project paper I'm writing which includes application of an expected utility-maximizing algorithm to a version of the prisoner's dilemma. Do you have anything more cite-able than this article's URL and your LW username? I didn't see anything in your profile which could point me towards your real name and anything you might have published.

## Or: Causal Decision Theory and Substitution

Previously:

0. Decision Theories: A Less Wrong Primer
1. The Problem with Naive Decision Theory

**Summary of Post:** We explore the role of substitution in avoiding spurious counterfactuals, introduce an implementation of Causal Decision Theory and a CliqueBot, and set off in the direction of Timeless Decision Theory.

In the last post, we showed the problem with what we termed Naive Decision Theory, which attempts to prove counterfactuals directly and pick the best action: there's a possibility of spurious counterfactuals which lead to terrible decisions. We'll want to implement a decision theory that does better; one that is, by any practical definition of the words, foolproof and incapable of error...

I know you're eager to get to Timeless Decision Theory and the others. I'm sorry, but I'm afraid I can't do that just yet. This background is too important for me to allow you to skip it...

Over the next few posts, we'll create a sequence of decision theories, each of which will outperform the previous ones (the new ones will do better in some games, without doing worse in others^0) in a wide range of plausible games.

Let's formalize the setup a bit more: we've written a program X, which is paired with another program Y in a game G. Both X and Y have access to the source codes of X, Y, and G, and G calculates the payouts to the programmers of X and of Y in terms of their outputs alone. That is, X and Y should be functions of two inputs, and G should do the following:

1. Run X(code G, 1, code Y).^1
2. Run Y(code G, 2, code X).

We write the expected payout to X's programmer as U, by which we mean U(output X, output Y), by which we really mean U(output X(code G, 1, code Y), output Y(code G, 2, code X)); similarly, we'll write V as the expected payout to Y's programmer. We'll need to use quining in order for G to feed its own source code into X and Y, but this is straightforward. (I said earlier that X should have access to its own source code as well, but the programmer can do this via quining, with no help from G.)

We're also assuming that X is equipped with a valid inference module that deduces *some* true statements and *no* false ones; it can return "true", "false" or "unknown" when given a statement. (One such example is an automated theorem prover which gives up after a certain length of time.)

Last time, we tried the most straightforward idea—trying to directly prove counterfactuals about the value of U for various outputs of X—but we ran into the problem of spurious counterfactuals, which are logically true but practically misleading. The various decision theories we'll discuss each represent different ways of getting around this obstacle.

## Idea 3: Substitute to Avoid Self-Fulfilling Prophecies

Our spurious counterfactuals were harmful because of the possibility of a self-fulfilling prediction; one way to avoid such a thing is to base your output only on deductions about statements that are in some sense "independent" of the final output. We'll illustrate this principle by doing a simple task that Naive Decision Theory failed to do: infer the correct payoff matrix for G from its source code.

For each pair of outputs (x_i, y_j), we can simply substitute x_i for (output X) and y_j for (output Y) in the code for G, obtaining *unconditional* deductions of U(x_i, y_j) and V(x_i, y_j).^2 Note that this is *not* the counterfactual "if (output X) = x_i and (output Y) = y_j then U = u_ij", but instead a simple mathematical object that doesn't depend on the actual values of (output X) and (output Y)!

This is a powerful enough principle that, if we use it properly, we arrive at a formalized version of the hallowed classical theory:
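Before moving on, here is a minimal sketch of the payoff-matrix inference just described, with G modeled as an evaluable Python function rather than source code to be analyzed (a simplification of the post's setting):

```python
# Substituting concrete actions (x_i, y_j) into G yields unconditional
# values U(x_i, y_j) and V(x_i, y_j): no deduction about X's actual
# output is involved, so no spurious counterfactual can sneak in.
def G(output_X, output_Y):
    # Example game: a Prisoner's Dilemma, standing in for "code of G".
    table = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
             ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}
    return table[(output_X, output_Y)]

def infer_payoff_matrix(G, actions_X, actions_Y):
    # Each entry is deduced by substitution, independently of what
    # X and Y will actually output in the real game.
    return {(x, y): G(x, y) for x in actions_X for y in actions_Y}
```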

## Causal Decision Theory: No Longer Naive

X can always simply play a default Nash equilibrium strategy, but there's an obvious case where we want X to depart from it: if the output of Y is predictable, X can often gain more by stepping away from its equilibrium to exploit Y's stupidity (or coordinate with it on a different equilibrium). We might program X as follows:

1. Try to deduce the output of Y(code G, 2, code X); note that this may be a mixed strategy.
2. If this succeeds, output the x_i which maximizes U(x_i, (output Y)). (There must be at least one pure strategy which does so.)
3. Otherwise, output the default Nash equilibrium strategy.

Note that even though there is some circularity involved—Y may base its output on the code of X—this isn't like a spurious counterfactual, because the deduction of U(x_i, (output Y)) is independent of the output of X. In particular, if X succeeds at deducing the output of Y, then that really will be the output of Y in the game (presuming, as usual, that the inference module of X is valid).

This decision theory acts optimally in "one-player games" where the output of Y is independent of the source code of X (and can be deduced by X's inference module), and in zero-sum two-player games as well. Finally, X is un-exploitable in the following sense: if (x_i, y_j) is the Nash equilibrium with the lowest value for U(x_i, y_j), then U(X, Y) ≥ U(x_i, y_j) or V(X, Y) ≤ V(x_i, y_j). (Namely, if Y is trying to maximize its own payoff, then X should wind up with at least the value of its worst Nash equilibrium.)

In a philosophical sense, X does "consider" Y's inference of (output X) as a factor in deducing (output Y), but it considers it as if Y were instead inferring the output of some unrelated program, and in particular there's no "feedback" from this consideration to the second step. X acts as if its own eventual output cannot cause Y's output; that is, we've written an instance of Causal Decision Theory.^3

We can also see that if a dominant strategy exists (like two-boxing in Newcomb's Problem^4 or defecting in the Prisoner's Dilemma), this CDT algorithm will always pick it. We should ask at this point: in our situation with perfect knowledge of source codes, can we do better than the classical solution?

## Cliquish Decision Theory: Better, but Not Good Enough

We can, of course. There's at least one case where defecting on the Prisoner's Dilemma is clearly a bad idea: if the source code of Y is identical to the source code of X. So we can upgrade X by adding an additional rule: if the payoff matrix is a prisoner's dilemma (a well-defined subset of payoff matrices), and the code of Y equals the code of X, then cooperate. (I've called algorithms like this CliqueBots before, since they cooperate only with themselves and defect against everyone else.)

Furthermore, we can extend this to cases where the source codes are different but equivalent; let X do the following when given a game G and an opponent Y:

1. Examine G. If G is not a prisoner's dilemma, then implement Causal Decision Theory.
2. If G is a prisoner's dilemma, try to deduce "for all opponents Z, output X(code G, 1, code Z) = Cooperate if and only if output Y(code G, 1, code Z) = Cooperate if and only if output X(code G, 2, code Z) = Cooperate if and only if output Y(code G, 2, code Z) = Cooperate".^5 If the deduction succeeds, cooperate; if not, defect (as Causal Decision Theory would).

This now satisfies #1-3 of my criteria for a better decision theory (and we could add another rule so that it one-boxes on Newcomblike problems and satisfies #4), but it completely fails #5: this is a well-justified ad-hoc exception, but it's still an ad-hoc exception, and it's not that enlightening as examples go. We'll want something that achieves the same result, but for a genuine reason rather than because we've added individual common-sense patches. (One problem with patches is that they don't tell you how to solve more complicated problems—extending the analysis to three-player games, for instance.)

Furthermore, Cliquish Decision Theory misses out on a lot of potential cooperation: there are multiple strict improvements on Causal Decision Theory which cooperate with each other but don't all cooperate with the same third parties, and it's simple to show that Cliquish Decision Theory can't cooperate with any of these. So we'll need new ideas if we're going to do better.
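The combined algorithm above (the CDT core plus the clique patch) can be sketched as follows; the function names and the dictionary representation of strategies are illustrative, not from the post:

```python
# A sketch of Cliquish Decision Theory. The inference module is abstracted
# as `deduce_output_of_Y`, which returns Y's deduced mixed strategy as
# {action: probability}, or None on failure.
def cliquish_decide(U, actions_X, my_code, opponent_code,
                    game_is_pd, deduce_output_of_Y, default_action):
    # Clique patch: in a Prisoner's Dilemma, cooperate with exact clones.
    if game_is_pd and opponent_code == my_code:
        return 'C'
    # Otherwise, the CDT core: deduce Y's strategy and best-respond to it.
    strategy = deduce_output_of_Y()
    if strategy is None:
        return default_action          # fall back to Nash equilibrium play
    def expected_U(x):
        return sum(p * U[(x, y)] for y, p in strategy.items())
    return max(actions_X, key=expected_U)

# Prisoner's Dilemma payoffs for X, keyed by (X's action, Y's action).
pd_payoffs = {('C', 'C'): 3, ('C', 'D'): 0, ('D', 'C'): 5, ('D', 'D'): 1}
```

Against its own code it cooperates; against a known cooperator with different code, the CDT core takes over and defects, which is exactly the missed-cooperation problem described above.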

Our next steps will be toward Timeless Decision Theory. If you've considered TDT on the philosophical level, you'll recognize the concept of substituting your (hypothetical) output for your opponent's (hypothetical) prediction of your output, and using *those* outcomes to decide what to do. Our setup gives us a clean general way to implement this messy idea:

## Idea 4: Substitute for the Source Code of X (as an input of Y)

That is, instead of trying to deduce the output of Y(code G, 2, code X) and then pair it with each action x_i, we can deduce the output of Y(code G, 2, *some other code that corresponds to x_i*) and pair it with the deduced action. I'm being vague about this, partly to generate some suspense for the next post, but mostly because the most obvious way to go about it has two major flaws...

**Next:** Part III: Formalizing Timeless Decision Theory

## Notes

0. This is not quite true for Updateless Decision Theory, which explicitly trades off worse performance in some rounds for a better average performance. But we'll get to that in due time.

Also, it's possible to write an opponent Y that "rewards stupidity" by caring about things besides X's outputs: it might let X attain its best outcome only if X is provably a causal decision theorist, for instance. But in such cases, Y isn't really optimizing its own payoffs (as listed) in any meaningful sense, and so this is not a fair problem; we're tacitly assuming throughout this analysis that X is a real attempt to optimize U and Y is a real attempt to optimize V, and both of these depend only on the outputs.

1. X and Y need to know which is Player 1 in game G and which is Player 2. Also, the steps of G don't have to be done in that sequential order—they can be calculated in parallel or any other sensible way. (In Part III we'll introduce an interesting game which first flips a coin to decide which one is Player 1 and which one is Player 2.)

2. In fact, the correct payoff matrix is necessary even to implement the Nash equilibrium strategy! I glossed over this before by assuming that, in addition to the source code, X was given a valid payoff matrix; but here, we see that there is a sensible way to read off the matrix from the code of G (if the inference module of X is up to the task). We'll assume from here on out that this is the case.

3. There are other variants of this CDT algorithm: for instance, X might only output argmax(U(x, (output Y))) if that value is greater than the value of X's default Nash equilibrium. This algorithm can "insist on a good result or else no deal"; for instance, in the game below, if X takes as default the equilibrium that nets xer 2, the new algorithm will refuse to settle for 1, when the old algorithm would often have done so. On the other hand, when playing an opponent that always picks the second column (i.e. a one-player game), this new algorithm will wind up with nothing! So there's a tradeoff involved, where other algorithms can get (in many cases) the best of both worlds.
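Footnote 3's variant can be sketched as a guard on top of the CDT step (all names here are illustrative):

```python
# Footnote-3 variant of CDT: only deviate from the default Nash
# equilibrium when the deduced best response strictly beats the
# default equilibrium's value.
def insistent_cdt(U, actions_X, default_action, default_value, deduced_Y):
    # deduced_Y: Y's deduced mixed strategy, {action: probability}.
    def expected_U(x):
        return sum(p * U[(x, y)] for y, p in deduced_Y.items())
    best = max(actions_X, key=expected_U)
    # "Insist on a good result or else no deal."
    return best if expected_U(best) > default_value else default_action
```

Against an opponent locked into the second column, this agent keeps insisting on its default and can wind up with nothing, which is the tradeoff the footnote describes.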

4. Maybe the best way to implement Newcomb's Problem in this setting is to introduce the game N as follows (X chooses the row and gets the first listed payoff): your opponent is the predictor P, which is programmed as follows (when the game is N): try to deduce "output X(code N, code P) = row 1", and choose its column according to whether the deduction succeeds. If you expect to face N against P, then you have every incentive to make it an easy deduction that output X(code N, code P) = 1.

5. There may be a better way to do this, but most simplifications will fall short of CDT in one way or another. For instance, if we just have X try to deduce "output X(code G, 1, code Y) = output Y(code G, 2, code X)", then it's quite possible for X to wind up cooperating with a CooperateBot, which is suboptimal performance on a one-player game. (A proof of that statement can be generated by Löb's Theorem.)