Decision Theories: A Semi-Formal Analysis, Part II

[-]Jobst Heitzig3y30

I just stumbled upon this and noticed that a real-world mechanism for international climate policy cooperation that I recently suggested in this paper can be interpreted as a special case of your (G,X,Y) framework.

Assume a fixed game G where

each player's action space is the nonnegative reals,
U(x,y) is weakly decreasing in x and weakly increasing in y.
V(x,y) is weakly decreasing in y and weakly increasing in x.

(Many public goods games, such as the Prisoners' Dilemma, have such a structure.)

Let's call an object a Conditional Commitment Function (CCF) iff it is a bounded, continuous, and weakly increasing function from the nonnegative reals into the nonnegative reals. (Intended interpretation of a CCF C: if the opponent agrees to do y, I agree to do any x with x <= C(y).)

Now consider programs of the following kind:

    C = <some CCF>
    if code(opponent) equals code(myself) except that C is replaced by some CCF D:
        output the largest x >= 0 for which there is a y <= D(x) with x <= C(y)
    else:
        output 0

Let's denote this program Z(C), where C is the CCF occurring in line 1 of the program. Finally, let's consider the meta-game where two programmers A and B, knowing G, each simultaneously choose a C and submit the program Z(C); the two programs are then executed once to determine the actions (x,y), and A gets U(x,y) while B gets V(x,y).
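As a rough illustration of the rule inside the program, here is a minimal Python sketch of the action that Z(C) outputs when it faces Z(D). Because a CCF is weakly increasing, "there is a y <= D(x) with x <= C(y)" holds exactly when x <= C(D(x)), so the sketch simply scans a grid for the largest such x. The function name `agreed_action`, the grid bound `x_max`, and the two example CCFs are assumptions made for illustration and are not taken from the paper.

```python
def agreed_action(C, D, x_max, steps=1000):
    """Approximate the largest x in [0, x_max] with x <= C(D(x)).

    C is my own CCF and D is the opponent's CCF. x_max is an assumed
    search bound; it should be at least an upper bound of C.
    """
    best = 0.0  # x = 0 always qualifies, because C is nonnegative
    for i in range(steps + 1):
        x = x_max * i / steps
        if x <= C(D(x)):
            best = x
    return best


# Hypothetical CCFs: each side matches the other's action up to a cap.
def C(y):
    return min(y, 3.0)


def D(x):
    return min(x, 5.0)


x = agreed_action(C, D, x_max=10.0)  # output of Z(C) when facing Z(D)
y = agreed_action(D, C, x_max=10.0)  # output of Z(D) when facing Z(C)
```

With these two example CCFs, the lower cap binds for both players, so both programs settle on the same action.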

(In the real world, the "programmers" could be the two parliaments of two countries that pass two binding laws (the "programs"), and the actions could be domestic levels of greenhouse gas emissions reductions.)

In our paper, we prove that the outcomes that will result from the strong Nash equilibria of this meta-game are exactly the Pareto-optimal outcomes (x,y) that both programmers prefer to the outcome (0,0).

(In an N-player (instead of 2-player) context, the outcomes of strong Nash equilibria are exactly the ones from a certain version of the underlying base game's core, a subset of the Pareto frontier that might, however, be empty.)

I'd be interested in learning whether you think this is an interesting application context to explore the theories you discuss.

[-]duwease14y20

Good article, but as always a concrete example would be beneficial to comprehension.

[-]orthonormal14y00

I'm sorry, duwease, but I'm afraid I couldn't do that.

More seriously, decision theories deserve a more intuitive writeup as well, but first someone needs to show that the content is actually there. So I'm focusing on that with this sequence.

[-]Dmytry14y20

You forget that substitution can also be applied to instances of the source code of X inside the agent Y. This permits you, e.g., to one-box on Newcomb's if you know Omega got your source code: substitute a value for your code's output, and then maximize the payoff in the equation involving the substituted-in variable by solving for this variable. That approach is not so self-fulfilling, because you don't assume that the particular value you try equals the output of your source code (which easily creates a contradiction).

edit: never mind, I see you are leaving it for the next post. Looking forward to the next post. It'd be interesting what are those two enormous gaping flaws in the method used for much any sort of engineering; a good example would be engineering an AI that aims a gun.
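One possible reading of the substitution described above, as a minimal Python sketch: the same variable x stands in for the agent's output everywhere it occurs, including inside Omega's prediction, and the payoff is then maximized over x. The dollar amounts are the conventional Newcomb values and are an assumption here, as are the names `payoff` and `best_by_substitution`.

```python
# Assumed, conventional Newcomb payoffs (not stated in this thread).
BIG, SMALL = 1_000_000, 1_000


def payoff(action, prediction):
    """Money received for `action` when Omega predicted `prediction`."""
    opaque_box = BIG if prediction == "one-box" else 0
    transparent_box = SMALL if action == "two-box" else 0
    return opaque_box + transparent_box


def best_by_substitution(actions=("one-box", "two-box")):
    # Substitute x for every occurrence of the agent's output, including
    # the copy that Omega evaluates, then maximize the payoff over x.
    return max(actions, key=lambda x: payoff(action=x, prediction=x))
```

Under these assumed payoffs, the substituted maximization picks one-boxing, since the two-box branch only ever collects the small amount.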

[-]orthonormal14y20

It'd be interesting what are those two enormous gaping flaws in the method used for much any sort of engineering; a good example would be engineering an AI that aims a gun.

They're not going to be flaws in most contexts, just as an automated theorem-prover doesn't need to worry about spurious deductions; it's only when you have the circularity of a program whose output might depend on its output that you need to beware this kind of thing.

Also, you're probably capable of working out the first attempt at TDT and its flaws in this context, if you want to try your hand at this sort of problem. I'm splitting the post here not because it's an especially difficult cliffhanger, but because readers' eyes glaze over when a post starts getting too long.

[-]Dmytry14y00

it's only when you have the circularity of a program whose output might depend on its output that you need to beware this kind of thing.

Well, the substitutions are specifically meant to turn a circularity into a case of having x on both sides of some equation. We might be talking about different things. The failure mode is benign; you arrive at x=x.

edit: ahh, another thing. If you have source of randomness, you need to consider the solution with, and without, the substitution, as you can make substitution invalid by employing the random number generator. The substitution of the nonrandom part of strategy can still be useful though. Maybe that's what you had in mind?

[-]orthonormal14y20

If you have source of randomness, you need to consider the solution with, and without, the substitution, as you can make substitution invalid by employing the random number generator.

Err, I'm not sure what you mean here. In the CDT algorithm, if it deduces that Y employs a particular mixed strategy, then it can calculate the expected value of each action against that mixed strategy.

(For complete simplicity, though, starting next post I'm going to assume that there's at least one pure Nash equilibrium option in G. If it doesn't start with one, we can treat a mixed equilibrium as x_{n+1} and y_{m+1}, and fill in the new row and column of the matrix with the right expected values.)
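A small sketch of both steps, assuming a hypothetical 2x2 game with row-player payoffs U (the matrix and the mixed strategies p and q below are illustrative, not taken from the post): first the expected value of each pure action against a known mixed strategy, then the payoff matrix extended by one row and one column for the mixed strategies. Exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction as F


def expected_values(U, q):
    """Expected payoff of each row of U against the column mix q."""
    return [sum(u * qj for u, qj in zip(row, q)) for row in U]


def extend_with_mixed(U, p, q):
    """Append the mixes p (over rows) and q (over columns) as a new row and column."""
    col = expected_values(U, q)  # each pure row against the mix q
    rows = [list(row) + [c] for row, c in zip(U, col)]
    # the mix p against each pure column
    new_row = [sum(pi * row[j] for pi, row in zip(p, U)) for j in range(len(U[0]))]
    corner = sum(pi * c for pi, c in zip(p, col))  # the mix p against the mix q
    return rows + [new_row + [corner]]


U = [[F(2), F(0)], [F(0), F(1)]]  # hypothetical row-player payoffs
p = [F(2, 3), F(1, 3)]            # assumed mixed strategy of the row player
q = [F(1, 3), F(2, 3)]            # assumed mixed strategy of the column player

values = expected_values(U, q)
extended = extend_with_mixed(U, p, q)
```

Here both pure rows earn the same expected value against q, which is what one expects when q is the column player's equilibrium mix.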

[-]orthonormal14y20

I mean this sort of circularity:

The calculator computes "What is 2 + 3?", not "What does this calculator compute as the result of 2 + 3?" The answer to the former question is 5, but if the calculator were to ask the latter question instead, the result could self-consistently be anything at all! If the calculator returned 42, then indeed, "What does this calculator compute as the result of 2 + 3?" would in fact be 42.

I agree that some forms are benign. The Naive Decision Theory post and cousin_it's followup illustrate a malignant form.

[-]Dmytry14y-20

That's why you don't let your calculator be sentient. FAI might give a number that makes you most happy, which might well be 42 if you are not relying on this number for anything useful. (E.g. it might tell 42 as a joke, knowing that you know what 2+3 is)

Edit: you can, however, have some requirements on the calculator's output, and then there will be the number that satisfies those criteria; the x substitution will work to solve for this value, and in principle even to solve for protective measures to take against cosmic rays, and so on.

edit: and on the NDT, it doesn't one-way substitute at start. It assumes equivalence.

[-]TheOtherDave14y30

FAI might give a number that makes you most happy

Sure, if it happened to be in a situation where the most valuable thing to do by my standards was make me happy. Agreed.
You seem to be implying that I should prefer to avoid this result... do you in fact believe that?
If so, can you clarify why?

A somewhat analogous real-world situation: one of Siri's possible responses to "open the pod bay doors" as a command is "We're never going to live that down, are we?" This delights me enormously, and costs nothing of consequence. Should I prefer that this result be eliminated?

[-]Dmytry14y00

Actually, I misunderstood his point with the calculator. He was speaking of NDT, with issues resulting from equivalence; I thought he was speaking of issues resulting from substitution. I did not mean to imply that you should avoid this result, simply that if you want your calculator to work by the decision theory I thought he was referring to, it has got to have some utilities associated with its outputs. And this doesn't really help make a calculating device.

[-]TheOtherDave14y00

Gotcha. Thanks for clarifying.

[-]orthonormal14y20

Who said anything about sentience? NDT, as described, is a perfectly comprehensible program that (in certain games that you or I would regard as fair tests) generates spurious counterfactuals and thus makes terrible decisions, thanks to a particular kind of circularity.

In this sequence, I'm not talking about FAI or anything beyond my current understanding, and I'm not intentionally drawing metaphors. I'm simply outlining programs which (if I could write a good automated theorem-prover) I could write myself, and comparing how they do in a straightforward tournament setting, with the twist of allowing read-access to source codes. We should be able to agree on that base level.

[-]Dmytry14y20

Yeah, NDT is no good, agreed about that. That doesn't so much result from substitution as from full-blown two-way equivalence.

[-]Jonathan_Graehl14y10

It seems the point of all this is to study the properties of programs that "cooperate" on one-shot games with known payoffs. So, there's no point dwelling on objections as to the practicality of any of this (how to prove that an embodied computer actually implements given source code, etc.; see Ken Thompson's C compiler/login trojan).

I'd appreciate more precision on "If G is a prisoner's dilemma".

Also, I suppose there's nothing prohibiting mixed strategies (choosing a probability distribution over your likely moves, from which the actual move is drawn) - e.g. the machine the programs run on has access to random bits.

Since I didn't actually follow your definitions carefully enough to answer, can you tell me what would be played by two identical rational CDT, Cliquish-DT, or TDT agents (or same-clique-agents) on the game:

(2,-1) (0,0)
(0,0)  (0,0)

( with (row player, column player) payoffs )

Row player will always choose row 1 (if he wants to maximize the sum of payoffs, or his own payoff). What will column player choose? Take the -1 for the good of the row player?

That is, I'm asking whether programs which self-sacrifice when faced with another member of their clique are supposed to be 'rational' under one of these theories.

If it weren't a one-shot game, I'd wonder about history - if an external force assigns a particular instance only to the sacrificial role, does it keep on sacrificing? I can only ask this informally, unfortunately.

[-]orthonormal14y30

I'd appreciate more precision on "If G is a prisoner's dilemma".

We can read off the payoff matrix. G is a Prisoner's Dilemma if for each player, utility(I defect, they cooperate)>utility(we both cooperate)>utility(we both defect)>utility(I cooperate, they defect).

[-]Jonathan_Graehl14y00

That seems clear. Perhaps in addition, u(c,d)+u(d,c) < 2*u(c,c), though I'm not sure if that makes a difference to your claims.
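Both the ordering given above and this extra condition can be checked mechanically. The following is a minimal Python sketch; the dictionary layout (keys are `(my_move, their_move)` pairs with moves `'c'` and `'d'`), the function names, and the example payoffs are assumptions made for illustration.

```python
def has_pd_ordering(u):
    """u[(mine, theirs)] is my utility; checks the four-way strict ordering."""
    return u['d', 'c'] > u['c', 'c'] > u['d', 'd'] > u['c', 'd']


def has_joint_condition(u):
    """Optional extra condition: u(c,d) + u(d,c) < 2 * u(c,c)."""
    return u['c', 'd'] + u['d', 'c'] < 2 * u['c', 'c']


def is_prisoners_dilemma(u_row, u_col, require_joint_condition=False):
    """The ordering must hold for each player, each seen from their own side."""
    ok = has_pd_ordering(u_row) and has_pd_ordering(u_col)
    if require_joint_condition:
        ok = ok and has_joint_condition(u_row) and has_joint_condition(u_col)
    return ok


# Hypothetical payoffs for illustration only.
standard = {('d', 'c'): 5, ('c', 'c'): 3, ('d', 'd'): 1, ('c', 'd'): 0}
lopsided = {('d', 'c'): 7, ('c', 'c'): 3, ('d', 'd'): 1, ('c', 'd'): 0}
```

The `lopsided` payoffs satisfy the ordering but not the extra condition, because there alternating defect/cooperate pays more in total than mutual cooperation.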

[-]orthonormal14y00

Again, no difference for CDT or TDT (or the version of ADT I'll present), but sometimes it matters for UDT.

[-]orthonormal14y20

CDT and TDT would both choose column 2 in your example. CliqueBots are dependent on ad-hoc rules, so you could program them either way. There are circumstances where UDT would play column 1 against itself; we'll get to that in a bit.
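The payoff fact behind the column 2 answer can be checked directly. The sketch below does not model CDT or TDT themselves; it only tests weak dominance for the column player in the game from the parent comment. The function name `weakly_dominates` and the zero-based indexing are assumptions for illustration.

```python
def weakly_dominates(V, a, b):
    """True if column a is never worse than column b for the column player
    and strictly better against at least one row.

    V[r][c] is the column player's payoff when row r and column c are played.
    """
    never_worse = all(row[a] >= row[b] for row in V)
    sometimes_better = any(row[a] > row[b] for row in V)
    return never_worse and sometimes_better


# Column player's payoffs in the game (2,-1) (0,0) / (0,0) (0,0).
V = [[-1, 0],
     [0, 0]]

column_2_dominates = weakly_dominates(V, 1, 0)  # index 1 is column 2
```

So a column player who only maximizes their own payoff has no reason to pick column 1, whatever the row player does.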

[-]patrickscottshields13y00

I'd like to cite this article (or related published work) in a research project paper I'm writing which includes application of an expected utility-maximizing algorithm to a version of the prisoner's dilemma. Do you have anything more cite-able than this article's URL and your LW username? I didn't see anything in your profile which could point me towards your real name and anything you might have published.

[+]Rhwawn14y-200

(100,1)	(0,0)
(101,0)	(1,1)

(2,1)	(0,0)
(0,0)	(1,2)