Decision Theories, Part 3.75: Hang On, I Think This Works After All

20Benya

16Paul Crowley

10Wei Dai

8Benya

3orthonormal

5orthonormal

8asparisi

7Wei Dai

2orthonormal

5Wei Dai

2orthonormal

2cousin_it

0orthonormal

4Mephistopheles

0orthonormal

0Wei Dai

5orthonormal

15Benya

4Eliezer Yudkowsky

11wedrifid

36[anonymous]

8orthonormal

2wedrifid

2orthonormal

0Decius

3orthonormal

0Decius

2orthonormal

1Decius

1orthonormal

1Decius

0Wei Dai

2orthonormal

3Wei Dai

0orthonormal

3Wei Dai

0orthonormal

3Wei Dai

0Decius

2orthonormal

0Decius

1Benya

0Decius

6orthonormal

0Decius

New Comment

This is to say that I've read through the whole thing, and so far have not found any problems other than those Wei has raised.

That is all. [...except, I find it weird that j is what Y returns -- it feels reversed. Aesthetically, I'd prefer it if Search(.,.) was passed the mask as its first argument. -- And, of course: good luck! Hope the remaining questions will work out.]

Wow, upvoted for sticking your neck out. We don't have enough incentives to say this sort of thing - it doesn't have the same kudos as saying something new, and it risks you getting some egg on your face if there's a problem you didn't spot - but it's really useful, so have my upvote.

When you say "PA+i" you must mean a finite fragment of PA+i (e.g., PA+i with a limit on proof length), otherwise I don't see how FairBot ever gets to execute the second "if" statement. But the Löbian cycle:

if we have a formal system S, and "if S proves A, then B" and "if S proves B, then A" are theorems of S, then S indeed proves both A and B.

isn't a theorem when S is a finite fragment of a formal system, because for example S may "barely" prove "if S proves A, then B" and "if S proves B, then A", and run out of proof length to prove A and B from them. I think to be rigorous you need to consider proof lengths explicitly (like cousin_it did with his proofs).

And of course, "if 'M(FB)=C' is provable in PA+3, then FB(M)=C" is provable in PA+3, since again PA+3 can prove that PA through PA+2 won't have found proofs of contrary conclusions before it gets around to trying to find cooperation in PA+3.

I'm not seeing how this goes through. Can you explain how PA+3 can prove that PA through PA+2 won't have found proofs of contrary conclusions? (If they did, that would imply that PA+3 is inconsistent, but PA+3 can't assume that PA+3 is consistent. How else can PA+3 prove this?)

I think to be rigorous you need to consider proof lengths explicitly (like cousin_it did with his proofs).

This is clearly not what orthonormal meant, but I *think* if you interpret the "provable in PA+i" function as a call to a halting oracle, all the parts that raise questions about proof lengths just go through. (I also had the impression that each step in the post should break down to a formal proof whose length is O(l), where l is the length of the expression specifying how many proofs are searched; if so, everything should work out for a large enough proof limit. I think this was orthonormal's intent. However, I haven't checked this in detail.)

Can you explain how PA+3 can prove that PA through PA+2 won't have found proofs of contrary conclusions?

Let me try something, without a high degree of confidence--

The following reasoning is in PA+3, using the halting oracle interpretation so that I do not need to worry about proof lengths. We assume that 'M(FB)=C' is provable in PA+3 and that FB(M) != C, and try to derive a contradiction.

It follows directly that FB(M)=D; given our assumption, this means that PA+2 proves 'M(FB)=D'. Since we're in PA+3, it follows that M(FB)=D. But we also know that if 'FB(M)=C' is provable in PA+3, then M(FB)=C (as stated in the proof of Lemma 8, right before the contentious argument). Thus, if M(FB)=D, then PA+3 does *not* prove 'FB(M)=C'. This means that PA+3 is consistent. But we also know (given our assumptions) that PA+3 proves both 'M(FB)=C' and 'M(FB)=D' (the latter because PA+3 proves everything that PA+2 proves); contradiction.

It took me a while to sit down and work through it, but the proof works! Thanks for patching the pothole I didn't see.

As Benja said, I'm imagining that my agents have O(1) calls to a hierarchy of halting agents. But when I write it up, I'll also write up the version with bounded proof limits (where exponentially greater limits correspond to higher oracles), the bounded Löb's Theorem, etc. I'm not too worried, because (as Paul Christiano put it) there are no interesting differences between what you can do in each case; it's simply easier to track things with oracle hierarchies.

I'm not seeing how this goes through. Can you explain how PA+3 can prove that PA through PA+2 won't have found proofs of contrary conclusions?

Hmm. This is a real problem, and I'm thinking about it today.

**ETA:** Oh, good, Benja's solution works.

While trying to understand Masquerade better, I came up with the following design, which seems to accomplish something similar, while being much simpler:

```
def OpportunisticBot(X):
- if "X(OpportunisticBot)=C" is provable in PA-1:
-- return D
- if "X(OpportunisticBot)=C" is provable in PA-2:
-- return C
- return D
```

where PA-1 is PA with a limit on proof length, and PA-2 is PA with a sufficiently larger limit such that it can prove all true statements of the form "X is not provable in PA-1" by exhaustive search. I claim the following:

- OB(DB)=OB(CB)=D
- OB(FB)=FB(OB)=C
- OB1(OB2)=OB2(OB1)=C

Proof of 3: If either "OB1(OB2)=C" or "OB2(OB1)=C" is provable in PA-1, then PA-2 must be unsound, therefore they are not provable in PA-1. PA-2 can prove that they are not provable in PA-1 by exhaustive search which implies PA-2 can prove that both OB1(OB2) and OB2(OB1) will reach the second if statement, which sets up a Loebian circle.

Can you think of a problem that the Masquerade approach solves but OB doesn't?

```
def SlowCooperateBot(X):
- if 2+2=3 is provable in PA+1:
- - return D
- else return C
```

It looks to me like OpportunisticBot cooperates with SlowCooperateBot, simply because the latter takes too long to be shown a cooperator, while Masquerade rightly defects against it.

It seems unfair to count this SlowCooperateBot against OpportunisticBot since OB only uses up to PA-2 while Masquerade uses up to PA+4. What if we change OB by replacing its PA-1 and PA-2 with PA-3 and PA-4 respectively? Then we can still create a SlowCooperateBot that OB wrongly cooperates with by changing SCB's PA+1 to PA+3. But I think we can also create a SlowFairBot that OB rightly cooperates with but Masquerade wrongly defects against (assuming N=4).

```
def SlowFairBot(X):
- if 2+2=3 is provable in PA-3:
-- return D
- if "X(SlowFairBot)=C" is provable in PA-4:
-- return C
- return D
```

Looks like the score is even? ETA: Also, I feel like talking about these "Slow" opponents is probably a distraction, and there may be some point to Masquerade that I'm not getting yet. Maybe it will be more apparent when you talk about the generalizations and bargaining problems mentioned in the OP?

Sorry for the delay.

As I was saying to cousin_it, it's instructive to think about what happens when you throw a delay like this at Masquerade: it basically has to wait it out and start from scratch.

Qualitatively, Masquerade needs to go up a few steps before reaching mutual cooperation with FairBot. That kind of delay will make it "start out" at PA+4. Notably, if your SlowFairBot delays to PA+3 but tries to prove cooperation in PA+7, it'll still work.

I'm willing to accept that sometimes you run out of time to find mutual cooperation and end up mutually defecting. The property that OpportunisticBot has that I don't like is that it sometimes "regrets it has too much power"- it would have done better on my example if it had stopped searching sooner. That's not the case for Masquerade.

**EDIT:** Actually, as Masquerade is written, it would end up with mutual defection here. But if, instead of having all the proof searches capped at the same N, the MaskSearch capped out at a lower (but still sufficiently high) level than SlowFairBot and Masquerade's sanity check, it would mutually cooperate.

Unless I've misunderstood Masquerade, there seems to be a version of SlowCooperateBot that Masquerade also cooperates with:

```
def TemptingCooperateBot(X):
- if X(TemptingCooperateBot)=C is provable in PA:
- - return C
- if 2+2=3 is provable in PA+4:
- - return D
- return C
```

If that's correct, then Wei's comment seems to be a good summary of the reason why your approach works. But maybe I'm still missing something.

After kicking around ideas for a bit, I came up with a candidate for "fair decision theory problem class" and a reasonable-seeming notion of optimality (like Pareto optimality, not necessarily unique) which things like DefectBot and the various FairBots all fail. However, Masquerade fails it too, albeit for interesting reasons.

Let's outsource all of our provability oracle needs: instead of receiving the opponent's source code, let's redefine the problem so that both agents have access to a hierarchy of halting oracles for PA+i, and have "my current opponent" as a callable variable. That is, they can ask if PA+i proves statements of the form "X(Y)=C" or "X(Y)=D", where X and Y can be the agent's self, their opponent, or any specified mask. But they can't do things like check whether the source code is identical to their own. (Why am I doing this? As Eliezer puts it when talking about Newcomb's Problem, only your *decisions* should matter to another agent, not your ritual of cognition.) Aside from those oracle calls, our agents are limited to using bounded loops.

Let's introduce the notion of "regret", an analogue of counterfactual surgery on one's decision. I imagine someone who's written a FairBot, upon seeing that it's playing a single game against a CooperateBot, regret that they couldn't have included the subroutine "if your opponent is this CooperateBot, then defect" in their program.

(Note that this subroutine is illegal under my definition of the class of agents; but it serves the purpose of analogizing "counterfactual surgery on this particular decision".)

As I've been saying, there are some forms of this regret that I'm willing to accept. One of these is regret for third-party punishment; if I find I'm playing against an agent who cooperates with me iff I cooperate with DefectBot, then I might locally wish I had the appropriate subroutine when I played this punisher, but having it would surely harm me if I later played a DefectBot. So I'll call it a regret only if I wish I could have coded a surgical subroutine that applies *only to the current opponent*, not to any third parties.

The other exception I want to make is regret for "running out of time". If X is "cooperate if PA proves Opp(X)=C, else defect" and Y is "defect if PA proves Opp(Y)=D, then cooperate if PA+1 proves Opp(Y)=C, else defect", then X and Y will mutually defect and either programmer could regret this... but instead of adding a surgical subroutine, they could fix the calamity by changing X's code to look in PA+1 rather than PA. So I'll say that a regret is an "out-of-time regret" if a better outcome could have been reached by strictly increasing indices of oracle calls in one or both agents. (Masquerade suffers from out-of-time regrets against lots of FairBots, since it takes until PA+3 to work out that mutual cooperation is the best option.)

So the kind of optimality that I want is an agent in this class, such that **its only regrets against other agents of this class are out-of-time regrets**.

DefectBot and FairBot clearly fail this. At first, I hoped that Masquerade would satisfy it, but then realized that an opponent could be exploitably stupid without Masquerade realizing it. To wit, let StupidBot cooperate with X if and only if PA+4 proves that X cooperates with (this post's version of) FairBot. Then Masquerade mutually cooperates with StupidBot, but Masquerade could have a surgical subroutine to defect against StupidBot and it would get away with it.

I'm wondering, though, if adding in a bit of "playing chicken with your decision process" would not only patch this hole, but satisfy my optimality condition...

**ETA:** I'm not sure that anything will turn out to satisfy this criterion, but it seems plausible. However, if we pass from the Prisoner's Dilemma to games with a bargaining component, I'd expect that nothing satisfies it: there's just too much room for one-upsmanship there. Hopefully there will be *some* weak generalization (or some emendation) that's meaningful.

@orthonormal: Thanks for pointing me to these posts during our dinner last Thursday.

I haven't checked the proofs in detail, and I didn't read all previous posts in the series. I'm a bit confused about a couple of things, and I'd be happy if someone could enlighten me.

1) **What is the purpose of the i-loops?**

In your definition of FairBot and others, you check for all i=1,...,N whether some sentence is provable in PA+i. However, if a sentence is provable in PA+i, then it is also provable in PA+(i+1). For this reason, I don't understand why you loop over i=1,...,N.

2) FairBot and the search procedure can be **simplified**

The following holds for all agents X and Y, and all j,k in {C,D}:

- Search(X,Y) fails if and only if, for all j,k in {C,D}, the sentence "j = X(Y) and k=Y(X)" is not provable in PA+N.
- If Search(X,Y) does not fail, then Search(X,Y) outputs ( X(Y) , Y(X) ).

This means you could define the search procedure simply as follows:

Search'(X,Y) := ( X(Y) , Y(X) )

This procedure Search' has the same output as Search whenever Search does not fail, and Search' is meaningful on strictly more inputs. In particular, Search' halts and outputs the correct answer if and only if X halts on input Y and Y halts on input X.

Then you can redefine Search(X,Y) equivalently as follows:

- If "X(Y) halts and Y(X) halts" is provable in PA+N then return ( X(Y) , Y(X) );
- Otherwise, return False.

Formulated this way, it becomes apparent that you are trying to find proofs for the fact that certain Turing machines halt on certain inputs; the actual output is not so important and can be computed outside the proof system.

Similarly, FairBot(X) can be simplified as follows:

- If "X(FairBot) halts" is provable in PA+N, then return X(FairBot).
- Otherwise return D.

This nicely matches the intuition that we try to understand our opponent in interaction with us, and if we do, we can predict what they will do and match their action; if we fail to understand them, we are clueless and have to defect.

3) On a related note, **what is the utility if we don't halt?**

Maybe your concern is that we should halt even if we play against the Turing machine that never halts? To me it seems that a prisoner who can't decide on what to do effectively outputs "Cooperate" since the guards will only wait a finite amount of time and don't extract any information from a prisoner that does not halt to explicitly output D.

In the model in which not halting is the same as cooperating, the trivial bot below is optimally fair:

TrivialFairBot(X):

- return X(TrivialFairBot). (i.e., simply simulate your opponent and match their action)

Based on the fact that you're doing something more complicated, I assume that you want to forbid not halting; let's say we model this such that every non-halting player gets a utility of -∞. Then it is a bit weird that we have to halt while our opponent is allowed to run into an infinite loop: it seems to give malicious opponents an unfair advantage if they're allowed to threaten to go into an infinite loop.

4) **Why allow these PA oracles?**

I don't understand why you forbid your agents to run into an infinite loop, but at the same time allow them have access to uncomputable PA+N oracles. Or are you only looking at proofs of a certain bounded length? If so, what's the motivation for that?

5) **How do you define "regrets" formally?**

I haven't checked this, but it seems that, by diagonalization arguments, every possible agent will loose against infinitely many opponents. It further seems that, for any given agent X, you can always find an opponent Y so that adding a hard-wired rule "if your opponent is this particular agent Y, then do C/D" (whichever is better) improves the performance of X (note that this can't be done for all Y since Y could first check whether X has a hard-wired rule like that). In particular, I can imagine that there is an infinite sequence X_1,.. of agents, such that X_{i+1} performs at least as good as X_i on all inputs, and it performs better on at least one input. This would mean that there is no "Pareto-optimum" since there's always some input on which our performance can be improved.

6) **What's the motivation for studying a decision theory that is
inherently inefficient, almost not computable, and easily exploitable
by some opponents?**

If we're trying to model a real decision theory, the two agents arguably shouldn't have access to a lot of time. Would the following resource-bounded version make sense? For some function t(n), the machines are forced to halt and produce their output within t(n) computation steps, where n=|X|+|Y| is the length of the binary encoding of the machines. In this post, you're getting decision theories that are at least exponential in n. I'd be interested to see whether we achieve something when t(n) is a polynomial?

The time-bounded setting has some advantages: I think it'd more realistically model the problem, and it prevents an arms race of power since no player is allowed to take more than t(n) time. On the other hand, things get more difficult (and more interesting) now since you can inspect your opponent's source code, but you don't have enough time to fully simulate your opponent.

Thanks for the reply, and sorry for the delay...

Masquerade correctly defects against TemptingCooperateBot, for the following reasons:

- PA proves "FB(TCB)=TCB(FB)=C"
- PA+5 proves "TCB(DB)=C" and of course PA proves "DB(TCB)=D"
- PA+5 proves "TCB(Masquerade)=C" (though this is actually irrelevant, as Masquerade will defect at this point whether the sanity check succeeds or fails)

It's not that there's anything magical about PA+4 in Masquerade; my mental image is that Masquerade uses a large value for N, so that it has time to find mutual cooperation with all but the very loopiest of opponents. If you delay it until PA+4, then it'll just start from there, more or less.

It seems possible to generalize this OpportunisticBot idea and port it to the UDT framework (where "opponents" and "the rest of the environment" are not provided separately). I'm not sure if this leads anywhere interesting, but I better write it down before I forget.

Define UDT-OB(input) as follows: (Define "U", "us", "actions", and "mappings" as in Benja's recent post.) For each u in us in descending order, for each m in mappings, check if "UDT-OB(i) == m[i] for every i in actions.keys() AND U() >= u" is provable in PA-j, with j incrementing by 1 each time. If it is provable in PA-j, then return m[input].

This works in simple cases where "UDT-OB(i) == m[i] for every i in actions.keys() IMPLIES U() >= u" is easily provable for optimal m* and u*, because assuming PA is sound, UDT-OB will reach the step with m=m* and u=u*, and since PA-j can prove that it reaches this step, it can (using the above IMPLIES statement) prove 'if "UDT-OB(i) == m[i] for every i in actions.keys() AND U() >= u" is provable in PA-j then UDT-OB(i) == m[i] for every i in actions.keys() AND U() >= u'.

This also works for *some* cases where Löbian circles are needed, provided that when each agent tries to prove the necessary statements, they are using proof systems that are strong enough to prove that both agents reach that step. I'm not sure how to arrange this in general.

Proof that a Löbian cycle works out (hat tip, Eliezer and co.):

|- □A -> B

|- □B -> A

then

|- □(□A->B)

|- □□A->□B

|- □□A->A

|- □(□□A->A)

|- □□□A->□A

|- □A->□□A

|- □□□A->□□A

(apply Löb)

|- □□A

|- A

Yeah, it's the syntax "□A" for "there is a Gödel-numbered proof of A", and "|-" for "the following is provable".

Or is your browser showing a lot of junk instead of things that cash out to "it is provable that (a Gödel-numbered proof of A) implies B"?

Or is your browser showing a lot of junk instead of things that cash out to "it is provable that (a Gödel-numbered proof of A) implies B"?

Assuming the only operators you used were , "|-" and "->" it all came through.

Note the key step,

|- □A->□□A

and why it's true even though

|- A->□A

is false: the formal system can see how to take a Gödel-numbered proof and replace all of the tokens with more complicated ones to get a Gödel-numbered proof of a Gödel-numbered proof, but it has nothing to work with if it starts by assuming just an object-level result.

What is special about A being provable that makes |- □A -> B provable, given only that it is true? For example, assume some A and B such that:

|- □A

|- B

|- □A -> B is trivial, as is |- □B -> A. (it follows from |-□A->A)

but |- □B is false.

Edit: changed some variables to be more clear.

Uh, I don't understand you. |- □A->□□A is a very special case.

it follows from |-□A->A

That statement isn't true for general A!

□A->A, (true statement, assuming a sound system)

but not

|- (□A->A)

□(□A->A) (confirm these are false statements)

but

|- □□A->A

and

|- □A-> □□A (these are statements you have made)

|- □□A->A follows from the assumptions |- □A -> B and |- □B -> A; it's not a theorem in general.

Does it apply to (All A for which exist a B such that |- □A -> B and |- □B -> A)?

Is it possible to construct a B in general that meets those descriptions for an arbitrary statement? For example, B=" (a proof that a numbered proof of A implies B) and (a proof of a numbered proof of B implies A)" (B="( |- □A -> B) and ( |- □B -> A) ), or is a self-referential statement prohibited in sound systems?

Does B even have to be true, much less provable?

To the first question, yes. That's what the top comment in this sequence proved (and Benja's reply proved much more elegantly).

To the second question, self-referential statements can't be ruled out in sufficiently powerful formal systems- that's what Gödel's Incompleteness Theorem uses!

To the third, again, if any formal system proves □A -> B and □B -> A, then it proves A and B.

Consider the statement A "□A -> B and □B -> A" and the statement B "□A -> B and □B -> A and false"

Why are those two statements not provable in every system in which they can be written?

given a payoff matrix in general, Masquerade enumerates all of the constant strategies and all of the "mutually beneficial deals" of the FairBot form

But here, you *didn't* enumerates all of the constant strategies. To do that, we would change "for Y in [DefectBot, FairBot]" to "for Y in [DefectBot, CooperateBot, FairBot]". Then Masquerade(CB)=D and Masquerade(Masquerade)=D because when it gets to "if utility > U(D,D)" it will be the case that my_move=D and their_move=C. Correct?

It's the other way around: if we include CooperateBot among the options, Masquerade looks at playing CooperateBot against the other Masquerade, and sees that this would be the worst of its options. It sticks with FairBot.

The only reason I didn't include CooperateBot in the proofs above is that it would make me write several more easy lemmas (and make everyone read them).

Ah, ok. Another question: what happens if the other Masquerade is using a different proof system, like ZFC for example?

That's a really interesting question; I don't know. At the very least, what's needed is for both agents to be using hierarchies of proof attempts such that the later attempts of one can answer questions about the early attempts of the other, and vice versa. Maybe some Löbian cycle still works if this is the case, or maybe they really do need to be using equivalent deductive systems at some point.

Let me ask a simpler question. Does the following FB1 cooperate with FB2?

```
def FB1(X):
- if "X(FB1)=C" is provable in PA+1:
- - return C
- else return D
```

where FB2 is FB1 with "PA+2" in place of "PA+1". I think the answer is yes, because PA+1 can prove

- "FB1(FB2)=C" is provable in PA+1 implies "FB1(FB2)=C" is provable in PA+2
- "FB1(FB2)=C" is provable in PA+1 implies FB2(FB1)=C (using 1 and looking at FB2)
- "FB2(FB1)=C" is provable in PA+1 implies FB1(FB2)=C (looking at FB1)
- FB2(FB1)=C (Löbian cycle)

Does this look right? And I think FB1 would also cooperate with FB-ZFC (FB1 with ZFC in place of PA+1), because PA+1 can prove that ZFC proves a superset of theorems of PA+1?

I think that's right, yes. If one proof system encompasses the other, it looks like it works.

Another question is, what if we had two proof systems, neither of which encompasses the other? (Most horrifyingly, what if one were PA plus the Gödel sentence of PA, and the other were PA plus the negation of the Gödel sentence?)

Another question is, what if we had two proof systems, neither of which encompasses the other? (Most horrifyingly, what if one were PA plus the Gödel sentence of PA, and the other were PA plus the negation of the Gödel sentence?)

I think that works too. Call these proof systems PA+ and PA- and the FairBots using them FB+ and FB-. PA can prove that any theorem of PA is also a theorem of PA+ and PA-. So PA can prove:

- "FB+(FB-)=C" is provable in PA implies "FB+(FB-)=C" is provable in PA-
- "FB+(FB-)=C" is provable in PA implies FB-(FB+)=C (using 1 and looking at FB-)

and similarly for the other half of the Löbian circle. So FB+(FB-)=C and FB-(FB+)=C are theorems of PA and hence theorems of PA- and PA+.

If M(M)=C is provable in PA+i, then M(M) returns D. if M(M)=D is provable in PA+i, then M(M) returns D. If M(M) is not provable in PA+i, then M(M) returns D.

Search(M,M), returns false, because M depends on the result of Search(M,M) and therefore no proof can exist. M(M) returns D, every time.

Search(M,M), returns false

Search is never called on (M,M), only on (M,FB) and (M,DB). (And of course all combinations of FB and DB.)

If M(M)=C is provable in PA+i, then M(M) returns D.

Which proof is broken, and why? This is a case where intuiting what the agent will do, without proving it, is unreliable; it's not very feasible for humans to *consciously* grok the ramifications of a depth-five stack of "I know that they know that I know..."

Lemma 8: "Masquerade(FB)=C" and "FB(Masquerade)=C" are provable in PA+3.

And your proof of that claim is simply proof that "Masquerade(FB)=D" and "FB(Masquerade)=D" are not true in PA+3.

Proof: Lemmas 5 and 6, and the consistency of PA+2, imply that when Masquerade arrives at the sanity-check stage, it has the variables set as utility=U(C,C), my_move=C and their_move=C. Thus PA+3 can prove that "if 'FB(M)=C' is provable in PA+3, then M(FB)=C". And of course, "if 'M(FB)=C' is provable in PA+3, then FB(M)=C" is provable in PA+3, since again PA+3 can prove that PA through PA+2 won't have found proofs of contrary conclusions before it gets around to trying to find cooperation in PA+3. Therefore we have the desired Löbian cycle!

However, "if 'FB(M)=**D**' is provable in PA+3, then M(FB)=**D**". And of course, "if 'M(FB)=**D**' is provable in PA+3, then FB(M)=**D**" is provable in PA+3...
For PA+3 to be consistent, then it cannot prove that FB(M)=D and FB(M)=C. Since it cannot prove one but not the other, and the proofs are identical, it cannot prove either.

However, "if 'FB(M)=D' is provable in PA+3, then M(FB)=D".

Are you claiming that this is provable in PA+3? How?

As the proof of Lemma 8 says, by the time we reach the sanity-check stage, we have my_move == their_move == C, and PA+3 knows this. Thus, *in* the sanity-check stage, Masquerade looks for a proof of 'FB(M)=C' in *any* of the PA+i (i<N), and if it finds one, it returns C. Thus, to show that M(FB)=D, PA+3 would need to conclude that "PA+3 does NOT prove 'FB(M)=C' -- and neither does PA+4, ... PA+(N-1)." I don't see how it's supposed to do that?

And of course, "if 'M(FB)=D' is provable in PA+3, then FB(M)=D" is provable in PA+3...

I don't see this one, either. FairBot looks for proofs of 'M(FB)=C' in PA, PA+1, PA+2 *and* PA+3 before it gets around to looking for proofs of 'M(FB)=D' in PA+3. Therefore, to conclude "FB(M)=D" from "PA+3 proves 'M(FB)=C'", it would seem that PA+3 must first conclude that "'M(FB)=C' is NOT provable in PA+3"; and again, I do not see how it would do that.

Oh, I see the premise now.

You aren't trying to develop a new strategy, you are trying to select from one or more already existing strategies based on which one is optimal. You do this by finding where the biggest proven payoff is and then emulating the behavior among your potential strategies that results in the highest payoff. It doesn't generalize well because you ask PA+i what your partner will do against each of the other potential strategies, but only emulate those strategies if PA+i proves that they also would do that against you. In cases where the complexity of the other decision maker were significant, Masquerade would always fall back on its default- try writing a few RPS programs, that need to throw Rock, Paper Scissors against each other and have access to each other's source code but not random numbers. Payoffs for a win are 3-0, for a tie 2-2.

M(M)=C IFF PA+i proves that M(M)=C. PA+i proves that M(M)=C IFF (PA proves M(M)=C or PA+i proves M(M)≠D)

PA+i proves M(M)≠D IFF (PA proves M(M)≠D or PA+i proves M(M)=C)

M(M)=C IFF PA proves that M(M)=C or PA proves M(M)≠D.

I'm probably missing some magic property of PA+i in that simplification, but otherwise it's the last three lines of your program, given that the sanity check is checking for C,C.

Thank you. You have finally proven to me that Lob's theorem is as absurd as it sounds on the face of it. If PA+i can prove "If

You have finally proven to me that Lob's theorem is as absurd as it sounds on the face of it.

When math seems absurd to you, this is generally a signal that you've got something interesting to learn, rather than that you've got a superior epistemic vantage point to mathematics.

No, seriously, this stuff is awesome and you should check it out. I'm realizing that I need a better grasp of it in order to write it up, so I'm checking out The Logic of Provability on Monday.

Oh, I agree that it's true in mathematics. I just recognize that things which are proven within a strict construction of mathematics might not apply outside that strict interpretation.

It's the intersection between Godel and Lob's theorems that really throws me off- anything which is not false in PA can be proven in PA, including nonce statements, but there are things that are true in PA which cannot be proven in PA, and there is a mechanism for generating such statements.

Consider the statement G: "G cannot be proven in PA". G is proven in PA+1, because if PA were to have a proof of G, then G would be false and PA would prove something which is false. So far, so good- but PA proves that PA+1 proves G; since the only axiom in PA+1 not in PA is "PA is sound", PA proves that "PA is sound->G is true".

If PA is not sound, then PA proves G is true. (trivial, since an unsound system proves everything)

PA is either (sound or not sound).

PA proves (if PA is sound, G is true.)

PA proves that (if PA is not sound, PA proves that G is true)

(PA proves that G is true) within PA implies G is true.

PA proves that G is true If PA is (sound or unsound).

PA proves that G is true.

PA proves that PA is not sound.

I'm not sure if I'm literally only using the axiom "S is either sound or not sound" to prove "S proves that S is not sound". I'm sure that the principle of explosion requires a few axioms more, and that I've made a (common?) fatal error somewhere, but I'm comfortable with calling mathematical constructions "Absurd" without calling them false.

The best response to "X is absurd" is "Reality is absurd", a point with which I will readily agree.

EDIT: Corrected a couple of major typos resulting in GCE, other errors are probably still present.

Followup to:Decision Theories, Part 3.5: Halt, Melt and Catch Fire, Decision Theories: A Semi-Formal Analysis, Part IIIThe thing about dead mathematical proofs is that it's practically always worth looting the corpse; sometimes you even find there's still a pulse! That appears to be the case with my recent post lamenting the demise of the TDT-like "Masquerade" algorithm. I think I've got a rigorous proof this time, but I'd like other opinions before I declare that the rumors of Masquerade's demise were greatly exaggerated...

To recap quickly, I've been trying to construct an algorithm that, given the payoff matrix of a game and the source code of its opponent, does some deductions and then outputs a move. I want this algorithm to do the commonsense right things (defect against both DefectBot and CooperateBot, and mutually cooperate against both FairBot and itself), and I want it to do so for simple and general reasons (that is, no gerrymandering of actions against particular opponents, and in particular no fair trying to "recognize itself", since there can be variants of any algorithm that are functionally identical but not provably so within either's powers of deduction). I'd also like it to be "un-exploitable" in a certain sense: it has a default move (which is one of its Nash equilibria), and no opponent can

profitagainst the algorithm by forcing it below that default payoff. If the opponent does as well or better in expected value than it would by playing that Nash equilibrium, then so too does my algorithm.The revised Masquerade algorithm does indeed have these properties.

In essence, there are two emendations that I needed: firstly, since some possible pairs of masks (like FairBot and AntiFairBot) can't knowingly settle on a fixed point, there's no way to determine what they do without a deductive capacity that strictly exceeds the weaker of them. That's a bad feature to have, so we'll just have to exclude potentially troublemaking masks from Masquerade's analysis. (In the special case of Prisoner's Dilemma I know that including DefectBot and FairBot will suffice; I've got what looks like a good solution in general, as well.)

The second emendation is that FairBot needs to alternate between trying harder to prove its opponent cooperates, and trying harder to prove its opponent defects. (There needs to be an asymmetry, like cooperation proofs going first, to guarantee that when FairBot plays against itself, it finds a Löbian proof of mutual cooperation rather than one of mutual defection.) The reason for this is so that when agents reason about masks, they should be able to find a proof of the mask's action

withoutneeding to exceed that mask's powers of deduction. Otherwise we get that arms race again.This escalation of proof attempts can be represented in terms of proof limits (since there exists a constant C such that for N sufficiently large, a proof that "there are no proofs of X of length less than N" either exists with length less than C^N or not at all), but the simplest way to do this is with the formalism of PA+N. That is, PA is Peano Arithmetic; PA+1 is the formal system with the axioms of Peano Arithmetic and an extra axiom that PA is self-consistent (that is, if PA proves X, then PA does not prove not-X); PA+2 has those axioms and an extra one stating that PA+1 is self-consistent and so on. (Note that none of these formal systems know themselves to be self-consistent, and for good reason!) In every use, we'll assume that N is a fixed number (anything greater than 4 will work).

## New And Improved Masks

So without further ado, let's define our masks for the Prisoner's Dilemma:

def DefectBot(X):

return D

def FairBot(X):

for i in range(N):

if "X(FairBot)=C" is provable in PA+i:

return C

if "X(FairBot)=D" is provable in PA+i:

return D

return D

Lemma 1:For any X, "DefectBot(X)=D" is provable in PA. (Obvious.)Lemma 2:"FairBot(FairBot)=C" is provable in PA.Proof:If we label the FairBots separately (after all, they might be logically equivalent but not know they are), we can quickly show that "if PA proves FairBot1(FairBot2)=C, then FairBot2(FairBot1)=C" is provable in PA (and the same with the subscripts reversed). This makes what we might term a Löbian cycle, which will come up a few more times in this post: if we have a formal system S, and "if S proves A, then B" and "if S proves B, then A" are theorems of S, then S indeed proves both A and B. (Proof reserved for a comment; hat tip to Eliezer and co.)Lemma 3:"FairBot(DefectBot)=D" is provable in PA+1.Proof:Note that this doesnotwork in PA, even though PA can prove "DefectBot(FairBot)=D". Why not? Because PA can't prove that PA doesn't find a proof of "DefectBot(FairBot)=C" before it can look for a proof of defection! However, PA+1 can handle this just fine, because it proves "if PA proves DB(FB)=D, then PA does not prove DB(FB)=C".Note the weird fact that it takes a stronger formal system to prove a result that's much more intuitive from outside the system!

## New and Improved Masquerade

Let's split up the long definition of Masquerade with a helper function.

def Search(X,Y):

for i in range(N):

for j,k in [C,D]:

if "Y(X)=j and X(Y)=k" is provable in PA+i:

return (j,k)

return False

This function, given two agents, simply searches (using increasingly strong formal systems) for a proof of what they do against each other; once it finds one, it halts and returns that result, and if it doesn't find one in time, it announces its failure. We will incorporate this into the definition of Masquerade. (The other function call here is to the payoff matrix, to retrieve the utility U(j,k) of a certain pair of moves.)

def Masquerade(X):

utility = -∞

my_move = null

their_move = null

for Y in [DefectBot, FairBot]:

if Search(X,Y) != False:

(j,k) = Search(X,Y)

if U(j,k) > utility:

my_move = j

their_move = k

utility = U(j,k)

if utility > U(D,D):

for i in range(N):

if "X(Masquerade) = their_move" is provable in PA+i:

return my_move

return D

A few things to note: first, this agent actually halts and outputs an action against any opponent... but for it to provably do so in a system below PA+N, Search(X,Y) has to find proofs quickly. This is the reason that my prior attempt didn't work- it had to wait at one point for the old FairBot to run out of time/power before concluding what it did, and that made it impossible for the old FairBot to know what the old Masquerade did. But with the new and improved agents, we get to ground in a fixed number of steps.

For brevity, I'll label DefectBot, FairBot, and Masquerade as DB, FB, and M, respectively.

Lemma 4:"Search(DB,DB)=(D,D)" is provable in PA+1. (Follows from Lemma 1; note that it needs to use PA+1 in order to rule out finding proofs of other action-pairs.)Lemma 5:"Search(FB,DB)=Search(DB,FB)=(D,D)" is provable in PA+2. (Follows from Lemma 3.)Lemma 6:"Search(FB,FB)=(C,C)" is provable in PA. (Follows from Lemma 2; since (C,C) is the first one tried, we don't even need to go up to PA+1.)Lemma 7:"Masquerade(DB)=D" is provable in PA+3.Proof:Lemmas 4 and 5, plus the fact that PA+3 knows the consistency of PA+2. There is no sanity-check step, since utility=U(D,D) here.Lemma 8:"Masquerade(FB)=C" and "FB(Masquerade)=C" are provable in PA+3.Proof:Lemmas 5 and 6, and the consistency of PA+2, imply that when Masquerade arrives at the sanity-check stage, it has the variables set as utility=U(C,C), my_move=C and their_move=C. Thus PA+3 can prove that "if 'FB(M)=C' is provable in PA+3, then M(FB)=C". And of course, "if 'M(FB)=C' is provable in PA+3, then FB(M)=C" is provable in PA+3, since again PA+3 can prove that PA through PA+2 won't have found proofs of contrary conclusions before it gets around to trying to find cooperation in PA+3. Therefore we have the desired Löbian cycle!Theorem:"Masquerade(Masquerade)=C" is provable in PA+4.Proof:Lemmas 7 and 8, and the consistency of PA+3, allow PA+4 to prove that when each Masquerade arrives at the sanity-check stage, it has set utility=U(C,C), my_move=C and their_move=C. Thus we achieve the Löbian cycle, and find proofs of mutual cooperation!## Awesome! So, what next?

Well, assuming that I haven't made a mistake in one of my proofs, I'm going to run the same proof for my generalization: given a payoff matrix in general, Masquerade enumerates all of the constant strategies and all of the "mutually beneficial deals" of the FairBot form (that is, masks that hold out the "stick" of a particular Nash equilibrium and the "carrot" of another spot on the payoff matrix which is superior to the "stick" for both players). Then it alternates (at escalating PA+n levels) between trying to prove the various good deals that the opponent could agree to. There are interesting complexities here (and an idea of what bargaining problems might involve).

Secondly, I want to see if there's a good way of stating the general problem that Masquerade solves, something better than "it agrees with commonsense decision theory". The analogy here (and I know it's a fatuous one, but bear with me) is that I've come up with a Universal Turing Machine but not yet the Church-Turing Thesis. And that's unsatisfying.

But before anything else... I want to be

really surethat I haven't made a critical error somewhere, especially given my false start (and false halt) in the past. So if you spot a lacuna, let me know!