
I'm writing up some math results developed on LW as a paper with the tentative title "Self-referential decision algorithms". Something interesting came up while I was cleaning up the Loebian cooperation result. Namely, how do we say precisely that Loebian cooperation is stable under minor syntactic changes? After all, if we define a "minor change" to program A as a change that preserves A's behavior against any program B, then quining cooperation is just as stable under such "minor changes" by definition. Digging down this rabbit hole, I seem to have found a nice new reformulation of the whole thing.

I will post some sections of my current draft in the comments to this post. Eventually this material is meant to become an academic paper (hopefully), so any comments on math mistakes, notation or tone would be much appreciated! And yeah, I have no clue about academic writing, so you're welcome to tell me that too.

## 1 Quining cooperation

In this section we will describe a variant of the Prisoner’s Dilemma where the players can achieve mutual cooperation whenever their decision processes are symmetric.

Let G be the following game: player 1 writes a computer program A that accepts the source code of another program B as an argument and returns either the value C (cooperate) or D (defect). At the same time, player 2 writes another such computer program B. The payoffs to players are determined by computing the values A(B) and B(A) (that is, passing each program’s source code to the other program) and interpreting these values as moves in a classical Prisoner’s Dilemma. For example, if A(B)=C and B(A)=D, then player 2 receives a higher payoff because his program defected while player 1’s program cooperated.

Theorem 1: the game G has a Nash equilibrium which leads to the outcome (C,C).

Proof:

Let A be a program that refers to its own source code (using standard quine techniques) and returns the value C if and only if B=A, that is, the source code of B is equal to the source code of A. Otherwise A returns the value D.

The strategy pair (A,A) leads to the outcome (C,C) because A(A)=C. For any other program B≠A, we have A(B)=D, so the strategy pair (A,B) leads to either the outcome (D,C) or (D,D), both of which are worse for the second player than the outcome (C,C). Therefore program A is a best reply to itself, QED.
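The construction in the proof can be made concrete. Here is a minimal sketch in Python (the helper name `run` and the exact encoding are my own illustration, not part of the draft): the program's source reconstructs its own text via the standard quine trick, then compares it against the opponent's source.

```python
# Sketch of the program from Theorem 1. A "program" is a source string;
# run(source, opponent_source) executes it on the opponent's source.

def run(program_source, opponent_source):
    """Execute a program (given as a source string) on the opponent's source."""
    env = {}
    exec(program_source, env)
    return env["A"](opponent_source)

# t contains the whole program, with a %r placeholder for its own text.
# Substituting t into itself (t % t) yields a source string that knows
# its own source code -- the standard quine construction.
t = 't = %r\nsrc = t %% t\ndef A(opp):\n    return "C" if opp == src else "D"\n'
cliquebot = t % t

assert run(cliquebot, cliquebot) == "C"   # (A,A) leads to (C,C)
assert run(cliquebot, "def A(opp):\n    return 'C'\n") == "D"   # any B != A gets D
```

Note that this program cooperates only with a byte-for-byte copy of itself, which is exactly the brittleness that section 2 addresses.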

## 2 Lobian cooperation

In this section we will continue discussing the Prisoner’s Dilemma variant described in the previous section. We will describe a program that cooperates whenever the opponent’s program is a similar enough implementation of the same algorithm, regardless of irrelevant details like coding style or comments. To achieve that, we will use a proof checker for some formal theory.

A note on language: when talking about programs, statements, or proofs, we implicitly refer to the Godel numbers of those objects. Kleene’s recursion theorem allows us to define self-referential and mutually referential objects, like programs that check proofs about themselves. In this and subsequent sections we will freely use compact jargon, for example, “let A be a program that searches for a proof that A()=1”, which expands to “by Kleene’s recursion theorem, there exists a program A that successively checks for each natural number whether it’s the Godel number of a valid proof in the formal theory T, proving that a program with the same Godel number as A returns the value 1”.

Let G be the game described in section 1. Let T be a formal theory that includes Peano arithmetic. Let F(N) be a computable function from natural numbers to natural numbers, to be defined below (in the proof of Theorem 2.2). For each natural number N, let Coop(N) be the set of all programs A that satisfy the following two conditions:

1) For any program B, A(B)=C implies B(A)=C.

2) The formal theory T can prove the following statement in N symbols or less: “For any program B, if the formal theory T can prove that A(B)=B(A) in F(N) symbols or less, then A(B)=C”.

Theorem 2.1: if the number N is large enough, then the set of programs Coop(N) is non-empty.

Proof:

Let A be a program that enumerates all proofs up to F(N) symbols long in the formal theory T, looking for a proof of the statement “A(B)=B(A)”. If such a proof is found, then program A returns the value C, otherwise D.

If A(B)=C, then A found a proof that A(B)=B(A). Assuming that the formal theory T is sound, we obtain that B(A)=C, so condition 1 holds. The formal theory T can prove the statement in condition 2 by inspection of program A. The length of that proof grows with the number of symbols needed to represent the number N within program A, which is logarithmic in N. Therefore we can choose N to be larger than the length of that proof, thus satisfying condition 2. Then program A is a member of the set Coop(N), QED.
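The shape of this program can be sketched schematically. In the sketch below the proof checker for the formal theory T is stubbed out (`check_proof` is a placeholder, not an implementation), and the self-reference that Kleene's recursion theorem provides is passed in explicitly as `my_source`; the names are mine, not the draft's.

```python
# Schematic sketch of the proof-searching program from Theorem 2.1.

def lobian_bot(opponent_source, my_source, bound, check_proof):
    """Return 'C' iff some T-proof of length <= bound shows A(B)=B(A)."""
    statement = ("program with source %r applied to %r equals "
                 "program with source %r applied to %r"
                 % (my_source, opponent_source, opponent_source, my_source))
    for n in range(bound + 1):         # candidate Goedel numbers of proofs
        if check_proof(n, statement):  # does n encode a valid T-proof of the statement?
            return "C"
    return "D"

# With a checker that never accepts, the bot defects; with one that
# accepts, it cooperates.
assert lobian_bot("srcB", "srcA", 100, lambda n, s: False) == "D"
assert lobian_bot("srcB", "srcA", 100, lambda n, s: True) == "C"
```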

Note that, by an analogous argument, the set Coop(N) also includes different implementations of the same algorithm, as well as other programs that enumerate proofs up to a slightly higher limit than F(N), programs that use theories stronger than the formal theory T, etc.

Theorem 2.2: for any natural number N, if both programs A and B are in the set Coop(N), then the strategy pair (A,B) is a Nash equilibrium of the game G that leads to the outcome (C,C).

Proof:

Since both programs A and B satisfy condition 2 from the definition of Coop(N), the formal theory T can prove the following statement in 2*N symbols or less: “If T can prove that A(B)=B(A)=C in F(N) symbols or less, then A(B)=B(A)=C”. Now define F(N)=L(2*N); then by the bounded Lob’s theorem (see Appendix 1), the formal theory T proves that A(B)=B(A)=C. Assuming that the formal theory T is sound, that means the strategy pair (A,B) leads to the outcome (C,C).

To prove that program B is a best reply to program A, let’s assume that some other program B’ is a better reply to A. The only way that can happen is if the strategy pair (A,B’) leads to the outcome (C,D), that is, A(B’)=C and B’(A)=D. But program A satisfies condition 1 from the definition of Coop(N), so A(B’)=C implies B’(A)=C. Contradiction, QED.

(...more to come...)

## Appendix 1

Bounded Lob's theorem: for any formal theory T that includes Peano arithmetic, there exists a computable function L(N) such that, for any statement S in the formal theory T and for any natural number N, if T proves in N symbols or less that the statement "T proves S in L(N) symbols or less" implies S, then T proves S.
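In symbols (using □ₖ S as shorthand for “T proves S in at most k symbols” and □ S for unbounded provability, a notation not used in the draft), the statement reads:

```latex
\exists\, \text{computable } L \;\; \forall S \;\; \forall N:\quad
\Box_{N}\bigl(\Box_{L(N)}\,S \;\rightarrow\; S\bigr)
\;\Longrightarrow\; \Box\, S
```

Compare with the ordinary Löb's theorem, which is the same schema with all the length bounds dropped.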

Proof:

twanvl:

The length of that proof grows with the number of symbols needed to represent the number N within program A

1. Program A depends on F(N), not on N itself. So it either needs to represent F in addition to N, or just F(N). Since F is computable and therefore has a finite program, you can just include the whole thing.

2. It is not at all evident that the length of the proof grows linearly or even polynomially with the length of the program A. This depends on the theory T, it might have a very inefficient way of denoting proofs.

Perhaps it would be easier to switch the roles of N and F(N), so you get

2) The formal theory T can prove the following statement in F(N) symbols or less: “For any program B, if the formal theory T can prove that A(B)=B(A) in N symbols or less, then A(B)=C”.

I am not sure if the corresponding variant of the bounded Löb's theorem holds, though.

the formal theory T can prove the following statement in 2*N symbols or less: "..."

The formal theory will need some additional symbols to tie everything together, so the total proof would need 2*N+K symbols, where hopefully K doesn't depend on N. This is ignoring point 2 above.

Twan, thanks for the detailed comments!

Program A needs F(N), not on N itself. So it either needs to represent F in addition to N, or just F(N). Since F is computable and therefore has a finite program, you can just include the whole thing.

Yes, the intended interpretation is that A represents F too, as do all statements that mention F. That's why in the bounded Lob's theorem the function L is stated to be computable.

It is not at all evident that the length of the proof grows linearly or even polynomially with the length of the program A. This depends on the theory T, it might have a very inefficient way of denoting proofs.

That's true. I guess I'll have to include that as an explicit assumption somehow.

I am not sure if the corresponding variant of the bounded Löb's theorem holds, though.

I think it doesn't hold. Actually it's a nice exercise for everyone to try coming up with alternate formulations of the bounded Lob's theorem, like switching around N and L(N), and see if they can be made to work. It's surprisingly tricky.

The formal theory will need some additional symbols to tie everything together, so the total proof would need 2*N+K symbols, where hopefully K doesn't depend on N.

That's true. Maybe it's easier and more general to just say the length of the total proof is a computable function of N, that way everything seems to work fine.

I'm not remotely a mathematician, but I quite like section 1. Any chance you could link to a description of "standard quine techniques"?

See Quine (computing) if you're more like a programmer, or Kleene's recursion theorem if you're more like a mathematician.

It's awesome that you're writing this up! I'd like to help in any way I can.

I suggest you add a sentence to make it even clearer at the outset that you're discussing Nash equilibria on the level of "deciding which program to submit" rather than the usual usage of "as a program, should you output C or D"; even though you've written it out correctly, a reader who's not already aware of these ideas could easily misinterpret it and be confused.

This is perhaps the clearest writing on this kind of decision theory I have ever read. I don't know if it would be clear to someone who hasn't read about these things before, but it's clear to me.

Aww, thanks :-)

Cool.

Theorem 1: the game G has a Nash equilibrium which leads to the outcome (C,C).

I assume you know the result that any game has a Nash Equilibrium for any (expected) outcomes in which each player receives at least their minmax values?

Also it might be slightly confusing to reuse the symbol G twice. Looking forwards to the bounded Loeb theorem proof!

I assume you know the result that any game has a Nash Equilibrium for any (expected) outcomes in which each player receives at least their minmax values?

Sorry, why do you mention that? The minmax values in the PD are achieved by the outcome (D,D), which is worse than (C,C).

Also it might be slightly confusing to reuse the symbol G twice.

Thanks, nice catch! Fixed.

Looking forwards to the bounded Loeb theorem proof!

Will try. It took me a while to even arrive at a formulation that looks correct, maybe it will change again.

[This comment is no longer endorsed by its author]

Sorry, why do you mention that? The minmax value in the PD is achieved by the outcome (D,D), which is worse than (C,C).

Precisely. Since (C,C) is better for both players than the minmax (D,D), there has to be a Nash equilibrium that finds it (and your construction is one example of such).

After reading the thread below, I still don't understand what your original point was. Could you elaborate?

cousin_it's example seemed to be a special case of a more general type of theorem. That theorem (in various forms) is that there exist Nash equilibria for repeated games in every situation where everyone gets more than their absolute guaranteed minimum. The equilibrium goes like "everyone commits to this strategy, and if anyone disobeys, we punish them by acting so as to keep their gains to the strict minimum". Then nobody has any incentive to deviate from that.

Sorry, I must've gone insane for a minute there. Are you saying that (C,C) is a Nash equilibrium of the classical Prisoner's Dilemma?

It is a Nash equilibrium for the infinitely repeated PD (take two tit-for-tat opponents - neither has any incentive to deviate from their strategy).

I believe the same result holds for one-shot games where you have your opponent's code.

It is a Nash equilibrium for the infinitely repeated PD (take two tit-for-tat opponents - neither has any incentive to deviate from their strategy).

I'm not sure that's completely right. Infinitely repeated games need a discount factor to keep utilities finite, and the result doesn't seem to hold if the players discount the future heavily enough.

I believe the same result holds for one-shot games where you have your opponent's code.

Yeah, that's actually another result of mine called freaky fairness ;-) It relies on quining cooperation described here. Maybe I'll present it in the paper too. LW user benelliott has shown that it's wrong for multiplayer games, but I believe it still holds for 2-player ones.

Pages 151-152 of multiagent systems http://www.masfoundations.org/ has the proper formulation. But they don't seem to mention the need for low discount factors...

Your linked result seems to talk about average utilities in the long run, which corresponds to a discount factor of 1. In the general case it seems to me that discount factors can change the outcome. For example, if the benefit of unilaterally defecting instead of cooperating on the first move outweighs the entire future revenue stream, then cooperating on the first move cannot be part of any Nash equilibrium. I found some results saying indefinite cooperation is sustainable if the discount factor is above a certain threshold.

That sounds reasonable. If v is the expected discounted utility at minmax, w the expected discounted utility according to the cooperative strategy, then whenever the gain to defection is less than w-v, we're fine.
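The threshold is easy to check numerically for grim trigger in the repeated PD. The payoff values below (T=5, R=3, P=1, S=0) are the standard textbook ones, chosen for illustration; the algebra gives a critical discount factor of (T-R)/(T-P).

```python
# Grim trigger in the infinitely repeated PD: cooperate forever, and
# punish any defection with eternal defection. It is an equilibrium
# iff cooperating forever beats defecting once and being punished.

T, R, P, S = 5.0, 3.0, 1.0, 0.0  # temptation, reward, punishment, sucker

def cooperation_sustainable(delta):
    """Is grim-trigger cooperation a best reply at discount factor delta?"""
    cooperate_value = R / (1 - delta)           # R every round, discounted
    defect_value = T + delta * P / (1 - delta)  # T once, then P forever
    return cooperate_value >= defect_value

# Critical discount factor: (T-R)/(T-P) = 0.5 for these payoffs.
assert not cooperation_sustainable(0.3)  # impatient players: defection pays
assert cooperation_sustainable(0.6)      # patient players: cooperation holds
```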

Nice work!

Thanks Daniel. These are the first 2 parts, there will be about 4 or 5 more. Do you think my style of math writing is suitable for publication, or do I need to change it?

I like that it's very concise and readable, and wouldn't want that to get lost; too many math papers are very dry! I do think, though, that a bit more formality would make mathematical readers more comfortable and confident in your results.

I would move the "note on language" to a preliminaries/definitions section. Though it's nice that the first section is accessible to non-mathy people, I don't think that's the way to go for publication. Defining terms like "computer program" and "source code" precisely before using them would look more trustworthy, and using the Kleene construction explicitly in all sections, instead of referring to quine techniques, seems better to me.

I probably wouldn't mention coding style or comments unless your formalism for programs explicitly allows them.

I found it confusing that while A and B are one type of object, C and D are totally different. I think it's also more conventional to use capital letters for sets and lowercase letters for individual objects.

This paper touches on many similar topics, and I think it balances precision and readability very well; maybe you can swipe some notation and structure ideas from there?

Will do.

Thanks for your suggestions! They all seem very reasonable to me and I'll do a rewriting pass.
