You know that automated proof verifiers exist, right? And also that programs can know their own source code? Well, here's a puzzle for you:

*Consider a program A that knows its own source code. The algorithm of A is as follows: generate and check all possible proofs up to some huge size (3^^^^3). If A finds a proof that A returns 1, it returns 1. If the search runs to the end and fails, it returns 0. What will this program actually return?*

Wait, that was the easy version. Here's a harder puzzle:

*Consider programs A and B that both know their own, and each other's, source code. The algorithm of A is as follows: generate and check all proofs up to size 3^^^^3. If A finds a proof that A returns the same value as B, it returns 1. If the search fails, it returns 0. The algorithm of B is similar, but possibly with a different proof system and different length limit. What will A and B return?*

This second puzzle is a formalization of a Prisoner's Dilemma strategy proposed by Eliezer: "I cooperate if and only I expect you to cooperate if and only if I cooperate". So far we only knew how to make this strategy work by "reasoning from symmetry", also known as quining. But programs A and B can be very different - a human-created AI versus an extraterrestrial crystalloid AI. Will they cooperate?

I may have a tentative proof that the answer to the first problem is 1, and that in the second problem they will cooperate. But: a) it requires you to understand some logic (the diagonal lemma and Löb's Theorem), b) I'm not sure it's correct because I've only studied the subject for the last four days, c) this margin is too small to contain it. So I put it up here. I submit this post with the hope that, even though the proof is probably wrong or incomplete, the ideas may still be useful to the community, and someone else will discover the correct answers.

**Edit:** by request from Vladimir Nesov, I reposted the proofs to our wiki under my user page. Many thanks to all those who took the time to parse and check them.

I notice a curious pattern in this post: technically illiterate critical comments get voted up to +1/+2 (even after getting debunked), while technically accurate arguments (replies) stay at 0/+1. Please vote up only things you know you understand, and do vote up comments that debunk incorrect arguments.

ETA: Mostly corrected now.Cousin_it, do you have an argument that n1 ... n6 < nmax? It seems like one way for the proof to fail is if no matter what nmax is, step 11 forces n1 > nmax.

Actually, someone shot down my proposed my.C = (my.C == your.C) algorithm, because if two prisoners both defect, that is in a mathematical sense "cooperating if and only if you cooperate". So if the other prisoner is Defection Rock, then my.C = (my.C == your.C) has no consistent solution.

In the first version, there are different ways to go about programming algorithm A, and some will output 1, and some will output 0. In other words, "if A finds a proof that A returns 1" is ambiguous, because it depends on the exact value of A, not the generic description given.

In the second version, both programs output 1, since this is the only possible consistent result. If either program might output a 0, then both programs would have to, which implies that both should have output 1.

timtyler: I think you're confused about how proof verifiers work. A proof verifier takes a proof as input, and verifies that the given proof is correct. It doesn't have to generate the proof itself, and it gets to assume that each step is some small, easily-checkable application from a finite set of axioms.

Put another way, if the proof you give a verifier makes a large leap in reasoning, then the verifier will just tell you that the proof is wrong. Moreover, by the formalization that we demand of those proofs, the proof

iswrong.So, the problem that proof verifiers solve is computable. So, we need only to assume that the proof verifier that these programs have is correctly implemented, we don't really need to know their internal details.

(If you already understand this, though, then I'm confused about your confusion. Could happen.)

[edit:] Actually, I think I might see the point you're making; we have to assume the theory the prover understands is rich enough to model stepwise program semantics. This probably should have been specified, but we can just assume it.

Edit: The following is not obviously possible, see this comment.This can be generalized to provide even better algorithms for more general games. Let the following be a simplified outline of the algorithm for cooperation problem, with bounds omitted (players are A and B, our player is A):

The proof that the outcome is "Cooperate" basically hinges on the provability of Löb statement

... (read more)Does the obvious Lobian proof for 1 still go through if there's a bound on the proof length built into all the sentences?

The generalized action idea, applied to this setting, works as follows. Agents A and B can surrender control over their output to procedures they don't fully understand, but expect to produce good results. That they expect good result is reflected, as in previous construction, in the order in which they check for the possibility:

H... (read more)

This is awesome - great post!

I've scanned briefly through the comments, and didn't see anyone ask this, but apologies if I missed it:

You use the diagonal lemma to establish that there is a piece of code called "box" such that the following is provable:

eval(box) == implies( proves(box, n1), eval(outputs(1)))

But couldn't there likewise be a "box2" such that:

eval(box2) == implies( proves(box2, n1), eval(outputs(2)))

And then couldn't you patch up the proof someh... (read more)

Edit: Wow, I really am an idiot. I assumed the second statement was true about every statement, but I just realized (by reading

oneextra comment after posting) that by Lob's Theorem that's not true. But my original idea, that the first statement is all that's required to prove anything, still seems to hold.Okay, I can follow the first proof when I assume statement 1, but I don't quite understand how cousin_it got to 1. The Diagonal Lemma requires a free variable inside the formula, but I can't seem to find it.

In fact, I think I totally misunderstand the D... (read more)

I think it's possible to coordinate without the huge computational expense, if the programs would directly provide their proofs to each other. Then each of them would only need to check a proof, not find it.

The input to the programs would be a pair (opponent's source, opponent's proof), and the output would be the decision: cooperate or defect.

The algorithm for A would be: Check the B's proof, which must prove that B would cooperate if the proof it gets in its input checks out. If the proof checks out - cooperate, otherwise - defect.

The most obvious implementation for the easy version will return 0 (or, if the length limit is lifted, fail to return at all).

It's important, however, to be clear on just what constitutes a valid proof. Consider the following implementation:

Not valid because it didn't find a proof, you say? On the contrary, assuming we are dealing with idealized mathematical computers that are perfectly deterministic and have no outside inputs, the fact that a program returns a given value on this run does indeed constitute a proof that it will do so on... (read more)

Without the very long limit you've added, the easy problem becomes formally undecidable as it leads to an infinite regress. In order to return 1, the program has to determine whether it will return 1, and in order to do that it has to determine whether it will return 1, and in order to do that it has to determine whether it will return 1, and..... You'll use up your 3^^...^^3 iterations and return 0.

It's a bit like this problem. http://en.wikipedia.org/wiki/Halting_problem

I believe Program A in "the easy version" would return 0. Assuming zero run-time errors, its structure implements the logical structure:

nsteps, then A returns 1.nsteps, then A returns 0.However

nis defined (the post proposesn= 3^^^^3), it can be shown by the definition of the word "proof" that:nsteps, then A returns 1.and therefore the first proposition holds for every program, and cannot be used to... (read more)

And interestingly, posters here only started criticizing this game-theoretic reasoning when I used it. c=/

Why?