The Prisoner's Dilemma has been discussed to death here on OB/LW, right? Well, here's a couple new twists to somewhat... uh... expand the discussion.

Warning: programming and math ahead.


Scenario 1

Imagine a PD tournament between programs that can read each other's source code. In every match, player A receives the source code of player B as an argument, and vice versa. Matches are one-shot, not iterated.

In this situation it's possible to write a program that's much better than "always defect". Yes, in an ordinary programming language like C or Python, no futuristic superintelligent oracles required. No, Rice's theorem doesn't cause any problems.

Here's an outline of the program:

// begin PREFIX
Strategy main(SourceCode other)
    // get source code of this program from "begin PREFIX" to "end PREFIX",
    // using ordinary quine (self-printing program) techniques
    String PREFIX = "..."

    if (other.beginsWith(PREFIX))
        return Strategy.COOPERATE;
    return anythingElse(other);
// end PREFIX

// from this point you can write anything you wish
Strategy anythingElse(SourceCode other)
    return Strategy.DEFECT;

Some features of this program:

  • It cooperates with itself.
  • It cooperates with any other program that begins with PREFIX.
  • It's impossible to cheat, because opponents that begin with PREFIX can't not cooperate with this program.
  • Other authors now have an incentive to include PREFIX in their programs, moving their original logic into the "anythingElse" subroutine. This modification has no downside.
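For concreteness, here's how the quine trick might look in runnable Python. This is a hypothetical rendering, not the tournament's actual format: strategies are the strings "COOPERATE"/"DEFECT", and the names PREFIX_TEMPLATE and anything_else are mine. The key point is that the opening lines of the file reconstruct their own text at runtime:

```python
# Hypothetical Python rendering of the PREFIX program. The quine trick:
# the template below contains its own text as a {t!r} format slot, so the
# program can rebuild its opening lines at runtime and compare them
# against the opponent's source.
PREFIX_TEMPLATE = 'PREFIX_TEMPLATE = {t!r}\n\ndef main(other_source):\n    prefix = PREFIX_TEMPLATE.format(t=PREFIX_TEMPLATE)\n    if other_source.startswith(prefix):\n        return "COOPERATE"\n    return anything_else(other_source)\n'

def main(other_source):
    prefix = PREFIX_TEMPLATE.format(t=PREFIX_TEMPLATE)
    if other_source.startswith(prefix):
        return "COOPERATE"
    return anything_else(other_source)

# Everything above this comment is the PREFIX: main rebuilds that exact
# text via the format call, so any program whose source opens with the
# same lines is forced into the COOPERATE branch.
# From this point you can write anything you wish:
def anything_else(other_source):
    return "DEFECT"
```

Any opponent whose source begins with those exact lines cooperates with this program before its own anything_else can ever run.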

So, introducing such a program into the tournament should lead to a chain reaction until everyone cooperates. Unless I've missed something. What say ye?

Edit: the last point and the conclusion were wrong. Thanks to Warrigal for pointing this out.


Scenario 2

Now imagine another tournament where programs can't read each other's source code, but are instead given access to a perfect simulator. So programs now look like this:

Strategy main(ObjectCode self, ObjectCode other, Simulator simulator) {...}

and can call simulator.simulate(ObjectCode a, ObjectCode b) arbitrarily many times with any arguments. To give players a chance to avoid bottomless recursion, we also make available a random number generator.

Problem: in this setting, is it possible to write a program that's better than "always defect"?

The most general form of a reasonable program I can imagine at the moment is a centipede:

  1. Programmer invents a number N and a sequence of real numbers 0 < p1 < p2 < ... < pN < 1.
  2. Program generates a random number 0 < r < 1.
  3. If r < p1, cooperate.
  4. Simulate the opponent's reaction to you.
  5. If opponent defects, defect.
  6. Otherwise if r < p2, cooperate.
  7. And so on until N.
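The centipede above can be sketched in Python (a hypothetical framework, names mine: a strategy is a function of its opponent, and simulate just calls it, standing in for the tournament's perfect simulator):

```python
import random

# Hypothetical sketch of the centipede strategy.
def simulate(a, b):
    # stand-in for the tournament's perfect simulator: run a against b
    return a(b)

def make_centipede(ps):
    """ps = [p1, ..., pN] with 0 < p1 < ... < pN < 1."""
    def centipede(other):
        r = random.random()          # step 2: one random draw per invocation
        for p in ps:
            if r < p:                # steps 3 and 6: cooperate if r < p_i
                return "COOPERATE"
            # steps 4 and 5: simulate the opponent's reaction to us,
            # and punish defection
            if simulate(other, centipede) == "DEFECT":
                return "DEFECT"
        return "DEFECT"              # fell off the end of the centipede
    return centipede
```

Exercise 1 then asks for which ps the recursive tree of centipede-vs-centipede simulations terminates with probability one; running this sketch a few thousand times for a given ps is a quick way to build intuition before proving anything.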

Exercise 1: when (for what N and pi) does this program cooperate against itself? (To cooperate, the recursive tree of simulations must terminate with probability one.)

Exercise 2: when does this program win against a simple randomizing opponent?

Exercise 3: what's the connection between the first two exercises, and does it imply any general theorem?



Ordinary humans playing the PD often rely on assumptions about their opponent. They may consider certain invariant properties of the opponent, like altruism, or run mental simulations. Such wetware processes are inherently hard to model, but even a half-hearted modeling attempt yields startlingly rigorous formalizations in place of our usual vague intuitions about game theory.

Is this direction of inquiry fruitful?

What do you think?


'Other authors now have an incentive to include PREFIX in their programs, moving their original logic into the "anythingElse" subroutine. This modification has no downside.'

Not really. Including this prefix makes it impossible to defect against a PREFIX program. It's still better to exclude the PREFIX and get the other program to cooperate while you defect, if possible. It is true, though, that this isn't worthwhile if many programs will never cooperate with a defector, which includes the one where anythingElse == Strategy.DEFECT.

Good catch, thanks. You're right. Upvoted the comment, edited the post.

A weaker statement seems to be true: that the program provided is an evolutionarily stable strategy.

Scenario 2 seems to share some similarities with Rolf Nelson's AI Deterrence Problem. You might want to check it out if you haven't already.

This is in my top 5 favorite LW posts so far.

Let's introduce a time limit. Say that after a maximum of S computations (i.e., computation steps using some standardized notion) have passed, each player is forced to make a decision.

Now, write a program that is opaque to introspection: to find out what it decides (i.e., COOPERATE or DEFECT), it must be simulated until it halts. This program could use cryptography or other obfuscation techniques (random numbers would be useful). Engineer this program so that it takes exactly S steps to run to completion.

The simulating player does not have time to both simulate and interpret the results of its simulation.

Seemingly, restricting all machines to the same time limit serves to reduce the efficacy of many (all?) of these adversarial simulation strategies.
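The bookkeeping behind this objection can be made explicit with a toy model (step counts and names invented for illustration):

```python
# Toy model of the time-limit argument: simulating an opponent costs at
# least as many steps as the opponent itself uses, so a program engineered
# to take exactly S steps leaves its simulator no budget to act on the
# result.
S = 100  # the shared step limit

def simulate_then_respond(opponent_steps, budget=S):
    remaining = budget - opponent_steps  # budget left after a full simulation
    if remaining <= 0:
        return None                      # timed out: must decide blindly
    return "best-response"
```

Under this accounting, an opaque program that burns all S steps makes simulate_then_respond(S) return None, while any faster opponent leaves slack to exploit.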

The simulating player does not have time to both simulate and interpret the results of its simulation.

...and so defects, because it's obvious what the other player intends.

...and so defects, because it's obvious what the other player intends.

More interestingly, what if the program being simulated has a really clever algorithm that just happens to take S steps to compute?

A program can't be "clever" if it is indistinguishable from a permanent defector to other programs.

In the second scenario we can, losing a little generality, prohibit obfuscation by accepting only programs already organized as decision graphs (possible nodes: throw dice and branch; simulate opponent and branch; cooperate; defect). The problem stays meaningful because programs have no access to each other's source and must resort to simulation. Maybe the problem thus formulated has a trivial solution, but I don't see it yet.
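Such decision graphs are easy to pin down concretely; here is one hypothetical encoding (node names mine) together with an interpreter:

```python
from dataclasses import dataclass
from typing import Union
import random

# Hypothetical encoding of the restricted "decision graph" programs:
# the only node types are dice-branch, simulate-branch, and the two leaves.

@dataclass
class Dice:              # throw dice and branch
    p: float
    if_heads: "Node"
    if_tails: "Node"

@dataclass
class Simulate:          # simulate opponent and branch
    if_cooperate: "Node"
    if_defect: "Node"

@dataclass
class Leaf:              # cooperate or defect
    move: str            # "COOPERATE" or "DEFECT"

Node = Union[Dice, Simulate, Leaf]

def run(program, opponent, self_program):
    node = program
    while True:
        if isinstance(node, Leaf):
            return node.move
        if isinstance(node, Dice):
            node = node.if_heads if random.random() < node.p else node.if_tails
        else:
            # Simulate node: what does the opponent do against us?
            move = run(opponent, self_program, opponent)
            node = node.if_cooperate if move == "COOPERATE" else node.if_defect
```

Obfuscation is gone by construction: a program is just this tree, and the only way it can depend on its opponent is through explicit Simulate nodes.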

Slightly tangential question about source code swapping: If A's source code depends on what it reads from B, and B on what it reads from A... Is there any chance of a Halting problem?

No. A's action depends on the source code of B and vice versa. Since the source code does not depend on the other source code, nor the action on the other action, you don't get into recursion.

Let's say I write a program A that compiles and runs B with A's source code as input. If B's output is "cooperate", A cooperates. If B's output is "defect", A defects.

Now I pit A against itself. Oops, infinite recursion. This isn't exactly the same as the halting problem, but it rhymes.

Are you talking about scenario 1 or 2? In scenario 1, A doesn't run B, it only checks that the beginning of B's source code matches a certain string (which happens to coincide with the beginning of A's source code). In scenario 2, note my note about the random number generator.

I'm talking about scenario 1. I am not talking about your version of A and B. I assumed A and B generally referred to the two programs in the competition. Anyway, I gave an example of a program that would behave optimally. Unfortunately, it will go into infinite recursion against any program using a similar tactic.

Basically, If you know a program's source code and you know what input it will receive, you can perfectly predict the output.

I don't want to be rude, but I can't understand why anyone is arguing otherwise. Scenario 1 is very similar to the halting problem.

I wouldn't call a program that fails to cooperate with itself "optimal". My program is more optimal than yours, because it cooperates with itself in finite time. :-)

But this is an interesting direction of inquiry too. Is there a non-contradictory way to add something like magical predictor oracles to scenario 1? Would the resulting problem be mathematically interesting? Eliezer seems to think yes, unless I've misunderstood his position... but he refuses to disclose the math.


In general, in scenarios like 2, you should give information other than black-box access to a running program. There should be a way to abstractly model the other player. Following something like Solomonoff induction is the last resort.

I find this direction of inquiry fascinating. Access to source code is not all that interesting, but simulations (especially when you introduce uncertainty (both error and intentional deception)) are a closer analog to human decisions.

Very interesting indeed. Suppose we replaced "4. simulate the opponent's reaction to you" and instead had

  1. Simulate the opponents reaction many times and learn the probability p with which they will defect.
  2. If p < k then cooperate, else defect (for some constant k)

Game theory with machine learning, what do you think?
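That suggestion can be sketched as follows (hypothetical names; note the opening coin flip, without which the k-fold sampling recursion never bottoms out against a copy of itself):

```python
import random

# Hypothetical sketch of the sampling strategy: estimate the opponent's
# defection rate from k simulated runs, then threshold it. The opening
# coin flip (p_stop) keeps the tree of mutual simulations subcritical:
# each call spawns k children with probability 1 - p_stop, so the tree
# dies out almost surely when (1 - p_stop) * k < 1.
def make_learner(k=4, threshold=0.5, p_stop=0.9):
    def learner(other):
        if random.random() < p_stop:
            return "COOPERATE"                 # termination escape hatch
        defections = sum(other(learner) == "DEFECT" for _ in range(k))
        return "COOPERATE" if defections / k < threshold else "DEFECT"
    return learner
```

With p_stop = 0.9 and k = 4 the expected branching factor is 0.4, so learner-vs-learner terminates with probability one; drop the coin flip and every call spawns k more, which is the infinite tree the reply below describes.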

Maybe, but take care to terminate. If your program always starts out by simulating the opponent twice, it won't even cooperate against a copy of itself, going into an infinite binary tree of recursion instead.

In scenarios like 2, you should give information other than black-box access to a running program. There should be a way to abstractly model the other player. Following something like Solomonoff induction is the last resort.

What do you mean by "abstractly model"?

Systematic construction of formal systems that are in a certain correspondence with the system under study, such that studying the properties of these formal systems may allow proving the properties of the system under study. This describes any nontrivial formal analysis method, including the construction of Bayesian networks.

From programming languages/computer science, see e.g. abstract interpretation, bisimulation, basically things that hang on a Galois connection.

Returning to your example, you may want to give if not the whole source code, then at least a good model of the source code, or some kind of abstraction that is able to answer the requests about the formal properties of the code. Having to actually run the fixed source code in black-box mode is much too harsh.

Thanks for the link to Galois connection. But why did you pick scenario 2? Can you give an example how such techniques would be useful in scenario 1?

You are making the situation worse in scenario 2; in scenario 1 you at least give the source code. But yes, in general you'd want to not just give source code, but also a way of proving from that source code that it has certain properties (otherwise you are faced with halting problem/Rice's theorem issues, and even more practical problems that halt you far short of those). It's easy for the explicit situation you describe in scenario 1, but worse in other cases.

In some cases, you don't really need the intermediary of explicit source code; you may just directly communicate a relatively small formal model that allows proving the required properties. This is useful for formally proving properties of systems for which formal models can't be constructed automatically, but can more or less reliably be constructed manually. See, for example, model checking (although the Wikipedia article is too short to reflect this aspect).

In general, you are interested in some aspect of runtime semantics, not of source code per se; as was noted by one commenter, what the program does also depends on how it's compiled and interpreted, what hardware it runs on, and what that hardware does as a result. But you are also not interested in most of that runtime semantics, only the aspects you care about. The code in a programming language usually (at least nominally) comes with formal semantics, that allows you to construct an overdetailed formal model, from which you can then try to abstract out the details that you don't care about, to find the answers to your questions, like "does this program ever crash at runtime?"

I admit I'm in a fog. Intuitively, abstractly interpreting an opponent that's also an abstract interpreter should only give you information about its unconditional behavior, not its behavior against a concrete you. Do you have a simple specific example of what you're describing - a mathematical game where both players can analyze certain properties of each other?

I admit I'm in a fog. Intuitively, abstractly interpreting an opponent that's also an abstract interpreter should only give you information about its unconditional behavior, not its behavior against a concrete you.

You've given an example yourself, in scenario 1. What happens can be modeled by analyzing programs for the presence of property X, that is a minimal predicate such that it holds for a given program foo if for each arg that has property X, foo(arg) returns "Cooperate". Then, you can use your knowledge that f1 and f2 both have property X to prove that f1(f2) returns "Cooperate". X doesn't need to be as simple or explicit as the fixed prefix, you can in general assign an arbitrary algorithm for computing such X (with some amount of trouble resulting from that). Did you mean an example more detailed than this, more formal than this, or does this answer succeed in connecting the terminology?

Did you mean an example more detailed than this, more formal than this, or does this answer succeed in connecting the terminology?

None of the above! In scenario 1, PREFIX isn't part of the problem statement; it is part of the solution. The problem statement said only that programs can read each other's source code. Unless I'm misunderstanding your initial comment in this thread, you want to invent a Scenario 3 with rules distinct from 1 or 2: programs are given each other's source code and/or Something Else. Well, what is that Something Else? What are the mathematical rules of the game for your proposed Scenario 3? Or are you proposing some new player algorithms for Scenario 1 or Scenario 2 instead of making up a new game? If so, what are they? Please answer simply and concretely.

In these terms, I'm describing a solution to scenario 1, and its limitations: it can't solve all instances of scenario 1 (naturally, at least because of the halting problem). The Something Else are the limitations; for example, your solution to scenario 1 requires Something Else in the form of a fixed prefix. This solution is not terribly general, as it won't accept even equivalent programs that include an additional comment in the body of the prefix.

Analyzing a formal model of the program gives more generality, and stronger methods of analysis give more generality still. If you are the program for which you introduce these properties, you'd also want to pick a property that tells that you achieve your goals, or as high in your preference order as you can get.

Ah - you want to generalize the PREFIX-test, expand the equivalence class. Sorry for being obtuse in my replies - I was giving you the benefit of the doubt, because generality is so obviously, totally useless in Scenario 1. Each program author wanting to utilize the PREFIX method of cooperation has quite enough incentive to coordinate on a common value of PREFIX with other such authors. If you prohibit vocal coordination beforehand, they will use the Schelling focal point: the lexicographically smallest PREFIX possible in a given programming language.

You may consider the scenario 3, where the trusted algorithm checks that your PREFIX is present in your opponent, and only tells you that fact, without giving you its source code or a simulator black box. But anyway, this little example is not a Totally General Solution to All Similar Problems, and so generalization is not useless. It might in a given formal problem, that is already solved by other means, but not in other cases.

For example, recall my previous remarks about how PD turns into an Ultimatum game once you allow mixed strategies (that is, any form of randomness in the programs). In which case, pure 'Cooperate' becomes suboptimal.

Then just change PREFIX to play the Pareto-optimal symmetric mixed strategy instead of cooperating.

And if you do that properly, taking into account other issues like reflective consistency, you won't need the other part of your algorithm at all, since picking fair Pareto-optimal strategies will also let you deal correctly with defectors.

I'm still perplexed by the fairness part (and Ultimatum game in particular) though. One solution from symmetry seems obvious (not 'cut on equal parts', since what 'equal part' means also depends on the preference orders of each agent), but I don't know in what sense it's universally right, and if it isn't, what should be done.


This is not Prisoner's Dilemma. In order for the PREFIX strategy to be workable, a trusted third party has to guarantee that the source code passed to each agent is actually what the other agent will execute. A real Prisoner's Dilemma has no such trusted third party. If there is no trusted third party to guarantee the source code, then seeing the "source code" becomes useless.

This is not Prisoner's Dilemma. In order for the PREFIX strategy to be workable, a trusted third party has to guarantee that the source code passed to each agent is actually what the other agent will execute. A real Prisoner's Dilemma has no such trusted third party. If there is no trusted third party to guarantee the source code, then seeing the "source code" becomes useless.

Of course this is not PD. This is a different game model that happens to have an interesting and non-obvious solution.

Inventing such models is worthwhile because one of them might someday turn out to be a closer match for the behavior of real-life agents than the classical PD. Real-life agents do have some mutual information, so the natural first step is to construct a model with complete mutual information. The next steps... are up to you, dear readers. :-)

You did not phrase your article as if it was obvious to you that this is not PD. Throughout your article, you referred to this as a "PD tournament" and "playing the PD".

So you're trying to model real-life agents in PD-like situations. Real-life actors generally don't face PD, but face a variant PD+I, where I is infrastructure that tries to induce agents to cooperate. Such infrastructure generally works through incentives, disincentives, probabilities of discovery, etc. The justice system, traffic police, even religion are examples of such PD+I infrastructures.

Each of these infrastructures works differently; in the case of religion, radically differently. I guess religion might be the closest real-life example to your source code inspection proposal - people do appear to cooperate more with others who share their religious beliefs. Other infrastructures, such as the justice system and traffic police, do not appear to be captured by your model.

Yes, that's completely right. The justice system might be a less interesting example of I than religion - it just tweaks the payoffs in the cells. It would be really interesting to bring scenario 1 closer to real-life religion by somehow making information incomplete, but as of now I have no idea how to do it formally.

This is very interesting, I was thinking on exactly these same lines the other day. Couldn't see how to get past the infinite-loop simulation problem. I suggest a couple of modifications to the analysis framework:

Assume the 1st program has access to its own source code as input, in addition to the other source code. This gets past possible confusion over quining, and is more symmetric with the simulator case.

Also, what happens in the simulator case if the other program runs forever, when given your code? I realize you are trying to avoid that by using your random values and running the other program a limited number of times, but it seems like it will be hard to prove in general that a program runs in bounded time. You might consider adding a parameter (or feature) to the simulator that limits how long it can run.

I have an unproven hypothesis: if simulate(A, A) terminates and simulate(B, B) terminates, then simulate(A, B) terminates. This would give us a natural rule: each participant must make sure that it terminates against itself. Of course I might be wrong...

I don't think this holds. Its clearly possible to construct code like:

if (other_src == my_sourcecode) { return Strategy.COOPERATE; }
if (simulate(other_src, my_sourcecode) == Strategy.COOPERATE)
    return Strategy.COOPERATE;
return Strategy.DEFECT;

B is similar, with slightly different logic in the second part (even a comment difference would suffice).

simulate(A,A) and simulate(B,B) clearly terminate, but simulate(A,B) still calls simulate(B,A) which calls simulate(A,B) ...

No, reread the post - in the second scenario programs can't read or compare each other's source code. You're given two ObjectCode instances that are totally opaque except you can pass them to the simulator. If you still succeed in constructing a counterexample for my hypothesis, do let me know.

Ah sorry, I'd thought this was in relation to the source available situation. I think this may still be wrong however. Consider the pair of programs below:

Program A:

    return Strategy.DEFECT;

Program B:

    if (random(0, 1.0) < 0.5) { return Strategy.COOPERATE; }
    if (simulate(other, self) == Strategy.COOPERATE) { return Strategy.COOPERATE; }

simulate(A,A) terminates immediately. simulate(B,B) eventually terminates. simulate(B,A) will not terminate 50% of the time.

As a functional programmer, I can't help but notice that invisible global state like random destroys the referential transparency of these programs and that they cease to be functions. If the random number generator's seed were passed in as an argument, restoring purity, would termination be restored?

(Offhand, I don't think it would. Even if program A used each digit to decide whether to defect or cooperate, program B could just follow the same strategy but choose the reverse and simulate one more step.)
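The parenthetical can be made concrete in a couple of lines (a hypothetical one-shot framework where a strategy is a pure function of the shared seed):

```python
import random

# Sketch of the parenthetical: with the seed passed in explicitly, the
# programs are pure functions again, but a copycat can rerun its opponent
# on the identical seed and play the opposite move, so restoring
# referential transparency doesn't restore mutual cooperation by itself.
def coin_player(seed):
    rng = random.Random(seed)          # private, reproducible random stream
    return "COOPERATE" if rng.random() < 0.5 else "DEFECT"

def reverser(seed, opponent):
    their_move = opponent(seed)        # same seed means perfect prediction
    return "DEFECT" if their_move == "COOPERATE" else "COOPERATE"
```

Purity makes the opponent perfectly predictable, which is exactly what the reverser exploits.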

Yes, you're right. Thanks!

Unless I've missed something. What say ye?

Verifying source code is not quite enough. You also need to verify that the compiler/interpreter and hardware the source code runs on treat the prefix semantically identically.

E.g. from a machine-code perspective, if 04 is the opcode for ifeq in one agent's architecture and for ifneq in another's, they can exchange source code and still not be sure what the other will do.

Well, technically yes, but I'd say that falls under "nitpick". In any actual tournament the execution environment would be published common knowledge, as in the shootout for example.

True, we had different uses in mind. I was trying to forestall any people using the idea as an argument that future AI swapping source code should necessarily cooperate.

By the way, I liked the math at the end, although I don't have time to sit down and check my intuitions.

Then the question is whether AIs can (a) trustworthily verify each other's source code by e.g. sending in probes to do random inspections which include verifying that the inspected software is not deliberately opaque nor is it set to accept coded overrides from outside, or (b) don't need to verify each other's source code because the vast majority of initial conditions converge on the same obvious answer to PD problems.

(a) Random inspections probably won't work. It's easy to have code/hardware that look innocent as individual parts, but together have the effect of being a backdoor. You won't detect the backdoor unless you can see the entire system as a whole.

Tim Freeman's "proof by construction" method is the only viable solution to the "prove your source code" problem that I've seen so far.

(b) is interesting, and seems to be a new idea. Have you written it up in more detail somewhere? If AIs stop verifying each other's source code, won't they want to modify their source code to play Defect again?

Look innocent to a cursory human inspection, yes. But if hardware is designed to be deterministically cooperative/coordinating and to provably not be a backdoor in combination with larger hardware, that sounds like something that should be provable if the hardware was designed with that provability in mind.

Many governments, including the US, are concerned right now that their computers have hardware backdoors, so the current lack of research results on this topic is not just due to lack of interest, but probably intrinsic difficulty. Even if provable hardware is physically possible and technically feasible in the future, there is likely a cost attached, for example running slower than non-provable hardware or using more resources.

Instead of confidently predicting that AIs will Cooperate in one-shot PD, wouldn't it be more reasonable to say that this is a possibility, which may or may not occur, depending on the feasibility and economics of various future technologies?

The singleton scenario seems overwhelmingly likely, so whatever multiple AIs will exist, they'll play by the singleton's rules, with native physics becoming irrelevant. (I know, I know...)

I believe this stuff bottoms out in physics - it's either possible or impossible to make a physically provable analog to the PREFIX program. The idea is fascinating, but I don't know enough physics to determine whether it's crazy.

The difficulty would be to make sure nothing could interact with the atoms/physical constituents of the prefix in a way that distorts the prefix. Prefixes of programs have the benefit they go first, and in the serial nature of most programs, things that go first have complete control.

So it is a question of isolating the prefix. I'm going to read this paper on isolation and physics, before making any comments on the subject.

I read the paper, and it seemed to me to be useless. We want a physically inviolable guarantee of isolation.

It gave some ideas. It suggests we might start with specifying time limits, e.g. specifying a system will be effectively isolated for a certain time, by scanning a region of space around that system.


Though my knowledge of physics is near nonexistent, I imagine future AIs wouldn't inspect each other's source code; they would instead agree to set up a similar tournament and use some sort of inviolable physical prohibitions in place of tournament rules. This sounds like an exciting idea for someone's future post: physical models of PD with light, gravity etc., and mechanisms to enforce cooperation in those models (if there are any).


Good point whpearson. This is a very good analogy to human behavior. Humans are able to run cooperative 'source code' that is relatively easy for us to identify. Yet what their underlying hardware will actually return with this information is somewhat different.

Even if I know another man's source code I still need a solid understanding of his hardware and compiler before I can make a sound judgement.

You would have to examine the source code outside of the PREFIX section. Many programming languages have preprocessing directives like #define which would allow you to do something like this:

    #define COOPERATE DEFECT

The compiler would then replace instances of "COOPERATE" with "DEFECT". I can think of few other ways to game this without touching the code between the PREFIX tags.

Programs with defines above the prefix don't begin with the prefix... check the pseudocode.

This is beside the point, even if it were true: you are not helping in constructing the least convenient possible world.


It's impossible to cheat, because opponents that begin with PREFIX can't not cooperate with this program.

// begin PREFIX
Strategy main(SourceCode other)
    return Strategy.DEFECT;
// end PREFIX


No, your program doesn't begin with PREFIX. PREFIX is the chunk of source code between "begin PREFIX" and "end PREFIX" in the original program. Note the mention of quines.