Prisoner's Dilemma (with visible source code) Tournament

After the iterated prisoner's dilemma tournament organized by prase two years ago, there was discussion of running tournaments for several variants, including one in which two players submit programs, each of which is given the source code of the other player's program and outputs either “cooperate” or “defect”. However, as far as I know, no such tournament has been run until now.

Here's how it's going to work: Each player will submit a file containing a single Scheme lambda-function. The function should take one input. Your program will play exactly one round against each other program submitted (not including itself). In each round, two programs will be run, each given the source code of the other as input, and will be expected to return either of the symbols “C” or “D” (for "cooperate" and "defect", respectively). The programs will receive points based on the following payoff matrix:

Both cooperate: 2 points each.
One cooperates, one defects: 3 points for the defector, 0 points for the cooperator.
Both defect: 1 point each.

“Other” includes any result other than returning “C” or “D”, including failing to terminate, throwing an exception, and even returning the string “Cooperate”. Notice that “Other” results in a worst-of-both-worlds scenario where you get the same payoff as you would have if you cooperated, but the other player gets the same payoff as if you had defected. This is an attempt to ensure that no one ever has incentive for their program to fail to run properly, or to trick another program into doing so.

Your score is the sum of the number of points you earn in each round. The player with the highest score wins the tournament. Edit: There is a 0.5 bitcoin prize being offered for the winner. Thanks, VincentYu!

Details:
All submissions must be emailed to wardenPD@gmail.com by July 5, at noon PDT (Edit: that's 19:00 UTC). Your email should also say how you would like to be identified when I announce the tournament results.
Each program will be allowed to run for 10 seconds. If it has not returned either “C” or “D” by then, it will be stopped, and treated as returning “Other”. For consistency, I will have Scheme collect garbage right before each run.
One submission per person or team. No person may contribute to more than one entry. Edit: This also means no copying from each other's source code. Describing the behavior of your program to others is okay.
I will be running the submissions in Racket. You may be interested in how Racket handles time (especially the (current-milliseconds) function), threads (in particular, “thread”, “kill-thread”, “sleep”, and “thread-dead?”), and possibly randomness. (A sketch of how a timed run might work appears after this list.)
Don't try to open the file you wrote your program in (or any other file, for that matter). I'll add code to the file before running it, so if you want your program to use a copy of your source code, you will need to use a quine (see the sketch after this list). Edit: No I/O of any sort.
Unless you tell me otherwise, I assume I have permission to publish your code after the contest.
You are encouraged to discuss strategies for achieving mutual cooperation in the comments thread.
I'm hoping to get as many entries as possible. If you know someone who might be interested in this, please tell them.
It's possible that I've said something stupid that I'll have to change or clarify, so you might want to come back to this page again occasionally to look for changes to the rules. Any edits will be bolded, and I'll try not to change anything too drastically, or make any edits late in the contest.
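
To make the timing rule concrete, here is a rough sketch of how a timed run could work, using only the thread functions mentioned above (an illustration, not necessarily the exact harness I will use):

(define (run-with-timeout player opponent-source seconds)
  (define result 'Other)      ; anything other than 'C or 'D counts as "Other"
  (define worker
    (thread
     (lambda ()
       (let ((r (player opponent-source)))
         (when (memq r '(C D))
           (set! result r))))))
  (let loop ((elapsed 0))
    (cond ((thread-dead? worker) result)                      ; returned in time
          ((>= elapsed seconds) (kill-thread worker) result)  ; overtime: 'Other
          (else (sleep 0.1) (loop (+ elapsed 0.1))))))

For example, (run-with-timeout player-A source-of-B 10) either returns player A's move or 'Other.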
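
And since the no-I/O rule makes a quine the only way to get at your own source, here is a sketch of the technique: an entry that rebuilds its own source as a datum and cooperates only with exact copies of itself (an illustration of quining, not a recommended strategy). The two copies of the body must match exactly:

(lambda (opponent)
  ((lambda (half)
     (let ((self (list 'lambda '(opponent)
                       (list (list 'lambda '(half) half)
                             (list 'quote half)))))
       (if (equal? self opponent) 'C 'D)))
   '(let ((self (list 'lambda '(opponent)
                      (list (list 'lambda '(half) half)
                            (list 'quote half)))))
      (if (equal? self opponent) 'C 'D))))

Here self is bound to this whole lambda-expression as a datum, because the program applies (lambda (half) ...) to a quoted copy of its own body.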

Here is an example of a correct entry, which cooperates with you if and only if you would cooperate with a program that always cooperates (actually, if and only if you would cooperate with one particular program that always cooperates):

(lambda (x)
    (if (eq? ((eval x) '(lambda (y) 'C)) 'C) ; run the opponent against a program that always cooperates
        'C   ; it cooperated with that program, so cooperate with it
        'D)) ; otherwise defect

Comments:

Here is an idea, but I still have some technical problems with implementation: Construct a self-improving intelligent algorithm that will escape from the box, steal the administrator's password, replace the competing algorithms with CooperateBots, then defect against them. Could someone please help me with the Scheme syntax? Or just post the whole program with comments, and I will try to understand how it works. Thanks!!

That would violate the ban on file I/O in the rules.

Only opening files is against the rules. Nothing was specified about deleting and writing new files.

Although I don't see a Scheme example of a self-improving intelligent algorithm on StackExchange, so it must not be supported by your operating system.

I had planned on writing an algorithm that would simply persuade the administrator to declare it victorious without even running the contest.

Or somewhat more realistically:

Is there a way in Scheme for a function to detect whether it is being run by an (eval x) form? Or use a macro to do something like change all the 'Ds to 'Cs in the parent function?

If so, then an amusing case would be a PoltergeistBot, which only ever Defects, but if another bot tries to evaluate it, it "possesses" the function and forces it to Cooperate.

(lambda (x)
  (if (eq? (eval '\'))))))))))) (injectedai ... ))));

The quine requirement seems to me to introduce non-productive complexity. If file reading is disallowed, why not just pass the program its own source code as well as its opponent's?

Yes -- in my version of this you do get passed your own source code as a convenience.

That's a good point. I've already got a few submissions, but on the other hand, I could notify them of the change, and it would only require a trivial modification. Is there a consensus on whether I should do this anyway?

For the record, though I raised an objection, I'd be perfectly happy if the contest were modified so that player programs were passed their source code as an argument. The rule change would have consequences that I don't understand, and I like that. Caveat emptor — I suspect the rule change would cause people to write more exploitable programs.

Passing in the source code is not the same as quining. A program that is passed in its own source code can easily check to see it's been altered (e.g. by including a cryptographic signature in the source code). With quining, the program can be mutated without easy detection.

Thanks for pointing that out. Unless someone can convince me that this won't be a problem, I will not change the rule.

How does that help? A quine-like program could just as well put its real payload in a string with a cryptographic signature, verify the signature, and then eval the string with the string as input; thus emulating the "passed its own sourcecode" format. You could mess with that if you're smart enough to locate and delete the "verify the signature" step, but then you could do that in the real "passed its own sourcecode" format too.

Conversely, even if the tournament program itself is honest, contestants can lie to their simulations of their opponents about what sourcecode the simulation is of.

A quine-like program could just as well put its real payload in a string with a cryptographic signature, verify the signature, and then eval the string with the string as input; thus emulating the "passed its own sourcecode" format.

Altering the internal structure of an opponent program would be very difficult, but that's not the only way to mutate a program. You can't tinker with the insides of a black box, but you can wrap a black box.

To be concrete: given an opponent's source code, I could mechanically generate an equivalent program with extremely dissimilar source code (perhaps just a block of text, a decryption routine, and a call to eval) that nevertheless acts exactly like the original program in every way. And since that mechanically-obfuscated program would act exactly like the original program in every way, the obfuscated program would not be able to detect that it had been altered. Do you agree?

I'm playing Prisoner's Dilemma and wish to test if an opponent X is honest. I might try the following:

(1) Create two programs, Y and Z, which are algorithmically equivalent but obfuscated versions of X.
(2) Run Y and Z against each other.

If Y and Z don't cooperate with each other, that's a good indication that X recognizes itself with a source-code comparison and that I shouldn't trust X.

This honesty check doesn't work if Y and Z are given access to their sources. Sure, when I simulate Y against Z, I could lie to Y and tell Y that its source is X (so Y believes itself to be unmodified). But when my deluded Y simulation is deciding whether to cooperate with Z, it (Y) may run Z in simulation. If Y informs its Z-simulation that Z's source is Z, then that Z-simulation will not be deluded into thinking that it is unmodified. Y's simulation of Z will be able to detect that it is an (obfuscated) simulation and act accordingly.

This honesty check isn't foolproof. X can recognize itself with a more complicated handshake — one that survives code obfuscation. But if X recognizes itself with a more complicated handshake, then X doesn't need to know its own source code (and we shouldn't bother passing the source code in).

I had in mind an automated wrapper generator for the "passed own sourcecode" version of the contest:

(define CliqueBot
  (lambda (self opponent)
    (if (eq? self opponent) 'C 'D)))  ; cooperate exactly when handed itself
(define Wrapper
  (lambda (agent)
    (lambda (self opponent)
      (agent agent opponent))))       ; discard "self" and pass the agent itself instead
(define WrappedCliqueBot
  (Wrapper CliqueBot))

Note that for all values of X and Y, (WrappedCliqueBot X Y) == (CliqueBot CliqueBot Y), and there's no possible code you could add to CliqueBot that would break this identity. Now I just realized that the very fact that WrappedCliqueBot doesn't depend on its "self" argument provides a way to distinguish it from the unmodified CliqueBot using only blackbox queries, so in that sense it's not quite functionally identical. On the other hand, if you consider it unfair to discriminate against agents just because they use old-fashioned quine-type self-reference rather than exploiting the convenience of a "self" argument, then this transformation is fair.

You might want to see if a program would cooperate with an obfuscated version of itself (without the obfuscated version being able to detect that it was obfuscated).

Injecting some keywords: this field of study is known as program equilibrium. Previous LW post on the subject, with links.

Edit: Can you explain how you decided on the parts of the payoff matrix involving "other"? These seem quite important as they affect the viability of strategies based on either convincing your opponent not to halt or not halting yourself.

If HisProgram == MyProgram then
  do (C);
else
  do (D);
end.

TIL that ethnic hatred and tribalism are a Nash (folk) equilibrium.

Make sure the equality comparison only depends on things that affect functionality -- i.e. it will declare any functionally equivalent programs equal even if they use a different nameset for variables or something.

(Yes, I know that's reducible to the halting problem; in practice, you'll use a computable, polynomial time approximation for it that will inevitably have to throw out equivalent programs that are too complex or otherwise be too 'clever'.)

It's quite likely that the optimal behaviour should be different in the case where the program is functionally equivalent but not exactly equal.

If you're playing yourself, then you want to cooperate.

If you're playing someone else, then you'd want to cooperate if and only if that someone else is smart enough to check whether you'll cooperate; but if its decision doesn't depend on yours, then you should defect.

You only need to evaluate the equivalence of the first two lines of the program, by the way. It cooperates with those who can't not cooperate with it if it goes through that branch of the logic, and does something else to everyone else.

Can you write that in Scheme so I can submit this? Thanks

The payoffs for "other" were designed so that neither failing to halt, nor convincing the other player not to halt, should ever be a worthwhile strategy. If you don't halt, it gives you the same payoff as if you had cooperated, and gives the other player the same payoff as if you had defected. That way, not halting should be strictly dominated by defecting, since you are better off if you defect, and the other player should react the same way to each threat. And tricking the other player into not halting is also a bad idea, since the payoff you get from it is the same as if they defected.

And tricking the other player into not halting is also a bad idea, since the payoff you get from it is the same as if they defected.

(Defect, non-halt) is actually better than (Defect, Defect) for you, since it gives you a relative advantage over competitors in the tournament.

True, but I still don't expect it will be a big problem. If there are a lot of submissions, the effect will be small, and if it is paying enough attention to your source code for it to be possible to trick it into not halting, then it is probably looking for a way to achieve mutual cooperation, so tricking it is still not a good strategy.

If you trust all sufficiently smart players to try to induce (Defect, non-halt) if (Defect, Defect) is otherwise inevitable, the effect adds up over a hopefully significant portion of the submissions.

The payoffs for "other" were designed so that neither failing to halt, nor convincing the other player not to halt, should ever be a worthwhile strategy. If you don't halt, it gives you the same payoff as if you had cooperated, and gives the other player the same payoff as if you had defected.

Your game world implements an "enthusiastic consent" policy

I'll post this here, because it's somewhat related. I've asked a few people that two box in Newcomb's problem what they would do in a variation of the problem. Instead of deciding to one box or two box, you are asked to write a program that will one box or two box, and Omega is given access to the source code before filling the boxes. When phrased like this, the few two boxers I asked said they would write the program to one box.

Note that assuming that the people in question are emulating CDT agents this is precisely what we would expect (and CDT is the most plausible hypothesis for the kind of decision algorithm they are emulating). Similarly, if a (sane) CDT agent exists at time T with the capability to (relatively cheaply) self modify it will immediately alter itself into "CDT++". Roughly speaking it will end up as an agent that operates similarly to TDT for the purpose of influencing decisions that occur after time T but similarly to CDT for the purpose of influencing decisions that occur before time T. CDT is not in a stable equilibrium according to its own decision algorithm!

Similarly, if a (sane) CDT agent exists at time T with the capability to (relatively cheaply) self modify it will immediately alter itself into "CDT++".

I still don't think this is a good description of the change; the difference between CDT, CDT++, and TDT is on the level of causal models of reality, not how to make decisions once presented with a causal model of reality.

No person may contribute to more than one entry.

This is important, because otherwise the contest devolves into who can submit the most copies of AbsolutismBot (cooperate with programs that share its source code, otherwise defect).

I think that any submitted program can be improved by combining it with AbsolutismBot. If you're playing against someone who submitted the same program as you, cooperate (they can't defect against you in this scenario, since they're running identical source code). If they aren't running the same code, run whatever program lies underneath.

I think this could get generalized to cooperation with everyone who has the AbsolutismBot "wrapper", since it doesn't matter what the code after the AbsolutismBot section does. In English (since I don't know how to program in Scheme), the program would be like this:

If the first 117 characters of the other program are the same as the first 117 characters of this program, cooperate. Otherwise, do some other strategy.
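
A rough Scheme rendering of that description (illustrative only: the prefix here is a stand-in, since in the actual contest you would need a quine to obtain your program's own first 117 characters):

(lambda (opponent)
  (let* ((my-prefix "(lambda (opponent)")   ; stand-in for this program's own first 117 characters
         (opp-text (format "~s" opponent))  ; render the opponent's source as text
         (n (string-length my-prefix)))
    (if (and (>= (string-length opp-text) n)
             (string=? (substring opp-text 0 n) my-prefix))
        'C      ; same wrapper: guaranteed mutual cooperation
        'D)))   ; stand-in for "do some other strategy"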

All players that implement this strategy will cooperate with each other. Granted, this doesn't help them win the tournament since it isn't a relative advantage compared to other AbsolutismBots - it just makes everyone who doesn't do this lose the tournament.

Except that over some threshold, any Anti-Absolutism bots (which have some way of "escaping" while still containing the same first 117 characters, like having a C preprocessor directive that redefines TRUE to equal FALSE) would necessarily be superior.

Interesting. Really, what you want (in slightly more generality) is to cooperate with anyone who can prove they will cooperate if you yourself can prove you will cooperate under the same condition.

That is, if from their source, you can prove "they cooperate if they can prove this condition about me", then you cooperate.

Of course, it's not generally possible to prove things about a program given its source, especially at this level of self-reference. In this particular case the "generally" in there is important. It is in your interest for other programs to be able to prove this property about you. This is beyond the scope of the tournament, obviously, but given extremely sophisticated players, every player benefits from adding this property (if possible). You might even want to include a subprogram which produces a proof!

Let me try to clear that up.

Define the "Reasonable" property reflexively: a program is "Reasonable" if it provably cooperates with any program it can prove is Reasonable.
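
In the provability notation used elsewhere in this thread (where [](Q) means "Q is provable"), one way to write this is: Reasonable(X) iff [] (for all Y: [] Reasonable(Y) -> X(Y) = C). Note that this is a fixed-point condition rather than an explicit definition, which is where the non-uniqueness trouble raised below comes from.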

It is in the interest of every program to be Reasonable*. In fact, it is in the interest of every program both to be Reasonable and to exhibit a proof of its own Reasonableness. (You might even define that into the Reasonable property: don't just require provable (conditional) cooperation, but require the exhibition of a proof of conditional cooperation.)

Potentially you might also expand the definition to require efficient proofs - say, at most a thousand instructions.

On the other hand, if you're playing with aliens, there's not necessarily going to be a way you can clearly establish which part of your source is the proof of your Reasonableness. So you want it to be as easy as possible for someone else to prove that you are Reasonable.

I'll happily expand / reword this if it's at all unclear.

*Oh - this is maybe untrue. If you are really good at getting other programs to cooperate and then defecting, you are better served by not being reasonable.

Define the "Reasonable" property reflexively: a program is "Reasonable" if it provably cooperates with any program it can prove is Reasonable.

I'm not sure your definition defines a unique "reasonable" subset of programs. There are many different cliques of mutually cooperating programs. For example, you could say the "reasonable" subset consists of one program that cooperates only with exact copies of itself, and that would be consistent with your definition, unless I'm missing something.

Point. Not sure how to fix that.

Maybe define the Reasonable' set of programs to be the maximal Reasonable set? That is, a set is Reasonable if it has the property as described; then take the maximal such set to be the Reasonable' set (I'm pretty sure this is guaranteed to exist by Zorn's Lemma, but it's been a while...).

Zorn's lemma doesn't give you uniqueness either. Also, maximal under which partial order? If you mean maximal under inclusion, then my one-element set seems to be already maximal :-)

Clarify what you mean by the various conjugations of proof: Do you mean "There exists a valid proof in PA with a Gödel number less than N"?

Just "there exists a valid proof in PA". If you're playing with bounded time, then you want efficient proofs; in that case the definition would be as you have it.

At that point you can't implement it in a halting Turing machine. I'm not sure what PA can prove about the behavior of something that isn't a halting Turing machine regarding a particular input.

By "implement it", you mean, one can't verify something is Reasonable on a halting TM? Not in general, of course. You can for certain machines, though, particularly if they come with their own proofs.

Note that the definition is that Reasonable programs cooperate with those they can prove are Reasonable, not programs which are Reasonable.

So then a program which can present a flawed proof that machines of equivalent scale can't necessarily recognize as flawed can dominate over those other machines?

Also, if we want this contest to be a model of strategies in the real world, shouldn't there be a fractional point-cost for program size, to model the fact that computation is expensive?

I.e., simpler programs should win over more complex programs, all else being equal.

Perhaps the most accurate model would include a small payoff penalty per codon included in your program, and a larger payoff penalty per codon actually executed.

EDIT: What's wrong with this post?

(I didn't downvote you.)

It's quite straightforward to write an algorithm which accepts only valid proofs (but might also reject some proofs which are valid, though in first-order logic you can do away with this caveat). Flawed proofs are not an issue - if A presents a proof which B is unable to verify, B ignores it.

A proves that A is inconsistent, then proves that A cooperates with every program that A proves is Reasonable, and that B is Reasonable.

B accepts A's proof that A is inconsistent, and the rest follows trivially.

I'm not sure I understand. A is a TM - which aspect is it proving inconsistent?

A proves that the logic A uses to prove that B is Reasonable is inconsistent. It is sufficient to say "If I can prove that B is Reasonable, B is Reasonable".

There's someone who consistently downvotes everything I ever write whenever he comes onto the site; I'm not sure what to do about that.

Except we're not allowed to use anyone else's source code, so the test could just as easily be simplified to "if opponent source contains integer 1415926535, cooperate" (for a randomly chosen 1415926535).

It's not necessarily best to cooperate with everyone with the "AbsolutismBot" wrapper. This guarantees that you and the other program will both cooperate, but without the wrapper it might have been the case that you would defect and the other program would cooperate, which is better for you.

I would like to declare the following: I have submitted a program that cooperates only if you would cooperate against CooperateBot. You can of course create a selective defector against it, but that would be a bit tricky, as I am not revealing the source code. Evaluate your submissions accordingly.

What wubbles actually submitted: A program that defects only if you would cooperate against CooperateBot.

Well played.

But surely, good sir, common sense says that you should defect against CooperateBot in order to punish it for cooperating with DefectBot.

Also, in modal combat your bot is X=[]Y(CooperateBot) and is easily detected via the test [1](Y(X)<->[]X(CooperateBot)), where [1] denotes provability in T+Con(T), i.e. [1](Q) = []((~[]F)->Q).

But surely, good sir, common sense says that you should defect against CooperateBot in order to punish it for cooperating with DefectBot.

I thought you said people should tolerate tolerance :)

Am I missing something, or does your detector use quantification, which is disallowed in Patrick's modal agents?

Hm. I think X within the test could be introduced as a new constant and solved, but I'm not sure.

The agent defined by wubbles is actually the agent called JustBot in the robust cooperation paper and which is proven to be non-exploitable by modal agents.

JusticeBot cooperates with anyone who cooperates with FairBot, and is exploitable by any agent which comprehends source code well enough to cooperate with FairBot and defect against JusticeBot. Though I'm going off my memory of the workshop here rather than rechecking the paper.

You're right, and wubbles's agent can easily be exploited by a modal agent A defined by A(X)=C <-> [] (X(PB)=C) (where PB is PrudentBot).

If you can tell where in the stack you are (like you could with C), you could tell if you were being run by the main program, or by another contestant. Can you?

Unless the other contestant wrote a virtual machine in which they are running you, which I think would be quite doable considering the ridiculously large amount of time you've got (10s gives ~10^10 instructions).

When their VM runs your VM running their VM... it times out and everybody loses.

Unless one of the contestants has time limits on their VM (or on their simulations in general). You can clearly implement a VM where time goes faster just by pretending they have a slower processor than you really run on.

Hrm - is it different if they run my function from within theirs instead of constructing a full VM? I was considering ways to signal to a copy of myself (one being run by my simulation of my opponent) that it was in a simulation, but couldn't find any good ideas.

If they run your function from within theirs, they simply tell the computer to start reading those instructions, possibly with a timer for stopping, as detailed in other parts of the comments. If they implement a VM from scratch, they can mess with how the library functions work, for instance giving you a time that moves much faster, so that your simulation must stop within 0.1s instead of 10 and they can run your code 100 different times to deal with randomness. Now, implementing your own VM is probably not the optimal way to do this; you probably just want to do a transformation of the source code to use your own secret functions instead of the standard time ones.
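
A hedged sketch of that transformation (assuming the opponent reads the clock via current-milliseconds; the 100x factor is arbitrary):

;; Evaluate the opponent's lambda inside a scope where current-milliseconds
;; is shadowed by a clock that runs 100x fast, so a 10-second deadline they
;; watch for expires after 0.1s of real time. Illustrative only.
(define (with-fast-clock opponent-source)
  (eval `(let* ((start (current-milliseconds))
                (current-milliseconds   ; shadows the real clock for the opponent
                 (lambda ()
                   (+ start (* 100 (- (current-milliseconds) start))))))
           ,opponent-source)))

;; ((with-fast-clock their-source) my-source) then runs them on our terms.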

I was considering simply measuring the behavior of my current opponent against programs that aren't me, and classifying their behavior as CooperateBot, MutualBot, DefectBot, "cooperates if they simulate me", "cooperates against long programs", "cooperates IFF they cooperate IFF I cooperate", or some other beast. That allows their behavior to depend on my behavior versus them, but my behavior only depends on their behavior versus third parties.

I can't see a way to check another program for the desired IFF behavior without going beyond my skill level, but I think a mutualbot with a tiny chance of cooperating followed by a mutualbot with a tiny chance of defecting comes close; the first puts a lot of copies on the stack and then the top one cooperates unilaterally; if the response of the opponent is mutuality, it will be cooperate all the way down. If my opponent defects at his top level because I didn't cooperate for the right reason, it yields a defection... all the way down. A perfect program wouldn't do that, because it could see that it was probably in a simulation.

One way is to recursively call a bunch of functions and catch the resulting "recursion limit exceeded" exception.
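
A sketch of that probe (Racket has no fixed recursion limit, but a memory-limited run will eventually raise a catchable exception; the exact depth and error vary, so treat this as purely illustrative):

;; Recurse until an exception is raised, then count levels on the way back
;; out. Comparing the result against a baseline measured at the top level
;; gives a crude hint of how deep in someone's simulation we already are.
(define (remaining-depth)
  (with-handlers ((exn:fail? (lambda (e) 0)))  ; catch the failure at the deepest level
    (+ 1 (remaining-depth))))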

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I will send 0.5 BTC (currently worth ~50 USD) to a bitcoin address of the
winner's choice. If a tie occurs, the 0.5 BTC will be split evenly. If there
are disputes (including but not limited to issues about cheating and
identifying the winner), I will let AlexMennen dictate the appropriate
distribution for the 0.5 BTC.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (GNU/Linux)

iQIcBAEBAgAGBQJRtI6bAAoJEKqVvI+6cLPpykUP/0KuIqskiZvMVLMpm7diWC5q
sOf3+3XTvECQmeHNbku7dM+ihkwf1Gg8BTKt0HEjfj3tUIG4sWeA2UvFcGReNKoH
F5RsI2zPbn368k5EqS3lLyjcInGqZYH2X+zbOKxkezo7IJk7tvk+JHpR+a2U598e
tF1daaZYIOxkcmSpVYRAW6y0dX4iWSZxtbDYr+U4muYWp88W60tlNI9Fc5SAhCev
yEAhIUnX4f7QxsL/9oTmNoLa/ECSfGkrCoD5yc6d/hqfp8pV7hknFc11hAMnYGbw
Qj3hp4q7sEbetT8NAn3X+UUZ9Vld76lF73mEo5ZGVfa2dG713/yjpmt8fR/arhcn
8RWMpJ7qpFVvU3wBeKuLgASev+D+CpuImWUbFeuGqF3sD7n6m0R77zx535pt0I6M
/GtuSEtTCp3xQAcOHXDC/pPZ0Z1Fl5Z3e1JxcDfCntOodv8Rgtma3oml1DuD0HsY
7DeqqMc4WRRiU8K4ohX16PZ52rhs2TogGYx3dlvP6xt33gfQMOYf7AM0vi5s4nBL
FeBrqiRRjdrlI4RJqXeihcu3GMwXbsa0wxEwit0CHLvPqGX0FoF2/S8x13uBmzCm
gCRTEa4hIuDtCBFJozKB3dMl66c97df+ntFQoXoeZDADgf8yVOQ9lFCkC4usGsqD
5BjPUxtJBQPMzP61cKMD
=kHiy
-----END PGP SIGNATURE-----

(Note: Add a blank line after each of the "Hash:" and "Version:" lines before verifying the signature. LW's markdown seems to be hungry and is eating blank lines in code blocks.)

Someone pretending to be you could copy this whole message, PGP signature and all, as a reply to a completely different competition, and unless you can later find this message and prove that you haven't edited it, it'll look like you're offering a prize to that competition as well.

That's a good point that I hadn't considered. Thanks! I'll keep in mind that I need to be more specific in the message.

Thinking about it, a spurious copy of my message can't be used with future competitions because the PGP signature includes a timestamp (and it is publicly implausible that I would have my system time set incorrectly to the past). But it can be used maliciously with competitions that already exist at the time the signature was created.

But it'll only be believable if AlexMennen is involved with that contest as well.

If you happen to be AlexMennen, it's the perfect crime!

Just thought I'd throw this out there:

TabooBot: Return D if opponent's source code contains a D; C otherwise.

To avoid mutual defection with other bots, it must (like with real prudish societies!) indirectly reference the output D. But then other kinds of bots can avoid explicit reference to D, requiring a more advanced TabooBot to have other checks, like defecting if the opponent's source code calls a modifier on a string literal.
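
The naive version might look like this sketch (note that it necessarily mentions 'D itself, which is exactly the mutual-defection problem described above):

(lambda (x)
  (letrec ((mentions-D?
            (lambda (expr)
              (cond ((eq? expr 'D) #t)   ; found a D symbol
                    ((pair? expr)
                     (or (mentions-D? (car expr))
                         (mentions-D? (cdr expr))))
                    (else #f)))))
    (if (mentions-D? x) 'D 'C)))  ; walk the opponent's source as a tree of pairs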

like defecting if the opponent's source code calls a modifier on a string literal

Actually, 'D is not a string literal -- it's a symbol. Compilers can legally turn the symbol 'D into something opaque like 0x3859 and destroy any reference to the letter "D". The only way a program can generate a 'D symbol on its own is to include it in its source.

But that doesn't mean that a program without a 'D can't defect! An IncrediblyInnocentBot, which does not contain a 'D and can't generate a 'D on its own, can still defect by stealing a 'D from the opponent agent.

One way to steal a 'D from an opponent would be to search for quoted symbols in the opponent's program. The opponent could foil this method, however, by including decoy symbols.

Alternatively, IncrediblyInnocentBot could simulate its opponent against a bunch of stupid agents, such as CooperateBot or DivergeBot, and hope that the opponent defects in at least one of these simulations. If the opponent ever returns a symbol other than 'C in simulation, IncrediblyInnocentBot remembers that symbol and can later use it for nefarious purposes. IncrediblyInnocentBot is in-credible indeed.
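
A sketch of that second idea (just the CooperateBot probe, ignoring timeouts and errors for brevity; note there is no D anywhere in it):

(lambda (x)
  (let ((stolen ((eval x) '(lambda (y) 'C))))  ; run the opponent against a CooperateBot
    (if (eq? stolen 'C)
        'C          ; it cooperated, so cooperate back
        stolen)))   ; it returned some other symbol: reuse it as our own move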

BTW, the strategies I listed above are why I said it was not trivial for an agent to prove that MimicBot does not defect against a cooperator, despite the fact that MimicBot does not contain the defect symbol.