In praise of heuristics

by Bucky, 24th Oct 2018


We’ll get there in the end, bear with me.

Introduction to ZD strategies in IPD

Feel free to skip if you’re already familiar with ZD strategies.

In the iterated prisoner’s dilemma (IPD), a zero-determinant (ZD) strategy is one which forces a linear relationship between your opponent’s winnings and your own. These strategies can take either generous or extortionate forms.

Think of it as a modified version of tit-for-tat.


A Generous ZD strategy still always responds to C with C but will sometimes also respond to D with a C (it sometimes fails to punish). With "standard" PD utilities (T=0, R=-1, P=-2, S=-3) my opponent gains 1 utility by defecting. If I defect back in retaliation, I cost him 2 units of utility. If I defect back with probability 0.7, on average I cost him 1.4 units of utility. This still means that defecting is disadvantageous for my opponent (loss of 0.4 utility) but not quite as disadvantageous as it would be if I were playing pure tit-for-tat (loss of 1 utility).

This gets slightly more complex when you don't have constant gaps between T, R, P and S but the principle remains the same.
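The arithmetic above can be sketched in a few lines of Python. This is only the simplified accounting used in this post (one defection, followed by at most one round of retaliation), not a full ZD calculation, and the function names are made up for illustration.

```python
def net_gain_from_defecting(q, T=0.0, R=-1.0, S=-3.0):
    """Opponent's expected net gain from a single defection when I
    retaliate with probability q (simplified one-defection accounting)."""
    temptation = T - R        # what he gains in the round he defects
    punishment = q * (R - S)  # what my retaliation costs him on average
    return temptation - punishment


def min_retaliation_probability(T=0.0, R=-1.0, S=-3.0):
    """Retaliation probability above which defecting stops paying."""
    return (T - R) / (R - S)


print(round(net_gain_from_defecting(0.7), 2))  # -0.4 (generous ZD example)
print(net_gain_from_defecting(1.0))            # -1.0 (pure tit-for-tat)
print(min_retaliation_probability())           # 0.5
```

With these utilities, any retaliation probability above 0.5 keeps defection unprofitable. Passing different values of T, R and S covers the case where the gaps between them are not constant.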

If he defects at all, my opponent will end up gaining more utility than me, but less than he would have got if he had co-operated throughout.

Advantages of GZD are:

1. Total utility isn’t damaged as much by accidental defections as it is in pure tit-for-tat.

2. It won’t get caught in endless C-D, D-C, C-D, D-C as tit-for-tat can.
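A small simulation illustrates the second point. It is a sketch under simplifying assumptions: both players use the same rule, exactly one accidental defection occurs, and the `forgive` parameter is a hypothetical stand-in for the generosity of a GZD strategy rather than a real ZD calculation.

```python
import random


def respond(opponent_last, forgive, rng):
    """Tit-for-tat that forgives a defection with probability `forgive`."""
    if opponent_last == "D" and rng.random() >= forgive:
        return "D"
    return "C"


def play(rounds, slip_round, forgive=0.0, seed=0):
    """Two identical players; player A defects by accident in `slip_round`."""
    rng = random.Random(seed)
    a_last, b_last = "C", "C"
    history = []
    for t in range(rounds):
        a = respond(b_last, forgive, rng)
        b = respond(a_last, forgive, rng)
        if t == slip_round:
            a = "D"  # the accidental defection
        history.append(a + b)
        a_last, b_last = a, b
    return history


# Pure tit-for-tat (forgive=0.0) echoes the slip forever.
print(play(6, 1))  # ['CC', 'DC', 'CD', 'DC', 'CD', 'DC']

# With some forgiveness the echo ends as soon as one player lets a D go.
print(play(30, 1, forgive=0.3))
```

Each round of the echo gives one player a chance to forgive, so the probability of still being stuck after n rounds is (1 - forgive)^n.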


On the other hand, Extortionate ZD always responds to D with D but also sometimes responds to C with D. Provided I don’t respond to C with D too often, it is still advantageous to my opponent to play C (in terms of their total utility).

If my opponent co-operates at all I'll end up with more utility than him. If he gives in and plays C all the time (to maximise his own utility) I can achieve a better utility than I would with C-C.
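To put rough numbers on this, here is a sketch using the same simplified picture as the generous case. It assumes my opponent only chooses between always co-operating and always defecting, and that I answer each C with D with a hypothetical probability `e`; a real extortionate ZD strategy is defined more carefully than this.

```python
def long_run_payoffs(e, T=0.0, R=-1.0, S=-3.0):
    """Per-round payoffs (mine, his) when he always plays C and I answer
    C with D with probability e."""
    mine = (1 - e) * R + e * T
    his = (1 - e) * R + e * S
    return mine, his


def max_extortion(R=-1.0, P=-2.0, S=-3.0):
    """Largest e for which always co-operating still beats mutual
    defection (payoff P per round) for my opponent."""
    return (R - P) / (R - S)


mine, his = long_run_payoffs(0.3)
print(mine > -1.0)      # True: I do better than the C-C payoff R
print(his > -2.0)       # True: he still does better than mutual defection P
print(max_extortion())  # 0.5
```

Under these assumptions, with the utilities used earlier, I can defect against up to half of his co-operative moves before he is better off defecting throughout.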

The main disadvantage of EZD in evolutionary games is that it defects against itself.

For both EZD and GZD you can vary your probabilities to be more or less generous/extortionate, provided you always ensure your opponent gets the most utility by co-operating.

Different opinions on fairness

An extortionate ZD strategy is similar to an opponent who has different perceptions of what is fair. Maybe your opponent had to pay to play the game but you got in free so he wants a higher percentage of the winnings. Maybe you think this is just his bad luck and think a 50:50 split is fair.

If you give in to what seems to you to be an extortionate strategy, your opponent is encouraged to make more extortionate demands in future, or modify his definition of what is fair. At some point, the level of extortion is so high that you barely get any advantage from co-operating.

This brings us to a proposal of Eliezer’s.

When choosing whether to give in to an extortioner you can capitulate to some extent, provided that you ensure your opponent gains less utility than he would if he agreed to your favoured position (ideally you should let your opponent know that this is what you're doing).

This removes any motivation for your opponent to extort and encourages him to give his true estimation of what is fair.
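One way to make this concrete is a one-shot split of a fixed pie, which is an assumption of this sketch rather than the IPD setting above. If my opponent demands more than I think is fair, I accept with a probability chosen so that his expected share stays just below the fair share. The function name and the `margin` parameter are hypothetical.

```python
def acceptance_probability(demand, fair_share, margin=0.01):
    """Probability of accepting `demand` (his share of a pie of size 1)
    so that his expected share stays below `fair_share`."""
    if demand <= fair_share:
        return 1.0  # nothing to discourage, so accept outright
    return max(0.0, fair_share / demand - margin)


p = acceptance_probability(demand=0.6, fair_share=0.5)
print(p * 0.6 < 0.5)  # True: demanding 60% is expected to earn him less than 50%
```

The margin is what removes the incentive: without it, the opponent would be indifferent between the fair demand and the extortionate one.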

Two experiments in ZD strategies

Hilbe et al. performed an experiment on humans playing against computerised ZD strategies. Four different strategies were tried – strong extortion through to strong generosity. Regrettably, pure tit-for-tat wasn’t included as I would have liked to see a comparison with this.

The two generous strategies achieved higher average utility for the ZD programme than the two extortionate strategies. If the human players had acted purely in self-interest (they were paid according to the points gained) the extortionate strategists would have won. So what happened?

Firstly, a bit of detail about the experimental setup. The participants were not told that they were playing against a computer programme – the impression given was that they were playing against one of the other experimental subjects (although this wasn’t explicitly stated).

Looking at the results from each individual player it is clear that none of the human participants allowed the extortionate ZD strategists to beat the score that is achievable from co-operating (R=0.3).

It seems that the subjects automatically used a strategy similar to the one suggested by Eliezer (or this represented something of a limit to co-operating) when dealing with a player who seemed to be extortionate.

In another experiment, some players did allow the extortionate ZD strategy to achieve higher utility than it would have got by co-operating, but over the two experiments there is a strong tendency not to let the unfair strategy get away with it.


The second experiment tested the effects of:

1. More rounds of IPD (500 vs 60)

2. Being told that your opponent is a computer (Aware (A), Unaware (U))

3. Extortionate (E) / Generous (G) ZD strategies

Interestingly, human players in this second experiment were, over a long game, much more willing to let their EZD opponent “get away with it” when they were told that their opponent was a computer (see the grouping of red dots in figure a below). For the final 60 rounds of a 500 round IPD the extortionate ZD strategist was achieving 3.127 average utility – significantly (p=0.021) more than the R=3 gained from both players co-operating.


So why are we more willing to let a computer “get away with it”?

Maybe we view a computer as being insusceptible to change so are therefore more likely to give in.

Alternatively, if you think you are playing against a human, even after 500 rounds you will probably be annoyed enough with him for not co-operating properly that you won’t be co-operating the whole time. You’re less likely to get annoyed at a computer for beating you at a game. As soon as you realise you can’t beat the computer you can just try to do the best you can for yourself. This doesn't dent your pride as much as it would against a human opponent.

(There was one person who just defected pretty much throughout the whole 500AE experiment despite knowing he was playing against a computer. Maybe he just decided to tit-for-tat, or maybe it's just the lizardman constant. Without him the 500AE ZD strategy would have had even more impressive results.)


To me this looks like humans having a heuristic to deal with extortionate opponents/people who have a different opinion on what a fair split is. People seem to apply this heuristic naturally through emotions such as pride, annoyance and anger.

The heuristic works out roughly similarly to Eliezer’s suggestion of regulating your opponent’s winnings to less than he would achieve if he played fair/by your rules.

Being told that your opponent is a computer effectively turns off this heuristic (if you have long enough to get a rough idea of the computer’s strategy). This motivates your opponent to become more extortionate, something which the original heuristic was protecting you against.

In praise of heuristics

All of that is a very long introduction/example of my main point.

Heuristics are good.

Heuristics are very good.

You don’t even know how many times your heuristics have saved you.

You possibly have no idea what they are saving you from.

Knowing about biases can hurt people. Getting rid of heuristics without understanding them properly is potentially even more dangerous.


A recent discussion made me aware of this post by Scott where he tried to come up with a way of dealing with people who claim you have caused them offense.

One of the motivations for the post was Everybody Draw Mohammed Day. EDMD seems like it is a natural outworking of the heuristic described above.

People see the terrorists increasing their utility unfairly by attacking people who draw pictures of Mohammed. To ensure they don’t get an advantage by defecting, people want to decrease their utility back to below where they started – hence Everybody Draw Mohammed Day.

The terrorists were also following a similar heuristic – the original cartoon decreased their utility, so they are trying to decrease the utility of those who created it to demotivate further defection.

The heuristic isn’t there to improve the world – it is just there so that the person performing it doesn’t encourage increased defection against themselves and increased demands from others.

Scott’s post was an attempt to turn off the heuristic and replace it with a principled position:

The offender, for eir part, should stop offending as soon as ey realizes that the amount of pain eir actions cause is greater than the amount of annoyance it would take to avoid the offending action, even if ey can't understand why it would cause any pain at all. If ey wishes, ey may choose to apologize even though no apology was demanded

In this case, his proposal was criticised by others and Scott ended up rejecting his own proposal.

Had Scott applied his policy universally, he would likely have ended up losing out if those he dealt with had adapted by becoming more demanding of him.

It’s likely that our heuristic doesn’t lead us to an optimal result; it just prevents some bad results which Scott’s proposal may have led to.

Possibly a principled application of Eliezer’s proposal would help optimise the result better than the heuristic. In the experiments there was no standard amount that people chose to penalise the EZD strategist - the results were fairly spread out over the region between full defection and Eliezer's defined maximum co-operation. Sometimes the heuristic doesn't stop there and co-operates more than Eliezer would suggest.


This all sounds a bit harsh on Scott. Actually, putting an idea out there, engaging with criticism and admitting when you were wrong is exactly the right thing to do.

I’ll give an example where I didn’t do this and it did, in fact, end up biting me in the butt.


A while back, I was thinking about status. Status is, within a fixed group, a zero-sum game. People in the workplace are constantly attempting to improve their position on the ladder at the expense of others. This doesn’t just apply to promotions, it applies to pretty much everything. Alex wants to feel like he’s important and will get massively offended if he feels that Bob is trying to take status which should be Alex’s. This probably accounts for ~95% of disagreements in my workplace.

Zero-sum games are, usually, for suckers. If you can get out of the game and into a positive sum game, you probably should. This is doubly true if you’re competing for a thing you’re not really interested in.

Status very much matches my definition of a zero-sum game which I don’t want to play. The problem is, status also allows access to things which I do want – it is a very useful instrumental value. It is a game which everyone else plays so it’s hard to unilaterally leave.

Instead, I made the decision not to play status games unless I really have a need of the status (e.g. I will attempt to achieve status in the eyes of the person who will decide on a potential promotion but not others). Essentially I was trying to turn off the heuristic of “always attempt to gain status with everyone” and replace it with a trimmed down version “attempt to gain status only with those people who make a decision about your pay/promotions etc.”

Now if you have any experience in how status games work, you may realise that this was a naïve approach. If it isn’t obvious to you, have a think about what might go wrong.




If you don’t fight for your status with your colleagues, it’s like blood in the water. If they can push themselves up at your expense they will. This is not always malicious; it’s just “the thing to do” in a workplace. If there are no consequences then it will happen again and again. In the end, the people you care about impressing will see the status that others treat you as having and start to modify their own opinion of your status.

It took me a while to realise just how harmful this was. When I did, I had to do a lot of firefighting to re-establish a sense of normality.

All is fine again now but the experience did teach me that my heuristics are there for a reason and that I shouldn’t get rid of them entirely without properly understanding the consequences.


I haven’t decided exactly how I should deal with tackling heuristics in future but I have a few initial thoughts.

1. Don’t be overconfident that you have really understood why the heuristic is there

2. When comparing potential pros and cons, remember the cons are likely to be worse than you think

3. Discuss ideas with others

4. Where possible, make small changes first and monitor progress