Iterated Gambles and Expected Utility Theory

by Sable, 25th May 2016


The Setup

I'm about a third of the way through Stanovich's Decision Making and Rationality in the Modern World.  So far I've gotten through some of the more basic axioms of decision theory (Dominance, Transitivity, etc.).


As I went through the material, I noted that there were a lot of these:

Decision 5. Which of the following options do you prefer (choose one)?

A. A sure gain of $240

B. 25% chance to gain $1,000 and 75% chance to gain nothing


The text goes on to show how most people tend to make irrational choices when confronted with decisions like this; most striking was how often irrelevant context and framing affected people's decisions.


But I understand the decision theory bit; my question is a little more complicated.


When I was choosing these options myself, I did what I've been taught by the rationalist community to do in situations where I am given nice, concrete numbers: I shut up and multiplied, and at each decision chose the option with the highest expected utility.


Granted, I equated dollars to utility, which, as Stanovich mentions, humans don't do well (see Prospect Theory).



The Problem

In the above decision, option B clearly has the higher expected utility ($250 against $240), so I chose it.  But there was still a nagging doubt in my mind, some part of me that thought, if I were really given this option, in real life, I'd choose A.
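For concreteness, here is a minimal sketch of the "shut up and multiply" step for Decision 5, treating dollars as utility as the post does. The names `option_a`, `option_b` and `expected_value` are only illustrative.

```python
from fractions import Fraction

# Each gamble is a list of (probability, dollar payoff) pairs.
option_a = [(Fraction(1), 240)]
option_b = [(Fraction(1, 4), 1000), (Fraction(3, 4), 0)]

def expected_value(gamble):
    """Probability-weighted average payoff of a gamble."""
    return sum(p * payoff for p, payoff in gamble)

print(expected_value(option_a))  # 240
print(expected_value(option_b))  # 250
```

The gap is only $10 per play, which is part of why the choice feels closer than "clearly higher" suggests.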


So I asked myself: why would I choose A?  Is this an emotion that isn't well-calibrated?  Am I being risk-averse for gains but risk-taking for losses?


What exactly is going on?


And then I remembered the Prisoner's Dilemma.



A Tangent That Led Me to an Idea

Now, I'll assume that anyone reading this has a basic understanding of the concept, so I'll get straight to the point.


In classical decision theory, the choice to defect (rat the other guy out) is strictly superior to the choice to cooperate (keep your mouth shut).  No matter what your partner in crime does, you get a better deal if you defect.


Now, I haven't studied the higher branches of decision theory yet.  (I have a feeling that Eliezer, for example, would find a way to cooperate and make his partner in crime cooperate as well; after all, rationalists should win.)


Where I've seen the Prisoner's Dilemma resolved is, oddly enough, in Dawkins's The Selfish Gene, which is where I was first introduced to the idea of an Iterated Prisoner's Dilemma.


The interesting idea here is that, if you know you'll be in the Prisoner's Dilemma with the same person multiple times, certain kinds of strategies become available that weren't possible in a single instance of the Dilemma.  Partners in crime can be punished for defecting by future defections on your own part.


The key idea here is that I might have a different response to the gamble if I knew I could take it again.


The Math

Let's put on our probability hats and actually crunch the numbers:

Format -  Probability: $Amount of Money | Probability: $Amount of Money

The totals below assume one picks A every time, or B every time.

Iteration 1
A: $240
B: 1/4: $1,000 | 3/4: $0

Iteration 2
A: $480
B: 1/16: $2,000 | 6/16: $1,000 | 9/16: $0

Iteration 3
A: $720
B: 1/64: $3,000 | 9/64: $2,000 | 27/64: $1,000 | 27/64: $0

Iteration 4
A: $960
B: 1/256: $4,000 | 12/256: $3,000 | 54/256: $2,000 | 108/256: $1,000 | 81/256: $0

Iteration 5
A: $1,200
B: 1/1024: $5,000 | 15/1024: $4,000 | 90/1024: $3,000 | 270/1024: $2,000 | 405/1024: $1,000 | 243/1024: $0

And so on. (If I've made a mistake, please let me know.)
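The B entries are binomial probabilities (k wins out of n plays, each with a 1/4 chance), so they can be checked mechanically. The following is a sketch rather than anything from the book; `b_distribution` is a hypothetical helper name, and exact fractions are used so the results can be compared with the figures above.

```python
from fractions import Fraction
from math import comb

def b_distribution(n, p=Fraction(1, 4), prize=1000):
    """Return {total winnings: probability} after n independent plays of option B."""
    return {
        prize * k: comb(n, k) * p**k * (1 - p)**(n - k)
        for k in range(n, -1, -1)  # most wins first, matching the layout above
    }

for n in range(1, 6):
    dist = b_distribution(n)
    assert sum(dist.values()) == 1  # each row must be a full distribution
    denom = 4**n
    # Fractions auto-reduce, so rescale to the common denominator 4**n for display.
    row = " | ".join(
        f"{prob.numerator * (denom // prob.denominator)}/{denom}: ${total:,}"
        for total, prob in dist.items()
    )
    print(f"Iteration {n}  A: ${240 * n:,}  B: {row}")
```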


The Analysis

It is certainly true that, in terms of expected money, option B outperforms option A no matter how many times one takes the gamble, but instead, let's think in terms of anticipated experience - what we actually expect to happen should we take each bet.


The first time we take option B, we note that there is a 75% chance that we walk away disappointed.  That is, if one person chooses option A, and four people choose option B, on average three out of those four people will underperform the person who chose option A.  And it probably won't come as much consolation to the three losers that the winner won significantly bigger than the person who chose A.


And since nothing unusual ever happens, we should think that, in the typical case, having taken option B, we'd wind up underperforming option A.


Now let's look at further iterations.  In the second iteration, having taken option B twice, we're more likely to have nothing (9/16, about 56%) than to have anything.


In the third iteration, there's about a 57.8% chance that we'll have outperformed the person who chose option A the whole time, and a 42.2% chance that we'll have nothing.


In the fourth iteration, there's a 73.8% chance that we'll have matched or done worse than the person who has chosen option A four times (I'm rounding a bit, $1,000 isn't that much better than $960).


In the fifth iteration, the above percentage drops to 63.3%.
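These percentages can be reproduced with a short calculation. This is a sketch under the post's own assumptions (independent plays, a 25% win chance, always A or always B); the `tolerance` parameter is an addition for illustration, used to mimic treating $1,000 as no real improvement on $960.

```python
from fractions import Fraction
from math import comb

def prob_b_no_better(n, tolerance=0, p=Fraction(1, 4), prize=1000, sure=240):
    """P(total from n plays of B <= total from n plays of A, plus tolerance)."""
    threshold = sure * n + tolerance
    return sum(
        comb(n, k) * p**k * (1 - p)**(n - k)
        for k in range(n + 1)
        if prize * k <= threshold
    )

for n in range(1, 6):
    exact = prob_b_no_better(n)                  # no exact ties occur for n < 25
    rounded = prob_b_no_better(n, tolerance=40)  # $1,000 counts as a match for $960
    print(f"iteration {n}: {float(exact):.1%} exact, {float(rounded):.1%} with $40 tolerance")
```

With `tolerance=0` the fourth iteration comes out at 31.6% (81/256) rather than 73.8%, because $1,000 does strictly beat $960; the 73.8% figure depends on the rounding.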


Now, without doing a longer analysis, I can tell that option B will eventually win.  That was obvious from the beginning.


But in four of the first five iterations (every one except the third, and counting $1,000 as no real improvement on $960 in the fourth), there's still a better than even chance you'll wind up no better off picking option B than picking option A.




If we act to maximize expected utility, we should choose option B, at least so long as I hold that dollars=utility.  And yet it seems that one would have to take option B a fair number of times before it becomes likely that any given person, taking the iterated gamble, will outperform a different person repeatedly taking option A.
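How many iterations count as "a fair number" can be explored directly. Below is a minimal sketch, assuming the same always-A versus always-B comparison; `prob_b_beats_a` and `last_unfavourable_round` are hypothetical helper names, and the horizon of 100 rounds is arbitrary.

```python
from fractions import Fraction
from math import comb

def prob_b_beats_a(n, p=Fraction(1, 4), prize=1000, sure=240):
    """Exact P(total from n plays of B > total from n plays of A)."""
    return sum(
        comb(n, k) * p**k * (1 - p)**(n - k)
        for k in range(n + 1)
        if prize * k > sure * n
    )

def last_unfavourable_round(max_n):
    """Largest n <= max_n where B beats A with probability at most 1/2, else None."""
    unfavourable = [
        n for n in range(1, max_n + 1) if prob_b_beats_a(n) <= Fraction(1, 2)
    ]
    return unfavourable[-1] if unfavourable else None

for n in range(1, 21):
    print(n, f"{float(prob_b_beats_a(n)):.3f}")
print("last unfavourable round up to 100:", last_unfavourable_round(100))
```

One thing the printout makes visible is that the probability does not rise smoothly with n. Every time A's running total passes another multiple of $1,000, B needs one more win to stay ahead, so the chance of beating A drops again (from about 68% at four plays to about 37% at five, for example).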


In other words, of the 1025 people taking the iterated gamble:

we expect 1 to walk away with $1,200 (from taking option A five times),

we expect 376 to walk away with more than $1,200, casting smug glances at the scaredy-cat who took option A the whole time,

and we expect 648 to walk away muttering to themselves about how the whole thing was rigged, casting dirty glances at the other 377 people.


After all the calculations, I still think that, if this gamble were really offered to me, I'd take option A, unless I knew for a fact that I could retake the gamble quite a few times.  How do I interpret this in terms of expected utility?


Am I not really treating dollars as equal to utility, and discounting the marginal utility of the additional thousands of dollars that the 376 win?
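One way to probe that question is to swap dollars-as-utility for a concave utility function and see whether the preference flips. The sketch below assumes logarithmic utility of total wealth, which is a common textbook choice and not something the post or Stanovich specifies; the starting wealth values are likewise made up for illustration.

```python
from math import log

# Each gamble is a list of (probability, dollar payoff) pairs.
OPTION_A = [(1.0, 240)]
OPTION_B = [(0.25, 1000), (0.75, 0)]

def expected_log_utility(wealth, gamble):
    """Expected utility under the ASSUMED utility u(w) = ln(w); wealth must be > 0."""
    return sum(p * log(wealth + payoff) for p, payoff in gamble)

def prefers_a(wealth):
    """True if a log-utility agent with this starting wealth picks the sure $240."""
    return expected_log_utility(wealth, OPTION_A) > expected_log_utility(wealth, OPTION_B)

for wealth in (1_000, 10_000, 100_000):
    print(f"starting wealth ${wealth:,}: prefers", "A" if prefers_a(wealth) else "B")
```

Under this assumption an agent with little starting wealth takes the sure thing, while a sufficiently wealthy one takes the gamble, so preferring A is consistent with maximizing expected utility as long as utility is not linear in dollars.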


What mistakes am I making?


Also, a quick trip to Google confirms my intuition that there is plenty of work on iterated decisions; does anyone know a good primer on them?


I'd like to leave you with this:


If you were actually offered this gamble in real life, which option would you take?