I suspect that the True Prisoner's Dilemma played itself out in the Portuguese and Spanish conquest of Mesoamerica. Some natives were said to ask, "Do they eat gold?" They couldn't comprehend why someone would want a shiny decorative material so badly that they'd kill for it. The Spanish were Shiny Decorative Material maximizers.
That's a really insightful comment!
But I should correct you: you are really only talking about the Spanish conquest, not the Portuguese, since 1) Mesoamerica was not conquered by the Portuguese; and 2) Portuguese possessions in America (i.e., Brazil) had very little gold and silver, which was only discovered much later, when the territory was already in Portuguese hands.
I agree: Defect!
Clearly the paperclip maximizer should just let us have all of substance S; but a paperclip maximizer doesn't do what it should, it just maximizes paperclips.
I sometimes feel that nitpicking is the only contribution I'm competent to make around here, so... here you endorsed Steven's formulation of what "should" means; a formulation which doesn't allow you to apply the word to paperclip maximizers.
Very nice representation of the problem. I can't help but think there is another level that would make this even more clear, though this is good by itself.
Eliezer,
The other assumption made about Prisoner's Dilemma, that I do not see you allude to, is that the payoffs account for not only a financial reward, time spent in prison, etc., but every other possible motivating factor in the decision making process. A person's utility related to the decision of whether to cooperate or defect will be a function of not only years spent in prison or lives saved but ALSO guilt/empathy. Presenting the numbers within the cells as actual quantities doesn't present the whole picture.
I agree: Defect!
I didn't say I would defect.
By the way, this was an extremely clever move: instead of announcing your departure from CDT in the post, you waited for the right prompt in the comments and dropped it as a shocking twist. Well crafted!
It's likely deliberate that prisoners were selected in the visualization to imply a relative lack of unselfish motivations.
An excellent way to pose the problem.
Obviously, if you know that the other party cares nothing about your outcome, then you know that they're more likely to defect.
And if you know that the other party knows that you care nothing about their outcome, then it's even more likely that they'll defect.
Since the way you posed the problem precludes an iteration of this dilemma, it follows that we must defect.
How might we and the paperclip-maximizer credibly bind ourselves to cooperation? Seems like it would be difficult dealing with such an alien mind.
I think Eliezer's "We have never interacted with the paperclip maximizer before, and will never interact with it again" was intended to preclude credible binding.
The entries in a payoff matrix are supposed to sum up everything you care about, including whatever you care about the outcomes for the other player. Most every game theory text and lecture I know gets this right, but even when we say the right thing to students over and over, they mostly still hear it the wrong way you initially heard it. This is just part of the facts of life of teaching game theory.
Robin, the point I'm complaining about is precisely that the standard illustration of the Prisoner's Dilemma, taught to beginning students of game theory, fails to convey those entries in the payoff matrix - as if the entries were merely money instead of utilons, which is not at all what the Prisoner's Dilemma is about.
The point of the True Prisoner's Dilemma is that it gives you a payoff matrix that is very nearly the standard matrix in utilons, not just years in prison or dollars in an encounter.
I.e., you can tell people all day long that the entries are in utilons, but until you give them a visualization where those really are the utilons, it's around as effective as telling juries to ignore hindsight bias.
Eliezer, I agree that your example makes more clear the point you are trying to make clear, but in an intro to game theory course I'd still start with the standard prisoner's dilemma example, and only get to your example if I had time to make the finer point clearer. For intro classes for typical students the first priority is to be understood at all in any way, and that requires examples as simple, clear, and vivid as possible.
I don't think Eliezer misunderstood. I think you are missing his point, that economists are defining away empathy in the way they present the problem, including the utilities presented.
In the universe I live in, there are both cooperators and defectors, but cooperators seem to predominate in random encounters. (If you leave yourself open to encounters in which others can choose to interact with you, defectors may find you an easy mark.)
In order to decide how to act with the paperclip maximizer, I have to figure out what kind of universe it is likely to inhabit. It's possible that a random super intelligence from a random universe will have few opportunities to cooperate, but I think it's more likely that there are far more SIs and univ...
Prase, Chris, I don't understand. Eliezer's example is set up in such a way that, regardless of what the paperclip maximizer does, defecting gains one billion lives and loses two paperclips.
Basically, we're being asked to choose between a billion lives and two paperclips (paperclips in another universe, no less, so we can't even put them to good use).
The only argument for cooperating would be if we had reason to believe that the paperclip maximizer will somehow do whatever we do. But I can't imagine how that could be true. Being a paperclip maximizer, it's bound to defect, unless it had reason to believe that we would somehow do whatever it does. I can't imagine how that could be true either.
Or am I missing something?
Definitely defect. Cooperation only makes sense in the iterated version of the PD. This isn't the iterated case, and there's no prior communication, hence no chance to negotiate for mutual cooperation (though even if there was, meaningful negotiation may well be impossible depending on specific details of the situation). Superrationality be damned, humanity's choice doesn't have any causal influence on the paperclip maximizer's choice. Defection is the right move.
It's clear that in the "true" prisoner's dilemma it is better to defect. The frustrating thing about the other prisoner's dilemma is that some people use it to imply that it is better to defect in real life. The problem is that the prisoner's dilemma is a drastic oversimplification of reality. To make it more realistic you'd have to make it iterated amongst a person's social network, add a memory and a perception of the other player's actions, change the payoff matrix depending on the relationship between the players, etc.
This version shows cases in which defection has a higher expected value for both players, but it's more contrived and unlikely to come into existence than the other prisoner's dilemma.
Michael: This is not a prisoner's dilemma. The nash equilibrium (C,C) is not dominated by a pareto optimal point in this game.
I don't believe this is correct. Isn't the Nash equilibrium here (D,D)? That's the point at which neither player can gain by unilaterally changing strategy.
michael webster,
You seem to have inverted the notation; not Eli.
(D,D) is the Nash equilibrium, not (C,C); and (D,D) is indeed Pareto dominated by (C,C), so this does seem to be a standard Prisoners' Dilemma.
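A minimal Python check of this, using illustrative utilon values (3, 0, 5, 2) that I've assumed only because they satisfy the standard Prisoner's Dilemma ordering; they aren't figures from the post:

```python
# Enumerate the pure-strategy Nash equilibria of a Prisoner's Dilemma.
# Payoffs are illustrative utilons with (D,C) > (C,C) > (D,D) > (C,D) for the row player.
payoffs = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (2, 2),
}

def is_nash(p1, p2):
    """A profile is a Nash equilibrium if neither player gains by deviating alone."""
    u1, u2 = payoffs[(p1, p2)]
    best_dev_1 = max(payoffs[(a, p2)][0] for a in "CD")
    best_dev_2 = max(payoffs[(p1, a)][1] for a in "CD")
    return u1 >= best_dev_1 and u2 >= best_dev_2

equilibria = [profile for profile in payoffs if is_nash(*profile)]
print(equilibria)                                # [('D', 'D')] -- the only Nash equilibrium
print(payoffs[("C", "C")], payoffs[("D", "D")])  # (3, 3) Pareto-dominates (2, 2)
```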
To the extent one can induce players to empathize, cooperating is optimal. The repeated game does this by having them play again and again, and thus be able to realize gains from trade. You assert there's something hard-wired. I suppose there are experiments that could distinguish between the two models, i.e., rational self-interest in repeated games versus the intrinsic empathy function.
I would certainly hope you would defect, Eliezer. Can I really trust you with the future of the human race?
Ha, I was waiting for someone to accuse me of antisocial behavior for hinting that I might cooperate in the Prisoner's Dilemma.
But wait for tomorrow's post before you accuse me of disloyalty to humanity.
It is fascinating looking at the conversation on this subject back in 2008, back before TDT and UDT had become part of the culture. The objections (and even the mistakes) all feel so fresh!
At this point Yudkowsky sub 2008 has already (awfully) written his TDT manuscript (in 2004) and is silently reasoning from within that theory, which the margins of his post are too small to contain.
Hrm... not sure what the obvious answer is here. Two humans, well, the argument for not defecting (when the scores represent utilities) basically involves some notion of similarity. I.e., you can say something to the effect of "that person there is sufficiently similar to me that whatever reasoning I use, there is at least some reasonable chance they are going to use the same type of reasoning. That is, a chance greater than, well, chance. So even though I don't know exactly what they're going to choose, I can expect some sort of correlation between the...
I like this illustration, as it addresses TWO common misunderstandings. Recognizing that the payoff is in incomparable utilities is good. Even better is reinforcing that there can never be further iterations. None of the standard visualizations prevent people from extending to multiple interactions.
And it makes it clear that (D,D) is the only rational (i.e. WINNING) outcome.
Fortunately, most of our dilemmas are repeated ones, in which (C,C) is possible.
I want to defect, but so does the clip-maximizer. Since we both know that, and assuming that it is of equal intelligence to me, it will see through any attempt of mine to craft an offer that would let me defect. So I would try to find a way to give us both incentives to cooperate. That is, I don't believe we will be able to reach outcome (D,C), so let's try for the next best thing, which is (C,C).
How about placing a bomb on two piles of substance S and giving the remote for the human pile to the clipmaximizer and the remote for its pile to the hum...
I apologize if this is covered by basic decision theory, but if we additionally assume:
the choice in our universe is made by a perfectly rational optimization process instead of a human
the paperclip maximizer is also a perfect rationalist, albeit with a very different utility function
each optimization process can verify the rationality of the other
then won't each side choose to cooperate, after correctly concluding that it will defect iff the other does?
Each side's choice necessarily reveals the other's; they're the outputs of equivalent computations.
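A toy sketch of that argument, using the lives-versus-paperclips payoffs from the post's scenario (the code itself is only my illustration of the assumptions listed above): if both sides are verified copies of one deterministic procedure, mismatched outcomes like (C,D) can't occur, so the procedure only has to compare (C,C) against (D,D).

```python
# Toy model of "equivalent computations": both players run verified copies of the
# same deterministic procedure, so their outputs must match, and the only
# reachable outcomes are (C,C) and (D,D).
def decide(own_payoffs):
    """Knowing the opponent runs this exact procedure, compare the two
    symmetric outcomes and pick the action leading to the better one."""
    return "C" if own_payoffs[("C", "C")] > own_payoffs[("D", "D")] else "D"

human_payoffs = {("C", "C"): 2_000_000_000, ("D", "D"): 1_000_000_000}  # lives saved
clippy_payoffs = {("C", "C"): 2, ("D", "D"): 1}                         # paperclips

print(decide(human_payoffs), decide(clippy_payoffs))  # C C
```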
Interesting. There's a paradox involving a game in which players successively take a single coin from a large pile of coins. At any time a player may choose instead to take two coins, at which point the game ends and all further coins are lost. You can prove by induction that if both players are perfectly selfish, they will take two coins on their first move, no matter how large the pile is. People find this paradox impossible to swallow because they model perfect selfishness on the most selfish person they can imagine, not on a mathematically perfect selfishness machine. It's nice to have an "intuition pump" that illustrates what genuine selfishness looks like.
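A backward-induction sketch of that coin game (my own formalization of the rules described above, not code from the comment):

```python
from functools import lru_cache

# Players alternate turns: take 1 coin (play continues) or take 2 coins
# (the game ends and the remaining coins are lost). Returns
# (mover's total, other player's total) under perfectly selfish play.
@lru_cache(maxsize=None)
def selfish_play(coins_left):
    if coins_left == 0:
        return (0, 0)
    if coins_left == 1:
        return (1, 0)                    # only one coin left to take
    # Option A: take 2 coins now; the game ends.
    take_two = (2, 0)
    # Option B: take 1 coin; the opponent then moves on the smaller pile.
    opp_gets, i_get_later = selfish_play(coins_left - 1)
    take_one = (1 + i_get_later, opp_gets)
    # A perfectly selfish mover maximizes only their own total.
    return max(take_two, take_one, key=lambda outcome: outcome[0])

print(selfish_play(100))  # (2, 0): the first mover grabs 2 coins immediately
```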
Cooperate (unless paperclip decides that Earth is dominated by traditional game theorists...)
The standard argument looks like this (let's forget about the Nash equilibrium endpoint for a moment):
(1) Arbiter: let's (C,C)!
(2) Player1: I'd rather (D,C).
(3) Player2: I'd rather (D,D).
(4) Arbiter: sold!
The error is that this incremental process reacts on different hypothetical outcomes, not on actual outcomes. This line of reasoning leads to the outcome (D,D), and yet it progresses as if (C,C) and (D,C) were real options of the final outcome. It's similar to...
It is well known that answers to questions on morality sometimes depend on how the questions are framed.
I think Eliezer's biggest contribution is the idea that the classical presentation of Prisoner's Dilemma may be an intuition pump.
I'm hoping we'd all defect on this one. Defecting isn't always a bad thing anyways; many parts of our society depend on defection in prisoner's dilemmas (such as competition between firms).
When I first studied game theory and prisoner's dilemmas (on my own, not in a classroom) I had no problem imagining the payoffs in completely subjective "utils". I never thought of a paperclip maximizer, though.
I know this is quite a bit off-topic, but in response to:
We're born with a sense of fairness, honor, empathy, sympathy, and even altruism - the result of ou...
In laboratory experiments of PD, the experimenter has the absolute power to decree the available choices and their "outcomes". (I use the scare quotes in reference to the fact that these outcomes are not to be measured in money or time in jail, but in "utilons" that already include the value to each party of the other's "outcome" -- a concept I think problematic but not what I want to talk about here. The outcomes are also imaginary, although (un)reality TV shows have scope to create such games with real and substantial payof...
simpleton: won't each side choose to cooperate, after correctly concluding that it will defect iff the other does?
Only if they believe that their decision somehow causes the other to make the same decision.
CarlJ: How about placing a bomb on two piles of substance S and giving the remote for the human pile to the clipmaximizer and the remote for its pile to the humans?
It's kind of standard in philosophy that you aren't allowed solutions like this. The reason is that Eliezer can restate his example to disallow this and force you to confront the real dilemma....
Allan: No, it's preferable to choose (D,C) if we assume that the other player bets on cooperation.
Which will happen only if the other player assumes that the first player bets on cooperation, which with your policy is incorrect. You can't bet on an unstable model.
"decide self.C; if other.D, decide self.D"
We're assuming, I think, that you don't get to know what the other guy does until after you've both committed (otherwise it's not the proper Prisoner's Dilemma). So you can't use if-then reasoning.
I can use reasoning, but not actual reaction on the facts, whic...
Allan: They don't have to believe they have such causal powers over each other. Simply that they are in certain ways similar to each other.
ie, A simply has to believe of B "The process in B is sufficiently similar to me that it's going to end up producing the same results that I am. I am not causing this, but simply that both computations are going to compute the same thing here."
(D,C) will happen only if the other player assumes that the first player bets on cooperation
No, it won't happen in any case. If the paperclip maximizer assumes I'll cooperate, it'll defect. If it assumes I'll defect, it'll defect.
I debug my model of decision-making policies [...] by requiring the outcome to be stable even if I assume that we both know which policy is used by another player
I don't see that "stability" is relevant here: this is a one-off interaction.
Anyway, let's say you cooperate. What exactly is preventing the paperclip maximizer from defecting?
Psy-Kosh: They don't have to believe they have such causal powers over each other. Simply that they are in certain ways similar to each other.
I agree that this is definitely related to Newcomb's Problem.
Simpleton: I earlier dismissed your idea, but you might be on to something. My apologies. If they were genuinely perfectly rational, or both irrational in precisely the same way, and could verify that fact in each other...
Then they might be able to know that they will both do the same thing. Hmm.
Anyway, my 3 comments are up. Nothing more from me for a while.
Despite the disguise, I think this is the same as the standard PD. In there (assuming full utilities, etc...), the obvious ideal for an impartial observer is to pick (C,C) as the best option, and for the prisoner to pick (D,C).
Here, (D,C) is "righter" than (C,C), but that's simply because we are no longer impartial observers; humans shouldn't remain impartial when billions of lives are at stake. We are all in the role of "prisoners" in this situation, even as observers.
An "impartial observer" would simply be one that valued one...
A.Crossman: Prase, Chris, I don't understand. Eliezer's example is set up in such a way that, regardless of what the paperclip maximizer does, defecting gains one billion lives and loses two paperclips.
This is the standard defense of defecting in a prisoner's dilemma, but if it were valid then the dilemma wouldn't really be a dilemma.
If we can assume that the maximizer uses the same decision algorithm as we do, we can also assume that it will come to the same conclusion. Given this, it is better to cooperate, since that gains a billion lives (and a paperclip). But we don't know whether the paperclipper uses the same algorithm.
I heard a funny story once (online somewhere, but this was years ago and I can't find it now). Anyway I think it was the psychology department at Stanford. They were having an open house, and they had set up a PD game with M&M's as the reward. People could sit at either end of a table with a cardboard screen before them, and choose 'D' or 'C', and then have the outcome revealed and get their candy.
So this mother and daughter show up, and the grad student explained the game. Mom says to the daughter "Okay, just push 'C', and I'll do the same, and we'll get the most M&M's. You can have some of mine after."
So the daughter pushes 'C', Mom pushes 'D', swallows all 5 M&M's, and with a full mouth says "Let that be a lesson! You can't trust anybody!"
I have seen various variations of this story, some told firsthand. In every case I have concluded that they are just bad parents. They aren't clever. They aren't deep. They are incompetent and banal. Even if parents try as hard as they can to be fair, just, and reliable, they still fall short of that standard often enough for children to realize that they can't be completely trusted. Moreover, children are exposed to other children and other adults, and so are able to learn to distinguish people they trust from people they don't. Adding the parent to the untrusted list achieves little benefit.
I'd like to hear the follow-up to this 'funny' story, where the daughter updates on the untrustworthiness of the parent and the meaninglessness of her word. She then proceeds to completely ignore the mother's commands, preferences, and even her threats. The mother destroyed a valuable resource (the ability to communicate via 'cheap' verbal signals) for the gain of a brief feeling of smug superiority. The daughter (potentially) realis...
I see this discussion over the last several months bouncing around, teasingly close to a coherent resolution of the ostensible subjective/objective dichotomy applied to ethical decision-making. As a perhaps pertinent meta-observation, my initial sentence may promulgate the confusion with its expeditious wording of "applied to ethical decision-making" rather than a more accurate phrasing such as "applied to decision-making assessed as increasingly ethical over increasing context."
Those who in the current thread refer to the essential el...
Allan Crossman: Only if they believe that their decision somehow causes the other to make the same decision.
No line of causality from one to the other is required.
If a computer finds that (2^3021377)-1 is prime, it can also conclude that an identical computer a light year away will do the same. This doesn't mean one computation caused the other.
The decisions of perfectly rational optimization processes are just as deterministic.
@Allan Crossman,
Eliezer's example is set up in such a way that, regardless of what the paperclip maximizer does, defecting gains one billion lives and loses two paperclips.
This same claim can be made about the standard prisoner's dilemma. In the standard version, I still cooperate because, even if this challenge won't be repeated, it's embedded in a social context for me in which many interactions are solo, but part of the social fabric. (tipping, giving directions to strangers, items left behind in a cafe are examples. I cooperate even though I expect ...
A problem in moving from game-theoretic models to the "real world" is that in the latter we don't always know the other decision maker's payoff matrix, we only know - at best! - his possible strategies. We can only guess at the other's payoffs; albeit fairly well in social context. We are more likely to make a mistake because we have the wrong model for the opponent's payoffs than because we make poor strategic decisions.
Suppose we change this game so that the payoff matrix for the paperclip maximizer is chosen from a suitably defined random distribution. How will that change your decision whether to "cooperate" or to "defect"?
By the way:
Human: "What do you care about 3 paperclips? Haven't you made trillions already? That's like a rounding error!" Paperclip Maximizer: "How can you talk about paperclips like that?"
PM: "What do you care about a billion human algorithm continuities? You've got virtually the same one in billions of others! And you'll even be able to embed the algorithm in machines one day!" H: "How can you talk about human lives that way?"
Tom Crispin: The utility-theoretic answer would be that all of the randomness can be wrapped up into a single number, taking account not merely of the expected value in money units but such things as the player's attitude to risk, which depends on the scatter of the distribution. It can also wrap up a player's ignorance (modelled as prior probabilities) about the other player's utility function.
For that to be useful, though, you have to be a utility-theoretic decision-maker in possession of a prior distribution over other people's decision-making processes...
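As a rough sketch of the kind of calculation such a decision-maker would run (the probabilities and utilon values below are invented purely for illustration):

```python
# Expected-utility comparison for a player who is uncertain what the other will do.
# Payoffs are illustrative utilons; p_cooperate summarizes the player's prior
# over the opponent's decision-making process.
payoffs = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 2}

def expected_utility(my_move, p_cooperate):
    return (p_cooperate * payoffs[(my_move, "C")]
            + (1 - p_cooperate) * payoffs[(my_move, "D")])

for p in (0.1, 0.5, 0.9):
    print(p, expected_utility("C", p), expected_utility("D", p))
# With these payoffs, D has higher expected utility at every p --
# which is just the dominance argument restated with probabilities.
```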
Chris: Sorry Allan, that you won't be able to reply. But you did raise the question before bowing out...
I didn't bow out, I just had a lot of comments made recently. :)
I don't like the idea that we should cooperate if it cooperates. No, we should defect if it cooperates. There are benefits and no costs to defecting.
But if there are reasons for the other to have habits that are formed by similar forces
In light of what I just wrote, I don't see that it matters; but anyway, I wouldn't expect a paperclip maximizer to have habits so ingrained that it can't ever...
Allan: There are benefits and no costs to defecting.
This is the same error as in the Newcomb's problem: there is in fact a cost. In case of prisoner's dilemma, you are penalized by ending up with (D,D) instead of better (C,C) for deciding to defect, and in the case of Newcomb's problem you are penalized by having only $1000 instead of $1,000,000 for deciding to take both boxes.
Vladimir: In case of prisoner's dilemma, you are penalized by ending up with (D,D) instead of better (C,C) for deciding to defect
Only if you have reason to believe that the other player will do whatever you do. While that's the case in Simpleton's example, it's not the case in Eliezer's.
Interesting. There's a paradox involving a game in which players successively take a single coin from a large pile of coins. At any time a player may choose instead to take two coins, at which point the game ends and all further coins are lost. You can prove by induction that if both players are perfectly selfish, they will take two coins on their first move, no matter how large the pile is.
I'm pretty sure this proof only works if the coins are denominated in utilons.
It's really about the iteration. I would continually cooperate with the paper clip maximizer if I had good reason to believe it would not defect. For instance, if I knew that Eliezer Yudkowsky without morals and with a great urge for paperclip creation was the paperclip maximizer, I would cooperate. Assuming that you know that playing with the defect button can make you lose 1 billion paperclips from here on, and I know the same for human lives, cooperating seems right. It has the highest expected payoff, if we're using each other's known intentions and plays as evidence about our future plays.
If there is only one trial, and I can't talk to the paper clip maximizer, I will defect.
[Public service announcement]
To any future readers, especially newcomers to LW: yes, Eliezer (with some others) has indeed formulated a solution of sorts for the True One-Shot Prisoner's Dilemma - for some rather specific cases of it, actually, but it was nonetheless very awesome of him. It is a fairly original solution for the field of decision theory (he says), yet it (very roughly) mirrors some religious thought from ages past.
In case you're unfamiliar with idiosyncratic local ideas, it's called "Timeless Decision Theory" - look it up.
[edit]
A while ago I took the time to type up a full copy of the relevant Hofstadter essays: http://www.gwern.net/docs/1985-hofstadter So now you have no excuse!
Cooperate. I am not playing against just this one guy, but any future PD opponents. Hope the maximizer lives in a universe where it has to worry about this same calculus. It will defect if it is already the biggest bad in its universe.
If there were a way I could communicate with it (e.g. it speaks english) I'd cooperate with it...not because I feel it deserves my cooperation, but because this is the only way I could obtain its cooperation. Otherwise I'd defect, as I'm pretty sure no amount of TDT would correlate its behavior with mine. Also, why are 4 billion humans infected if only 3 billion at most can be saved in the entire matrix? Eliezer, what are you planning...?
That's a good way to clearly demonstrate a nonempathic actor in the Prisoner's Dilemma: a "Hawk", who views their own payoffs, and only their own payoffs, as having value, placing no value on the payoffs of others.
But I don't think it's necessary. I would say that humans can visualize a nonempathic human - a bad guy - more easily than they can visualize an empathic human with slightly different motives. We've undoubtedly had to, collectively, deal with a lot of them throughout history.
A while back I was writing a paper and came across a fascinat...
Long time lurker, first post.
Isn't the rational choice in a True Prisoner's Dilemma to defect if possible, and to seek a method to bind the opponent to cooperate, even if that binding forces one to cooperate as well? An analogous situation is law enforcement: one may well desire to unilaterally break the law, yet favor the existence of police that force all parties concerned to obey it. Of course, police that will never interfere with one's own behavior would be even better, but this is usually impractical. Timeless Decision Theory adds that one should coo...
I really love this blog. What if we were to "exponentiate" this game for billions of players? Which outcome would be the "best" one?
Hi there, I'm new here and this is an old post, but I have a question regarding the AI playing a prisoner's dilemma against us, which is: how would this situation be possible? I'm trying to get my head around why the AI would think that our payoffs are any different from its payoffs, given that we built it, we taught it (some of) our values in a rough way, and we asked it to maximize paperclips, which means we like paperclips. Shouldn't the AI think we are on the same team? I mean, we coded it that way and we gave it a task, what process exactly would make t...
Why would you want to choose defect? If both criminals are rationalists that use the same logic, then if you choose defect hoping to get a result of (D,C), the result ends up being (D,D). However, you could instead use the logic of "let's choose C, because if the other person is using this same logic, then we won't end up with the result (D,D)."
I would say... defect! If all the computer cares about is sorting pebbles, then it will cooperate, because both results under cooperation have more paperclips. This gives an opportunity to defect and get a result of (D,C), which is our favorite result.
You'd want to defect, but you'd also happily trade away your ability to defect so that you both cooperate; but if you could, then you'd happily pretend to trade away your ability to defect, and then actually defect.
We're born with a sense of fairness, honor, empathy, sympathy, and even altruism - the result of our ancestors adapting to play the iterated Prisoner's Dilemma.
The keyword here is *sense*, and there's not a whole lot saying that this sense can't vanish as easily as it appears. Interpreting a human as a "fair, empathetic, altruistic being" is superficial. The status quo narrative of humanity is a lie/mass delusion, and humanity is a largely psychopathic species covered in a brittle, hard candy shell of altruism and e...
It seems to me that with billions of lives there will be a problem of scope neglect. (At least I don't feel anything about it; for me it's just numbers. So I think the true dilemma is no different from the usual one; perhaps it would be better to tell a story about how a particular person suffers.)
It occurred to me one day that the standard visualization of the Prisoner's Dilemma is fake.
The core of the Prisoner's Dilemma is this symmetric payoff matrix:
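One standard assignment of utilons, purely illustrative (only the ordering matters), is:

Player 1: C, Player 2: C → (3, 3)
Player 1: C, Player 2: D → (0, 5)
Player 1: D, Player 2: C → (5, 0)
Player 1: D, Player 2: D → (2, 2)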
Player 1, and Player 2, can each choose C or D. 1 and 2's utility for the final outcome is given by the first and second number in the pair. For reasons that will become apparent, "C" stands for "cooperate" and D stands for "defect".
Observe that a player in this game (regarding themselves as the first player) has this preference ordering over outcomes: (D, C) > (C, C) > (D, D) > (C, D).
D, it would seem, dominates C: If the other player chooses C, you prefer (D, C) to (C, C); and if the other player chooses D, you prefer (D, D) to (C, D). So you wisely choose D, and as the payoff table is symmetric, the other player likewise chooses D.
If only you'd both been less wise! You both prefer (C, C) to (D, D). That is, you both prefer mutual cooperation to mutual defection.
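The same two observations, made explicit in a few lines of Python (again using the illustrative utilon values above):

```python
# Verify the two facts just argued: D strictly dominates C for each player,
# yet (C,C) Pareto-dominates (D,D). Payoffs are the illustrative values above.
payoffs = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
           ("D", "C"): (5, 0), ("D", "D"): (2, 2)}

# Dominance for Player 1: whatever Player 2 plays, D pays more than C.
dominates = all(payoffs[("D", other)][0] > payoffs[("C", other)][0]
                for other in "CD")
print("D dominates C for Player 1:", dominates)  # True (and by symmetry for Player 2)

# Yet both players do better under mutual cooperation than mutual defection.
print("(C,C) beats (D,D) for both:",
      all(payoffs[("C", "C")][i] > payoffs[("D", "D")][i] for i in (0, 1)))  # True
```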
The Prisoner's Dilemma is one of the great foundational issues in decision theory, and enormous volumes of material have been written about it. Which makes it an audacious assertion of mine, that the usual way of visualizing the Prisoner's Dilemma has a severe flaw, at least if you happen to be human.
The classic visualization of the Prisoner's Dilemma is as follows: you are a criminal, and you and your confederate in crime have both been captured by the authorities.
Independently, without communicating, and without being able to change your mind afterward, you have to decide whether to give testimony against your confederate (D) or remain silent (C).
Both of you, right now, are facing one-year prison sentences; testifying (D) takes one year off your prison sentence, and adds two years to your confederate's sentence.
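Worked out, the four possible outcomes are:

Both silent (C, C): one year each.
You testify, confederate silent (D, C): you serve zero years, your confederate serves three.
You stay silent, confederate testifies (C, D): you serve three years, your confederate serves zero.
Both testify (D, D): two years each.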
Or maybe you and some stranger are, only once, and without knowing the other player's history, or finding out who the player was afterward, deciding whether to play C or D, for a payoff in dollars matching the standard chart.
And, oh yes - in the classic visualization you're supposed to pretend that you're entirely selfish, that you don't care about your confederate criminal, or the player in the other room.
It's this last specification that makes the classic visualization, in my view, fake.
You can't avoid hindsight bias by instructing a jury to pretend not to know the real outcome of a set of events. And without a complicated effort backed up by considerable knowledge, a neurologically intact human being cannot pretend to be genuinely, truly selfish.
We're born with a sense of fairness, honor, empathy, sympathy, and even altruism - the result of our ancestors adapting to play the iterated Prisoner's Dilemma. We don't really, truly, absolutely and entirely prefer (D, C) to (C, C), though we may entirely prefer (C, C) to (D, D) and (D, D) to (C, D). The thought of our confederate spending three years in prison, does not entirely fail to move us.
In that locked cell where we play a simple game under the supervision of economic psychologists, we are not entirely and absolutely unsympathetic for the stranger who might cooperate. We aren't entirely happy to think that we might defect and the stranger cooperate, getting five dollars while the stranger gets nothing.
We fixate instinctively on the (C, C) outcome and search for ways to argue that it should be the mutual decision: "How can we ensure mutual cooperation?" is the instinctive thought. Not "How can I trick the other player into playing C while I play D for the maximum payoff?"
For someone with an impulse toward altruism, or honor, or fairness, the Prisoner's Dilemma doesn't really have the critical payoff matrix - whatever the financial payoff to individuals. (C, C) > (D, C), and the key question is whether the other player sees it the same way.
And no, you can't instruct people being initially introduced to game theory to pretend they're completely selfish - any more than you can instruct human beings being introduced to anthropomorphism to pretend they're expected paperclip maximizers.
To construct the True Prisoner's Dilemma, the situation has to be something like this:
Player 1: Human beings, Friendly AI, or other humane intelligence.
Player 2: UnFriendly AI, or an alien that only cares about sorting pebbles.
Let's suppose that four billion human beings - not the whole human species, but a significant part of it - are currently progressing through a fatal disease that can only be cured by substance S.
However, substance S can only be produced by working with a paperclip maximizer from another dimension - substance S can also be used to produce paperclips. The paperclip maximizer only cares about the number of paperclips in its own universe, not in ours, so we can't offer to produce or threaten to destroy paperclips here. We have never interacted with the paperclip maximizer before, and will never interact with it again.
Both humanity and the paperclip maximizer will get a single chance to seize some additional part of substance S for themselves, just before the dimensional nexus collapses; but the seizure process destroys some of substance S.
The payoff matrix is as follows:

Humans: C, Paperclipper: C → +2 billion human lives saved, +2 paperclips.
Humans: C, Paperclipper: D → +0 human lives saved, +3 paperclips.
Humans: D, Paperclipper: C → +3 billion human lives saved, +0 paperclips.
Humans: D, Paperclipper: D → +1 billion human lives saved, +1 paperclip.
I've chosen this payoff matrix to produce a sense of indignation at the thought that the paperclip maximizer wants to trade off billions of human lives against a couple of paperclips. Clearly the paperclip maximizer should just let us have all of substance S; but a paperclip maximizer doesn't do what it should, it just maximizes paperclips.
In this case, we really do prefer the outcome (D, C) to the outcome (C, C), leaving aside the actions that produced it. We would vastly rather live in a universe where 3 billion humans were cured of their disease and no paperclips were produced, rather than sacrifice a billion human lives to produce 2 paperclips. It doesn't seem right to cooperate, in a case like this. It doesn't even seem fair - so great a sacrifice by us, for so little gain by the paperclip maximizer? And let us specify that the paperclip-agent experiences no pain or pleasure - it just outputs actions that steer its universe to contain more paperclips. The paperclip-agent will experience no pleasure at gaining paperclips, no hurt from losing paperclips, and no painful sense of betrayal if we betray it.
What do you do then? Do you cooperate when you really, definitely, truly and absolutely do want the highest reward you can get, and you don't care a tiny bit by comparison about what happens to the other player? When it seems right to defect even if the other player cooperates?
That's what the payoff matrix for the true Prisoner's Dilemma looks like - a situation where (D, C) seems righter than (C, C).
But all the rest of the logic - everything about what happens if both agents think that way, and both agents defect - is the same. For the paperclip maximizer cares as little about human deaths, or human pain, or a human sense of betrayal, as we care about paperclips. Yet we both prefer (C, C) to (D, D).
So if you've ever prided yourself on cooperating in the Prisoner's Dilemma... or questioned the verdict of classical game theory that the "rational" choice is to defect... then what do you say to the True Prisoner's Dilemma above?