I'm just a layperson with a layperson's exposure to game theory problems, and I've long been really confused about how to think about them. This is the story of my confusion, and of how I think about utilities in game theory problems now. Very plausibly none of this is novel.
I was first introduced to the Prisoner's Dilemma in my ninth grade history class, and I spent a decade afterwards being fairly baffled. Here was the setup:
Our class of 30 was divided into 10 groups of 3 people. Each group was given an opaque envelope with a pink notecard and a green notecard in it.
If all 10 groups pulled the green card, everyone got one Hershey's Kiss. If some groups pulled pink and some pulled green, those who pulled pink would each get two Hershey's Kisses, and those who pulled green would get nothing. (If every group pulled pink, no one got anything.)
I held the envelope for my group.
On the first round, maybe 8 out of 10 groups pulled green, and then were crestfallen and resentful when they didn't get any candy. The next time, every single group except mine pulled pink. Next round, same thing. And the next round. And the next. Until finally one of my teammates wrestled the envelope away from me and pulled pink, and then no one got any candy.
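For anyone who likes to see the rules written down, here is a minimal sketch of the classroom game as I remember it. The function and variable names are my own inventions, and the round compositions are approximate (I only recall "maybe 8 out of 10" for the first round).

```python
from typing import List

GROUP_SIZE = 3  # 30 students split into 10 groups of 3


def kisses_per_student(choices: List[str]) -> List[int]:
    """Candy each student in a group receives, one entry per group.

    `choices` holds each group's card: "green" (cooperate) or "pink" (defect).
    """
    if all(c == "green" for c in choices):
        return [1] * len(choices)  # everyone cooperates: one Kiss each
    if all(c == "pink" for c in choices):
        return [0] * len(choices)  # everyone defects: no candy at all
    # Mixed round: pink pullers get two Kisses, green pullers get nothing.
    return [2 if c == "pink" else 0 for c in choices]


def total_kisses(choices: List[str]) -> int:
    """Total candy handed out to the whole class in one round."""
    return GROUP_SIZE * sum(kisses_per_student(choices))


all_green = ["green"] * 10
first_round = ["green"] * 8 + ["pink"] * 2   # roughly what happened
middle_rounds = ["pink"] * 9 + ["green"]     # every group but mine defects
last_round = ["pink"] * 10                   # my teammate grabs the envelope

for name, rnd in [("all green", all_green), ("first", first_round),
                  ("middle", middle_rounds), ("last", last_round)]:
    print(name, total_kisses(rnd))
```

Under these rules, the rounds where I was the lone green card were the ones that handed out the most candy to the class as a whole, and the round where my group finally defected handed out none.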
My teacher singled me out at the end of the exercise and asked me why I chose to cooperate, and I gave some confused milquetoast answer about it being the right thing to do. But truthfully, I didn't know, and I've been thinking about that question ever since. I've gone through half a dozen different interpretations over the years, but I think I finally understand, and it's not complicated at all.
I now believe the reason it was so easy for me to cooperate was not that I was deeply, fundamentally altruistic. I didn't, as I once thought might be the case, calculate that the total utility of cooperating was higher: that while my two teammates and I would be denied candy, our other 27 classmates would get some, and it was better for 90% of the class to get candy than 0%, even if I was in the 10% that went without. No, it was easy because I didn't care about the chocolate. I got far more actual utility from signaling my virtuous altruism to all of my classmates (not to mention myself) than I ever could from a short-lived piece of mass-produced candy.
I've since played at least one other Prisoner's Dilemma game (this one for team points rather than candy), and I cooperated that time as well. In that situation, we were very explicitly groomed to feel empathy for our partner (by doing the classic '36 questions to fall in love' and then staring into each other's eyes for five minutes), which I think majorly interferes with the utility calculations. Eliezer's The True Prisoner's Dilemma is a good read on this topic: he offers a scenario that removes the feeling of empathy for the other player from the equation. But I find that the scenario doesn't hit me right in the intuitions, because it's not that easy to imagine playing against an agent with a totally alien value system.
I also once talked to someone who was in the middle of playing an Iterated Prisoner's Dilemma, and he told me, "If the stakes of a Prisoner's Dilemma are pure utility, then you should defect every time." I think he was pretty obviously wrong, but it still got me thinking.
Even after my behavior in my ninth grade Prisoner's Dilemma started to make sense to me, I remained super confused about Newcomb's Problem. Here's the problem, for anyone who needs a refresher:
A superintelligence from another galaxy, whom we shall call Omega, comes to Earth and sets about playing a strange little game. Omega selects a human being, sets down two boxes in front of them, and flies away.
Box A is transparent and contains a thousand dollars.
Box B is opaque, and contains either a million dollars, or nothing.
You can take both boxes, or take only box B.
And the twist is that Omega has put a million dollars in box B if and only if Omega has predicted that you will take only box B.
Omega has been correct on each of 100 observed occasions so far—everyone who took both boxes has found box B empty and received only a thousand dollars; everyone who took only box B has found B containing a million dollars. (We assume that box A vanishes in a puff of smoke if you take only box B; no one else can take box A afterward.)
Before you make your choice, Omega has flown off and moved on to its next game. Box B is already empty or already full.
Omega drops two boxes on the ground in front of you and flies off.
Do you take both boxes, or only box B?
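Here is a rough sketch of the dollar arithmetic behind the problem as stated above. The `accuracy` parameter is my own assumption (the problem only says Omega has been right 100 times out of 100), and the calculation conditions on your own choice, which is exactly the step that arguments for two-boxing dispute. It is only meant to show the numbers, not to settle anything.

```python
BOX_A = 1_000        # transparent box, always contains $1,000
BOX_B = 1_000_000    # opaque box, full only if Omega predicted one-boxing


def expected_dollars(one_box: bool, accuracy: float) -> float:
    """Expected payout if Omega predicts your actual choice with probability `accuracy`."""
    if one_box:
        # Box B is full only when Omega correctly foresaw one-boxing.
        return accuracy * BOX_B
    # Two-boxers always get box A; box B is full only when Omega guessed wrong.
    return BOX_A + (1 - accuracy) * BOX_B


for p in (1.0, 0.9, 0.5):
    print(p, expected_dollars(True, p), expected_dollars(False, p))
```

With a perfect predictor this gives $1,000,000 for one-boxing against $1,000 for two-boxing, which matches the track record described in the problem.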
For a long time, it seemed so ridiculously obvious to me that you should one-box – but this intuition was, I think, deeply grounded in the conventional framing of the problem. The issue is the substitution of money for pure utility! The diminishing marginal utility of money means that $1,001,000 is not that much more valuable to me than $1,000,000, and indeed, given human inability to grasp large numbers, the difference barely registers.
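To put a rough number on that, here is a small sketch that assumes logarithmic utility of wealth. That is a common textbook stand-in for diminishing marginal utility, not a claim about how I actually value money, and the $10,000 baseline is purely hypothetical.

```python
import math


def log_utility_gain(wealth_before: float, amount: float) -> float:
    """Utility gained from `amount` extra dollars, assuming u(w) = ln(w)."""
    return math.log(wealth_before + amount) - math.log(wealth_before)


BASELINE = 10_000  # hypothetical starting wealth, chosen only for illustration

# How much the $1,000 in box A is worth on its own...
gain_alone = log_utility_gain(BASELINE, 1_000)
# ...versus on top of the $1,000,000 from box B.
gain_on_top = log_utility_gain(BASELINE + 1_000_000, 1_000)

print(gain_alone, gain_on_top, gain_alone / gain_on_top)
```

Under these assumptions, the extra thousand dollars is worth far less once the million is already in hand, which is why the money framing makes box A so easy to walk away from.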
It wasn't until I started to think of utilities in terms of things that actually matter to me that Newcomb's Problem started to look difficult. Instead of money, I started to think of utilities in terms of people that I love. Instead of gaining dollars, I am gaining human lives – saving them and indefinitely extending them to participate in the colonization of the galaxy.
The possible reward in Box B is not a million dollars, but a thousand human lives. Box B contains nearly everyone I've ever known: hundreds of rationalists; my mom and dad and extended family; all of my friends from high school and college; all of my current and former coworkers.
Box A contains my sister.
That's a much harder decision.
Yes, of course 1,000 lives > 1 life. Yes, I still think it makes game-theoretical sense to one-box. But by choosing that option I am leaving something genuinely valuable on the table: not a measly, fungible $1,000, but something truly precious that can never be replaced. I would be sorely tempted to two-box if one-boxing meant killing my sister.
(It's worth noting that Eliezer does something like this in The True Prisoner's Dilemma – he imagines curing billions of people of a deadly disease. But because of how humans are, 'a million people vs a billion people' is a lot worse at generating intuitions than 'my sister vs everyone else I've ever known'.)
Was this worth posting? I am genuinely unsure. I know people do game theory stuff all the time and seem to understand it. But I always had trouble understanding it, and no one really explained it to me effectively, and maybe that applies to you too.