I've repeatedly gotten the impression that many people (on LW and elsewhere) think Game Theory is entirely selfish. This can lead to questions like, "How come so many people cooperate in one-shot prisoner's dilemmas in real life? Maybe there is something we can learn here?"

I think all of this is quite (if not completely) wrong.

Why is it wrong?

In a nutshell, it's wrong because the utilities in a payout matrix measure whatever the players value, not what is in their selfish interest. Game Theory is altruistic exactly insofar as the players' utility functions are altruistic.

Take a public park as an example. It seems to give us the following matrix,[1] which is, of course, just a prisoner's dilemma:

I'd guess that most people would not throw their garbage around, and I'd further guess that this remains true if we restrict it to just tourists who never expect to visit the park again, thus closing the door to cooperation from repetition. That seems to imply that they all choose to cooperate in a prisoner's dilemma: they decline to take a personal gain that would harm others. Is that an accurate description of what's going on?

If you think the answer is "yes", we have very different models of the average person. It seems to me like most people carefully tend to a self-image of a proper citizen who abides by reasonable rules. It is important to them that they're the sort of person who doesn't leave garbage lying around.

Therefore, I claim that people don't cooperate in anything; rather, the matrix is simply wrong. For most people, the choice is not to acquire some positive amount of personal utility for some negative amount of non-personal utility; it's to acquire some negative amount of personal utility for some negative amount of non-personal utility, making it a very uninteresting game. Damaging your self-image is simply not worth the comfort of not having to walk to the garbage can.
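To make the difference between the two matrices concrete, here is a minimal sketch in Python. All numbers (`gain`, `harm`, `self_image_cost`) are made up for illustration and are not taken from the original matrix; the only assumption that matters is that the self-image cost of littering exceeds the comfort gained from it. With the cost set to zero the game is a prisoner's dilemma whose only pure equilibrium is mutual defection; with the cost included, not littering becomes the dominant choice.

```python
from itertools import product

C, D = "cooperate", "defect"  # cooperate = use the garbage can, defect = litter


def payoffs(profile, gain=1.0, harm=2.0, self_image_cost=0.0):
    """Return (u1, u2) for a two-player park game.

    gain: comfort a player gets from littering (assumed number)
    harm: damage a player's littering does to the other player (assumed number)
    self_image_cost: what littering costs the litterer's self-image (assumed number)
    """
    utilities = []
    # Evaluate the profile once from each player's point of view.
    for me, other in (profile, profile[::-1]):
        u = 0.0
        if me == D:
            u += gain - self_image_cost
        if other == D:
            u -= harm
        utilities.append(u)
    return tuple(utilities)


def pure_nash_equilibria(**params):
    """List the pure-strategy profiles where no player gains by switching alone."""
    equilibria = []
    for profile in product((C, D), repeat=2):
        stable = True
        for i in range(2):
            for alt in (C, D):
                deviation = list(profile)
                deviation[i] = alt
                if payoffs(tuple(deviation), **params)[i] > payoffs(profile, **params)[i]:
                    stable = False
        if stable:
            equilibria.append(profile)
    return equilibria


# Matrix with only the "selfish" numbers: a prisoner's dilemma.
print(pure_nash_equilibria(self_image_cost=0.0))
# Matrix that includes what the player actually values.
print(pure_nash_equilibria(self_image_cost=3.0))
```

The decision procedure is identical in both calls; only the utilities change, which is the point of the paragraph above.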

I think the truth is that almost no one ever cooperates in a true one-shot prisoner's dilemma. If a thousand people go to a park and 10 leave garbage lying around, it's not that those 10 choose to defect; it's that 990 people weren't playing a prisoner's dilemma; 10 people were, and of those 10 people, 100% chose to defect.

A true prisoner's dilemma would be something where [the other person winning] isn't part of your utility function -- which, since preference satisfaction is in a lot of people's utility functions, is very rare; you need something so outrageous that an override module kicks in that declares this particular preference too unreasonable to consider. You know, like trading a billion human lives for two paperclips. Go ask a normal person if they'd consider cooperating there.

Okay, so that's, like, your opinion. But can't I just define the values in my matrices differently?


And I think that makes sense in some contexts. If you want to analyze how altruistic the average person is, it could make sense to create formal games with "selfish" gains as numbers (even though I'm personally skeptical about whether your definition of what counts as selfish will end up making sense).

In such a setting, perhaps it's correct to say that normal people often cooperate in one-shot prisoner's dilemmas, and this may be an important fact for psychology or policymaking. However, as soon as you argue, "maybe this is a hint for formal decision theory?" you have lost the plot. Your "cooperation" relies on properties that do not exist in formal problems. When a MIRI person talks about prisoner's dilemmas, they are talking about true prisoner's dilemmas. And identical FDT agents can, in fact, cooperate in a true one-shot prisoner's dilemma! It doesn't matter how egregious the tradeoff seems, and this mechanism is qualitatively different from anything that relies on altruism.

For similar reasons, I also suspect it is incorrect to claim that many normal people implement something like FDT. I think almost everyone implements CDT or EDT, and the "FDT-like behaviors" are just emergent properties of people wanting to be proper citizens.

What if I don't care about AI stuff?

As I said, in some contexts, it may be okay to describe the park thing as a prisoner's dilemma. I'd still point out that a Nash Equilibrium is supposed to describe the thing that happens in the real world. This will be more true if your numbers mean "whatever players value" than if they mean anything else. The analysis of your games becomes more meaningful if it's done like this.

In the end, just know what you're doing. Make your conclusions consistent with your model.

Okay, but is "whatever players really value" actually a real thing?

Maybe not. I don't know. Beyond the scope of this post, really. But human values sure seem to exist to some approximation, and I have a hard time imagining any approximation that makes the main points of this post incorrect.

  1. You may object that one person leaving garbage around wouldn't have such a big impact on any other person. This is true. The thing is, a lot of real-life prisoner-dilemma-type games are such that the players can acquire small personal wins for large socialized losses. In those cases, the real matrix has a lot more than two players, with every player having the option to trade a small gain in utility for themselves for a larger total loss of utility spread across the other players. The result, however, is exactly the same as in the two-player case: the only equilibrium is "everyone defects", and this is also the only result that's not Pareto-optimal.

    (You may separately object that public places do get dirty, but that's because a small number of people doing the bad thing are enough to cause the bad outcome, and even they probably take some effort. If everyone made literally 0 effort, those places would deteriorate far more quickly.)
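The equilibrium claim in the footnote can be checked by brute force for a small many-player game. This is only a sketch under assumed linear payoffs: littering gives the litterer `gain` and costs every other player `harm_per_other`, with both numbers and the player count `n = 4` chosen arbitrarily so that the total socialized loss (3) exceeds the personal gain (1). It only checks which profiles are pure Nash equilibria.

```python
from itertools import product


def utility(i, profile, gain=1.0, harm_per_other=1.0):
    """Utility of player i; profile[j] is True when player j litters.

    Assumed linear payoffs: littering gives the litterer `gain` and
    costs every *other* player `harm_per_other`.
    """
    own = gain if profile[i] else 0.0
    litterers_besides_me = sum(profile) - (1 if profile[i] else 0)
    return own - harm_per_other * litterers_besides_me


def is_nash(profile, **params):
    """True if no single player can do strictly better by flipping their choice."""
    for i in range(len(profile)):
        flipped = profile[:i] + (not profile[i],) + profile[i + 1:]
        if utility(i, flipped, **params) > utility(i, profile, **params):
            return False
    return True


def pure_nash_equilibria(n, **params):
    """Enumerate all 2**n pure profiles and keep the stable ones."""
    return [p for p in product((False, True), repeat=n) if is_nash(p, **params)]


n = 4
equilibria = pure_nash_equilibria(n)
all_defect = (True,) * n
all_cooperate = (False,) * n

print(equilibria)
print(utility(0, all_defect), utility(0, all_cooperate))
```

Under these assumed numbers, everyone littering is the only stable profile, even though each player ends up worse off there than if nobody littered.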


2 comments

That's a valid model at first glance. But I feel you may need to put in more work to prove your model is superior to the current one.

Basically if an agent consists of {utility function + decision procedure}, you wish to put this stuff in the utility function, they wish to put it in the decision procedure.

To identify which model is right, we need a way to test that. One way is to check against the average human. The other is to start thinking about ideal humans.

Average humans are often inconsistent, and hence not utility maximisers anyway (for any utility function). So the question becomes which of the two models is the least bad one, rather than which one is good. You need a way to measure differences (between models and reality) and compare them. This way could be intuitions, it could be alignment-style theoretical math (that is based on intuitions but has rigour as a north star), it could be work in neuroscience, it could be psychological survey data and so on.

Ideal humans (/ ideal rational agents) are more complicated to discuss - this discussion can go in a bunch of different directions depending on what you mean when you use the word "ideal".
