Explaining “Hell is Game Theory Folk Theorems”

Compare with this from Meditations on Moloch:

Imagine a country with two rules: first, every person must spend eight hours a day giving themselves strong electric shocks. Second, if anyone fails to follow a rule (including this one), or speaks out against it, or fails to enforce it, all citizens must unite to kill that person. Suppose these rules were well-enough established by tradition that everyone expected them to be enforced. So you shock yourself for eight hours a day, because you know if you don’t everyone else will kill you, because if they don’t, everyone else will kill them, and so on.

Seems to me a key component here, which flows naturally from "punish any deviation from the profile" is this pattern of 'punishment of non-punishers'.

[-]Max H2y1111

I found this explanation much clearer than the original post. Well done!

Everyone plays 99, unless someone defects [by the way this could be by playing 30, an action which literally helps everyone], in which case we play 100

This is indeed a very odd / unrealistic-seeming strategy profile. Consider a different, still weird but slightly more natural strategy profile:

Everyone plays 99, unless someone defects [which is defined as playing something higher than 99], in which case we play 100

This is not a Nash equilibrium, since any player can now do better by unilaterally lowering their own number. If the players start with this strategy profile, they probably quickly settle into the more natural Nash equilibrium of all playing 30.

It's useful to remember that a Nash equilibrium is an equilibrium. Various real systems:

may not always be at equilibrium
may be nudged away from a particular equilibrium by a very small nudge, and then settle into a radically different equilibrium pretty quickly thereafter.

[-]Seth Herd2y72

To paraphrase my comment on that confusing post: what the Hell?

The example appears to show that, if everyone punishes good behavior, you'll get bad behavior. This seems pretty obvious when stated like that. What was the point being made beyond that, if anything?

[-]electroswing2y30

I'm not sure what the author intended, but my best guess is they wanted to say "punishment is bad because there exist really bad equilibria which use punishment, by folk theorems". Some evidence from the post (emphasis mine):

Rowan: "If we succeed in making aligned AGI, we should punish those who committed cosmic crimes that decreased the chance of an positive singularity sufficiently."
Neal: "Punishment seems like a bad idea. It's pessimizing another agent's utility function. You could get a pretty bad equilibrium if you're saying agents should be intentionally harming each others' interests, even in restricted cases."

[...]

Rowan: "Well, I'll ponder this. You may have convinced me of the futility of punishment, and the desirability of mercy, with your... hell simulation. That's... wholesome in its own way, even if it's horrifying, and ethically questionable."

Folk theorems guarantee the existence of equilibria for both good (31) and bad (99) payoffs for players, both via punishment. For this reason I view them as neutral: they say lots of equilibria exist, but not which ones are going to happen.

I guess if you are super concerned about bad equilibria, then you could take a stance against punishment, because then it would be harder/impossible for the everyone-plays-99 equilibrium to form. This could have been the original point of the post but I am not sure.

[-]Seth Herd2y31

That's right, I think that was the original point. But this example seems to be a bad one for making that point, because it's punishing pro-social behavior. If you could show how punishing antisocial, defecting behavior had bad consequences, that would be interesting.

[-]Martin Randall2y30

The solution for infinite iterated prisoners dilemma given in the opening post is an example of potential bad consequences for punishing anti-social defecting behavior. A single defection at any point in the infinitely long game causes both players to get the worst outcome in the limit.

If both players are perfectly rational and error-free then this is not fatal. However, a better strategy with reduced punishment and some mercy gets better outcomes in scenarios where those assumptions don't hold.

[-]Seth Herd2y30

I agree, that's a way better example because that type of punishment sounds like a potentially good strategy on the face of it.

[-]Dagon2y41

This is much clearer, thank you. I appreciate the inclusion of

so many Nash equilibria really demonstrates how limited the concept of Nash equilibrium can be.

Because it reinforces the danger of naively applying toy problems to real-world interactions. It doesn't require much communication to form a coalition to break the less-pleasant equilibria, in favor of better ones. The insight that the solutions are likely to be Nash equilibria is a good one, but the constraints that keep it a static one, rather than cycling or finding better ones, are hard to find in reality.

[-]Dacyn2y31

The number 99 isn’t unique—this works with any payoff between 30 and 100.

Actually, it only works with payoffs below 99.3 -- this is the payoff you get by setting the dial to 30 every round while everyone else sets their dials to 100, so any Nash equilibrium must beat that. This was mentioned in jessicata's original post.

Incidentally, this feature prevents the example from being a subgame perfect Nash equilibrium -- once someone defects by setting the dial to 30, there's no incentive to "punish" them for it, and any attempt to create such an incentive via a "punish non-punishers" rule would run into the trouble that punishment is only effective up to the 99.3 limit.

[-]MondSemmel2y30

The only thing I learned from that original post is that, if you use mathematically precise axioms of behavior, then you can derive weird conclusions from game theory. This part is obvious and seems rather uninteresting.

The strong claim from the original post, namely the hell scenario, comes from then back-porting the conclusions from this mathematical rigor to our intuitions about a suggested non-rigorous scenario.

But this you cannot do unless you've confirmed that there's a proper correspondence from your axioms to the scenario.

For example, the scenario suggested that humans are imprisoned; but the axioms require unwilling, unchanging, inflexible robots. It suggests that your entire utility derives from your personal suffering from the temperature, so these suffering entities cannot be empathetic humans. Each robot is supposed to assume that, if it changes its strategy, none of the others does the same.

And so on. Once we've gone through the implications of all these axioms, we're still left with a scenario that seems bad for those inside it. But it doesn't deserve to be called hell, nor to appeal to our intuitions about how unpleasant this would be to personally experience. Because if humans were in those cages, that strategy profile absolutely would not be stable. If the math says otherwise, then the unrealistic axioms are at fault.

[-]vmatyi2y30

Why is this strategy profile a Nash equilibrium? Because no player is better off deviating from this strategy profile, assuming all other players stick to the strategy profile.

Only if we limit thinking to 1 round at a time. But thinking longer term: anyone changing to 30 (end being punished for it), immediately changed the landscape for everyone else: punishment reached it's maximum, from now on everyones' incentive is to lower theirs for immediate reward. And knowing that and thinking ahead at least two rounds, players are better off deviating, thus they would. Or at least they should.

[-]Ericf2y30

The definition of Nash equilibrium is that you assume all other players will stay with thier strategy. If, as in this case, that assumption does not hold then you have (I guess) an "unstable" equilibrium.

[-]localdeity2y20

The infinitely repeated Hell game having so many Nash equilibria really demonstrates how limited the concept of Nash equilibrium can be. How the hell would the strategy profile “Everyone plays 99, unless someone defects [by the way this could be by playing 30, an action which literally helps everyone], in which case we play 100” arise in the real world? The answer is… that’s a good question.

Indeed. Like, how could you know that other people are planning to respond to you playing 30 by "playing 100 until you go back to playing 99"? It's unlikely that you, and many of the other players, would have found this out through experience, given that the strategy doesn't dominate unless almost all the players are already playing it. And if there was a phase where players announced their strategies, why would this be the strategy they all announced? Seems to only be likely if the situation really is Hell, i.e. being ruled by a sadistic authoritarian who chose this strategy and forced everyone to agree to it (which raises a question of whether he's still in control of the situation and how voluntary these choices are).

The most plausible real-world paths that come to mind are those in which the effects of people's actions used to be completely different—where playing 99 was a good thing (because, I dunno, the place was initially freezing) (but I guess where 100 was a bad thing), so people did want to encourage everyone to play it, and that's why they settled on the "play 99 or else be punished" strategy—and then it slowly morphed into the current situation, and I guess people were unable to communicate about it in the meantime.

Heavily-punished taboos, where you're punished for even talking about, for actions that were believed to be bad in ancient times, but which e.g. science has shown to be harmless and beneficial today—that kind of thing would be potentially relevant. (Scott wrote a story that is somewhat related.)

[-]aphyer2y20

For example, here’s a Nash equilibrium: “Everyone agrees to put 99 each round. Whenever someone deviates from 99 (for example to put 30), punish them by putting 100 for the rest of eternity.”

I don't think this is actually a Nash equilibrium? It is dominated by the strategy "put 99 every round. Whenever someone deviates from 99, put 30 for the rest of eternity."

The original post I believe solved this by instead having the equilibrium be “Everyone agrees to put 99 each round. Whenever someone deviates from 99 (for example to put 30), punish them by putting 100 for the next 2 rounds”, which I think is a Nash equilibrium because the punishment being finite means that you're incentivized to stick with the algo even after punishment occurs.

[-]Yair Halberstadt2y20

It's not dominated - holding all other players constant the two strategies have equal payoffs, so neither dominates the other.

[-]electroswing2y*20

The strategy profile I describe is where each person has the following strategy (call it "Strategy A"):

If empty history, play 99
If history consists only of 99s from all other people, play 99
If any other player's history contains a choice which is not 99, play 100

The strategy profile you are describing is the following (call it "Strategy B"):

If empty history, play 99
If history consists only of 99s from all other people, play 99
If any other player's history contains a choice which is not 99, play 30

I agree Strategy B weakly dominates Strategy A. However, saying "everyone playing Strategy A forms a Nash equilibrium" just means that no player has a profitable deviation assuming everyone else continues to play Strategy A. Strategy B isn't a profitable deviation -- if you switch to Strategy B and everyone else is playing Strategy A, everyone will still just play 99 for all eternity.

The general name for these kinds of strategies is grim trigger.

[-]Mateusz Bagiński2y10

I must be missing something, so I'd appreciate an explanation.

Say somebody plays 30 and gets punished by everybody else playing 100 forever. The "defector" doesn't gain anything by increasing their temperature from 30, so they stay at 30. Why would any of the punishers continue playing 100 if they would be better off by lowering the temperature and other ounishers couldn't ounish them anyway since the temperature is already permanently fixed at 100?

[-]ProgramCrafter2y10

How the hell would the strategy profile “Everyone plays 99, unless someone defects [by the way this could be by playing 30, an action which literally helps everyone], in which case we play 100” arise in the real world? The answer is… that’s a good question.

Probably this outcome indicates the limited players' knowledge or bounded rationality.

I've run this experiment with computer-simulated RL agents with the following four basic strategies:

minimize (always play 30);
maximize (always play 100);
equilibrium (the Hell profile);
random (play random integer from [30; 100], each with equal probability).

The agent's algorithm is as follows: with probability 7/8 continue running the same strategy as before; with probability 1/8 select the strategy, where strategies having better rewards have higher chances to be selected. The initial strategy is chosen at random.

Ten times out of ten, this RL simulation consistently converged into all agents playing 30 (and not changing into other strategies) in not more than 10000 steps.

[-]FireStormOOO2y30

Interesting follow-up: how long do they take to break out of the bad equilibrium if all start there? How about if we choose a less extreme bad equilibrium (say 80 degrees)?

[-]ProgramCrafter2y10

By less extreme bad equilibrium, do you mean "play 79, until someone defects, and then play 80"? Or "play 80 or 100"?

Here is the Python script I've used: https://gist.github.com/ProgramCrafter/2af6a5b1cde0ff8995b9502f1c502151
To make all agents start from Hell, you need to change line 31 to self.strategy = equilibrium.

[-]rafaelCosman2y11

Culture seems like an object of this type! For better or for worse.

^{^}

Technical note: you might wonder what a player’s utility from this game is. You can’t just give player utility equal to the sum of their payoffs in each of the individual stage games, because 1+1+1+... and 2+2+2+... are both infinity. Instead, each player will earn utility equal to the average of their payoffs in each of the stage games. Or more specifically, the lim inf of partial averages.

LESSWRONG
LW

LESSWRONG
LW

57

Explaining “Hell is Game Theory Folk Theorems”

57

57

1-shot prisoner’s dilemma.

Nash equilibrium.

n-shot prisoner’s dilemma.

the Hell example.

Conclusion/TLDR.