[content warning: simulated very hot places; extremely bad Nash equilibria]

(based on a Twitter thread)

Rowan: "If we succeed in making aligned AGI, we should punish those who committed cosmic crimes that decreased the chance of a positive singularity sufficiently."

Neal: "Punishment seems like a bad idea. It's pessimizing another agent's utility function. You could get a pretty bad equilibrium if you're saying agents should be intentionally harming each others' interests, even in restricted cases."

Rowan: "In iterated games, it's correct to defect when others defect against you; that's tit-for-tat."

Neal: "Tit-for-tat doesn't pessimize, though, it simply withholds altruism sometimes. In a given round, all else being equal, defection is individually rational."

Rowan: "Tit-for-tat works even when defection is costly, though."

Neal: "Oh my, I'm not sure if you want to go there. It can get real bad. This is where I pull out the game theory folk theorems."

Rowan: "What are those?"

Neal: "They're theorems about Nash equilibria in iterated games. Suppose players play normal-form game G repeatedly, and are infinitely patient, so they don't care about their positive or negative utilities being moved around in time. Then, a given payoff profile (that is, an assignment of utilities to players) could possibly be the mean utility for each player in the iterated game, if it satisfies two conditions: feasibility, and individual rationality."

Rowan: "What do those mean?"

Neal: "A payoff profile is feasible if it can be produced by some mixture of payoff profiles of the original game G. This is a very logical requirement. The payoff profile could only be the average of the repeated game if it was some mixture of possible outcomes of the original game. If some player always receives between 0 and 1 utility, for example, they can't have an average utility of 2 across the repeated game."

Rowan: "Sure, that's logical."

Neal: "The individual rationality condition, on the other hand, states that each player must get at least as much utility in the profile as they could guarantee getting by min-maxing (that is, picking their strategy assuming other players make things as bad as possible for them, even at their own expense), and at least one player must get strictly more utility."

Rowan: "How does this apply to an iterated game where defection is costly? Doesn't this prove my point?"

Neal: "Well, if defection is costly, it's not clear why you'd worry about anyone defecting in the first place."

Rowan: "Perhaps agents can cooperate or defect, and can also punish the other agent, which is costly to themselves, but even worse for the other agent. Maybe this can help agents incentivize cooperation more effectively."

Neal: "Not really. In an ordinary prisoner's dilemma, the (C, C) utility profile already dominates both agents' min-max utility, which is the (D, D) payoff. So, game theory folk theorems make mutual cooperation a possible Nash equilibrium."
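(A minimal sketch of the min-max claim for a one-shot prisoner's dilemma. The payoff numbers are hypothetical, since the dialogue gives none, and only pure strategies are considered.)

```python
# Hypothetical payoffs for the row player, indexed by (own action, opponent action).
# These numbers are an assumption; any standard prisoner's dilemma ordering works.
PAYOFF = {
    ("C", "C"): 3,
    ("C", "D"): 0,
    ("D", "C"): 5,
    ("D", "D"): 1,
}
ACTIONS = ("C", "D")


def min_max(payoff):
    """Payoff a player can guarantee when the opponent picks the action that hurts them most."""
    return min(
        max(payoff[(own, opp)] for own in ACTIONS)  # best reply to each opponent action
        for opp in ACTIONS                          # opponent picks the worst case for us
    )


print(min_max(PAYOFF))                       # 1, the (D, D) payoff
print(PAYOFF[("C", "C")] > min_max(PAYOFF))  # True: (C, C) beats the min-max value
```

With these assumed numbers, mutual cooperation already clears the individual rationality bar, so no punishment action is needed to make it a candidate equilibrium.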

Rowan: "Hmm. It seems like introducing a punishment option makes everyone's min-max utility worse, which makes more bad equilibria possible, without making more good equilibria possible."

Neal: "Yes, you're beginning to see my point that punishment is useless. But, things can get even worse and more absurd."

Rowan: "How so?"

Neal: "Let me show you my latest game theory simulation, which uses state-of-the-art generative AI and reinforcement learning. Don't worry, none of the AIs involved are conscious, at least according to expert consensus."

Neal turns on a TV and types some commands into his laptop. The TV shows 100 prisoners in cages, some of whom are screaming in pain. A mirage effect appears across the landscape, as if the area is very hot.

Rowan: "Wow, that's disturbing, even if they're not conscious."

Neal: "I know, but it gets even worse! Look at one of the cages more closely."

Neal zooms into a single cage. It shows a dial that can be set to any value from 30 to 100; it is currently set to 99.

Rowan: "What does the dial control?"

Neal: "The prisoners have control of the temperature in here. Specifically, the temperature in Celsius is the average of the temperature selected by each of the 100 denizens. This is only a hell because they have made it so; if they all set their dial to 30, they'd be enjoying a balmy temperature. And their bodies repair themselves automatically, so there is no release from their suffering."

Rowan: "What? Clearly there is no incentive to turn the dial all the way to 99! If you set it to 30, you'll cool the place down for everyone including yourself."

Neal: "I see that you have not properly understood the folk theorems. Let us assume, for simplicity, that everyone's utility in a given round, which lasts 10 seconds, is the negative of the average temperature. Right now, everyone is getting -99 utility in each round. Clearly, this is feasible, because it's happening. Now, we check if it's individually rational. Each prisoner's min-max payoff is -99.3: they set their temperature dial to 30, and since everyone else is min-maxing against them, everyone else sets their temperature dial to 100, leading to an average temperature of 99.3. And so, the utility profile resulting from everyone setting the dial to 99 is individually rational."
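(A minimal sketch of the arithmetic Neal walks through, assuming integer dial settings from 30 to 100. The function names are illustrative and not part of the simulation in the story.)

```python
N_PLAYERS = 100
DIAL_MIN, DIAL_MAX = 30, 100


def utility(dials):
    """Per-round utility shared by every prisoner: minus the average dial setting."""
    return -sum(dials) / len(dials)


def min_max_payoff():
    """Best payoff one prisoner can guarantee when the other 99 try to hurt them."""
    worst_cases = []
    for own in range(DIAL_MIN, DIAL_MAX + 1):
        # Utility falls as any dial rises, so the harshest response is always DIAL_MAX.
        others = [DIAL_MAX] * (N_PLAYERS - 1)
        worst_cases.append(utility([own] + others))
    return max(worst_cases)


equilibrium_payoff = utility([99] * N_PLAYERS)

print(min_max_payoff())                        # -99.3
print(equilibrium_payoff)                      # -99.0
print(equilibrium_payoff >= min_max_payoff())  # True: individually rational
```

The gap between -99 and -99.3 is small, but it is all the folk theorems require for the all-99 profile to qualify.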

Rowan: "I see how that follows. But this situation still seems absurd. I only learned about game theory folk theorems today, so I don't understand, intuitively, why such a terrible equilibrium could be in everyone's interest to maintain."

Neal: "Well, let's see what happens if I artificially make one of the prisoners select 30 instead of 99."

Neal types some commands into his laptop. The TV screen splits to show two different dials. The one on the left turns to 30; the prisoner attempts to turn it back to 99, but is dismayed at it being stuck. The one on the right remains at 99. That is, until 6 seconds pass, at which point the left dial releases; both prisoners set their dials to 100. Ten more seconds pass, and both prisoners set the dial back to 99.

Neal: "As you can see, both prisoners set the dial to the maximum value for one round. So did everyone else. This more than compensated for the left prisoner setting the dial to 30 for one round, in terms of average temperature. So, as you can see, it was never in the interest of that prisoner to set the dial to 30, which is why they struggled against it."

Rowan: "That just passes the buck, though. Why does everyone set the dial to 100 when someone set it to 30 in a previous round?"

Neal: "The way it works is that, in each round, there's an equilibrium temperature, which starts out at 99. If anyone puts the dial less than the equilibrium temperature in a round, the equilibrium temperature in the next round is 100. Otherwise, the equilibrium temperature in the next round is 99 again. This is a Nash equilibrium because it is never worth deviating from. In the Nash equilibrium, everyone else selects the equilibrium temperature, so by selecting a lower temperature, you cause an increase of the equilibrium temperature in the next round. While you decrease the temperature in this round, it's never worth it, since the higher equilibrium temperature in the next round more than compensates for this decrease."
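(A minimal sketch of the rule Neal describes, assuming a finite window of rounds stands in for the infinite game and that only one player ever deviates. The function `simulate` and its parameters are illustrative.)

```python
N_PLAYERS = 100
DIAL_MIN = 30


def simulate(deviation_rounds, n_rounds):
    """Sum of per-round average temperatures when player 0 plays DIAL_MIN in the
    given rounds and the other 99 always play the equilibrium temperature."""
    target = 99  # equilibrium temperature for the first round
    total = 0.0
    for t in range(n_rounds):
        own = DIAL_MIN if t in deviation_rounds else target
        total += (own + (N_PLAYERS - 1) * target) / N_PLAYERS
        # Any dial below the target makes the next round's target 100; otherwise 99.
        target = 100 if own < target else 99
    return total


conform = simulate(set(), 10)
deviate_once = simulate({0}, 10)
deviate_always = simulate(set(range(10)), 10)

print(round(deviate_once - conform, 2))    # 0.31 degrees hotter in total
print(round(deviate_always - conform, 2))  # 2.01 degrees hotter in total
```

A single deviation saves 0.69 degrees in its own round and costs 1 degree in the next, which is the "more than compensates" step. One caveat of the finite window: a deviation in the very last simulated round would look profitable only because its punishment falls outside the window.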

Rowan: "So, as a singular individual, you can try to decrease the temperature relative to the equilibrium, but others will compensate by increasing the temperature, and they're much more powerful than you in aggregate, so you'll avoid setting the temperature lower than the equilibrium, and so the equilibrium is maintained."

Neal: "Yes, exactly!"

Rowan: "If you've just seen someone else violate the equilibrium, though, shouldn't you rationally expect that they might defect from the equilibrium in the future?"

Neal: "Well, yes. This is a limitation of Nash equilibrium as an analysis tool, if you weren't already convinced it needed revisiting based on this terribly unnecessarily horrible outcome in this situation. Possibly, combining Nash equilibrium with Solomonoff induction might allow agents to learn each others' actual behavioral patterns even when they deviate from the original Nash equilibrium. This gets into some advanced state-of-the-art game theory (12), and the solution isn't worked out yet. But we know there's something wrong with current equilibrium notions."

Rowan: "Well, I'll ponder this. You may have convinced me of the futility of punishment, and the desirability of mercy, with your... hell simulation. That's... wholesome in its own way, even if it's horrifying, and ethically questionable."

Neal: "Well, I appreciate that you absorbed a moral lesson from all this game theory!"


Seriously, what? I'm missing something critical. Under the stated rules as I understand them, I don't see why anyone would punish another player for reducing their dial.

You state that 99 is a nash equilibrium, but this just makes no sense to me. Is the key that you're stipulating that everyone must play as though everyone else is out to make it as bad as possible for them? That sounds like an incredibly irrational strategy.

I think it's not that 99 is a Nash equilibrium, it's that everyone doing "Play 99 and, if anyone deviates, play 100 to punish them until they give in" is a Nash equilibrium.  (Those who think they understand the post: am I correct?)

Start by playing 99. If someone played less than they were supposed to last round, you're now supposed to play 100. Otherwise, you're now supposed to play 99.

I think what people are missing (I know I am) is where does the "supposed to" come from?  I totally understand the debt calculation to get altruistic punishment for people who deviate in ways that hurt you - that's just maximizing long-term expectation through short-term loss.  I don't understand WHY a rational agent would punish someone who is BENEFITTING you with their deviant play.

I'd totally get it if you reacted to someone playing MORE than they were supposed to.  But if someone plays less than, there's no debt or harm to punish.

Formally, it's an arbitrary strategy profile that happens to be a Nash equilibrium, since if everyone else plays it, they'll punish if you deviate from it unilaterally.

In terms of more realistic scenarios there are some examples of bad "punishing non punishers" equilibria that people have difficulty escaping. E.g. an equilibrium with honor killings, where parents kill their own children partly because they expect to be punished if they don't. Robert Trivers, an evolutionary psychologist, has studied these equilibria, as they are anomalous from an evolutionary psychology perspective.

This doesn't really answer the question. If some prisoner turns the dial to 30, everyone gets higher utility the next round, with no downside. In order to have some reason to not put it to 30, they need some incentive (e.g. that if anyone puts it strictly below average they also get an electric shock or whatever).
In the round after the round where the 30 applies, the Schelling temperature for the next round increases to 100, and it's a Nash equilibrium for everyone to always select the Schelling temperature. You can claim this is an unrealistic Nash equilibrium but I am pretty sure that unilateral deviation from the Schelling temperature, assuming everyone else always plays the Schelling temperature, never works out in anyone's favor.

If a mathematical model doesn't reflect at all the thing it's supposed to represent, it's not a good model. Saying "this is what the model predicts" isn't helpful.

There is absolutely zero incentive to anyone to put the temperature to 100 at any time. Even as deterrence, there is no reason for the equilibrium temperature to be an unsurvivable 99. It makes no sense, no one gains anything from it, especially if we assume communication between the parties (which is required for there to be deterrence and other such mechanisms in place). There is no reason to punish someone putting the thermostat lower than the equilibrium temperature either, since the lowest possible temperature is still comfortable. The model is honestly just wrong to describe any actual situation of interest.

At the very least, the utility function is wrong: it's not linear in temperature, obviously. It skyrockets around where temperatures exceed the survivable limit and then plateaus. There's essentially no difference between 99 and 99.3, but there's a much stronger incentive to go back below 40 as quickly as possible.

Martin Randall
I think the mention here of "unsurvivable" temperature misses this point from the simulation description: "And their bodies repair themselves automatically, so there is no release from their suffering." I agree that the incentives are different if high temperatures are not survivable and/or if there is a release from suffering. In particular, the best alternative to a negotiated agreement is probably for me to experience a short period of excruciating pain and then die. This means that no outcome can be worse for me than that.
Ah, true. But I still wouldn't expect that the difference between 99 and 99.3 would matter much compared to the possibility of breaking the deadlock and going back to a non-torturous temperature. Essentially, if the equilibrium is 99, the worst that the others can do to you is raise it up to 99.3. Conversely, keeping your temperature at 30 sends a signal that someone is trying to lower it, and if even just one other person joins you, you get 98.6. At which point temperature might go even lower if others pick up on the trend. Essentially, as things are presented here, there is no reason why the equilibrium ought to be stable.

"Stable Nash equilibrium" is a term-of-art that I don't think you meant to evoke, but it's true that you can reach better states if multiple people act in concert.  Saying this is a Nash equilibrium only means that no single player can do better, if you assume that everyone else is a robot that is guaranteed to keep following their current strategy no matter what.

This equilibrium is a local maximum surrounded by a tiny moat of even-worse outcomes.  The moat is very thin, and almost everything beyond it is better than this, but you need to pass through the even-worse moat in order to get to anything better.  (And you can't cross the moat unilaterally.)

Of course, it's easy to vary the parameters of this thought-experiment to make the moat wider.  If you set the equilibrium at 98 instead of 99, then you'd need 3 defectors to do better, instead of only 2; etc.
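A minimal sketch of the "moat width" arithmetic, assuming the defectors hold their dials at 30 while everyone else punishes at 100. The function names are illustrative.

```python
N_PLAYERS = 100
DIAL_MIN, DIAL_MAX = 30, 100


def punished_average(k):
    """Average temperature when k players hold DIAL_MIN and the rest punish at DIAL_MAX."""
    return (k * DIAL_MIN + (N_PLAYERS - k) * DIAL_MAX) / N_PLAYERS


def defectors_needed(equilibrium_temp):
    """Smallest coalition whose punished average is strictly below the equilibrium temperature."""
    for k in range(1, N_PLAYERS + 1):
        if punished_average(k) < equilibrium_temp:
            return k
    return None  # no coalition size beats the equilibrium


print(punished_average(1))   # 99.3
print(punished_average(2))   # 98.6
print(defectors_needed(99))  # 2
print(defectors_needed(98))  # 3
```

Each extra defector lowers the punished average by 0.7 degrees, so lowering the equilibrium temperature raises the number of people who must move at once.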

So you can say "this is such an extreme example that I don't expect real humans to actually follow it", but that's only a difference in degree, not a difference in kind.  It's pretty easy to find real-life examples where actual humans are actually stuck in an equilibrium that is strictly worse than some other equilibrium they theoretically could have, because switching would require coordination between a bunch of people at once (not just 2 or 3).

It is, in theory, but I feel like this underrates the real reason for most such situations: actual asymmetries in values, information, or both. A few things that may hold an otherwise pointless taboo or rule in place:
1. It serves as a shibboleth that identifies the in-group. This is a tangible benefit in certain situations. It's true that another could be chosen, and it could be something that is also more inherently worthy rather than just conventionally picked, but that requires time and adjusting and may create confusion.
2. It is tied to some religious or ideological worldview such that at least some people genuinely believe it's beneficial, and not just a convention. That makes them a lot more resistant to dropping it even if there was an attempt at coordination.
3. It has become something that is genuinely unpleasant to drop even at an individual level, simply because force of habit has led some individuals to internalize it.
In general I think the game theoretical model honestly doesn't represent anything like a real world situation well because it creates a situation that's so abstract and extreme, it's impossible to imagine any of these dynamics at work. Even the worst, most dystopian totalitarianism in which everyone spies on everyone else and everyone's life is miserable will at least have been started by a group of true believers who think this is genuinely a good thing.
I contend examples are easy to find even after you account for all of those things you listed.  If you'd like a more in-depth exploration of this topic, you might be interested in the book Inadequate Equilibria.
I've read Inadequate Equilibria, but that's exactly the thing, this specific example doesn't really convey that sort of situation. At the very least, some social interaction as well as the path to the pathological equilibrium are crucial to it. By stripping down all of that, the 99 C example makes no sense. They're an integral part of why such things happen.
Closed Limelike Curves
That's correct, but that just makes this a worse (less intuitive) version of the stag hunt.
I'm in the same boat. "...everyone's utility in a given round ... is the negative of the average temperature." Why would we assume that? "Clearly, this is feasible, because it's happening." Is this rational? Isn't this synonymous with saying "clearly my scenario makes sense because my scenario says so"? "Each prisoner's min-max payoff is -99.3" If everyone else is min-maxing against any given individual, you would have a higher payoff if you set your dial to 0, no?  The worst total payoff would be -99. What am I missing? Can anyone bridge this specific gap for me?
"Feasible" is being used as a technical term-of-art, not a value judgment.  It basically translates to "physically possible".  You can't have an equilibrium of 101 because the dials only go up to 100, so 101 is not "feasible". The min-max payoff is -99.3 because the dials only go down to 30, not to 0. We're assuming that utility function because it's a simple thought-experiment meant to illustrate a general principle, and assuming something more complicated would just make the illustration more complicated.  It's part of the premise of the thought-experiment, just like assuming that people are in cages with dials that go from 30 to 100 is part of the premise.

The problem is that the model is so stripped down it doesn't illustrate the principle any more. The principle, as I understand it, is that there are certain "everyone does X" equilibria in which X doesn't have to be useful or even good per se, it's just something everyone's agreed upon. That's true, but only to a certain point. Past a certain degree of utter insanity and masochism, people start solving the coordination problem by reasonably assuming that no one else can actually want X, and may try rebellion. In the thermostat example, a turn in which simply two prisoners rebelled would be enough to get a lower temperature even if the others tried to punish them. At that point the process would snowball. It's only "stable" to the minimum possible perturbation of a single person turning the knob to 30, and deciding it's not worth it any more after one turn at a mere 0.3 C above the already torturous temperature of 99 C.

I'm confused.  Are you saying that the example is bad because the utility function of "everyone wants to minimize the average temperature" is too simplified?  If not, why is this being posted as a reply to this chain?

I think the claim is that, while it may be irrational, it can be a Nash equilibrium. (And sometimes agents are more Nash-rational than really rational.)

Seth Herd
Are they, though? This strikes me as so far from any real world scenario as to be useless. The only point I can draw from this is that if everyone acts crazy then everyone is acting crazy together. The game theory is irrelevant. Everyone is running a policy that's very much against their own interests. Is the point that their policy to punish makes them vulnerable to a very bad equilibrium? Because it seems like they are punishing good behavior, and it seems clear why that would have terrible results.
We see plenty of crazy, self-harming behavior in the real world. And plenty of people following local incentives to their own long-term detriment. And people giving in to threats. And people punishing others, to their own detriment, including punishing what seems like prosocial behavior. And we see plenty of coalitions that punish defectors from the coalition, and punish people who fail to punish defectors. I would hope that the exact scenario in the OP wouldn't literally happen. But "so far from any real world scenario as to be useless" seems very incorrect. (Either way, the game-theoretic point might be conceptually useful.)
Yes, but it's usually because people believe that it does some good, or are locked in an actual prisoner's dilemma in which being the first to cooperate makes you the sucker. Not situations in which defecting produces immediate (if small) benefits to you with no downsides.
Seth Herd
I can see how that would apply in principle. I'm just saying: wouldn't you want a dramatically more real-world relevant scenario? If you punish good behavior, of course you'll get bad equilibria. Does punishing bad behavior also give bad equilibria? It would be fascinating if it did, but this scenario has nothing to say about that.
What do you mean by "bad" behavior? This has an obvious natural definition in this particular thought-experiment, because every action affects all players in the same way, and the effect of every action is independent of every other action (e.g. changing your dial from 70 to 71 will always raise the average temperature by 0.01, no matter what any other dial is set to).  But that's a very special case.
Seth Herd
I don't know, but I'd settle for moving to an example of bad effects from punishing behavior that sounds bad in any way at all.
The given example involves punishing behavior that is predicted to lower utility for all players, given the current strategies of all players.  Does that sound bad in any way at all?
Seth Herd
I guess it doesn't, when you put it that way. I'd just like an example that has more real-world connections. It's hard to see how actual intelligent agents would adopt that particular set of strategies. I suspect there are some real world similarities but this seems like an extreme case that's pretty implausible on the face of it. It is punishing good behavior in the sense that they're punishing players for making things better for everyone on the next turn.
Two comments to this:
1) The scenario described here is a Nash equilibrium but not a subgame-perfect Nash equilibrium. (IE, there are counterfactual parts of the game tree where the players behave "irrationally".) Note that subgame-perfection is orthogonal to "reasonable policy to have", so the argument "yeah, clearly the solution is to always require subgame-perfection" does not work. (Why is it orthogonal? First, the example from the post shows a "stupid" policy that isn't subgame-perfect. However, there are cases where subgame-imperfection seems smart, because it ensures that those counterfactual situations don't become reality. EG, humans are somewhat transparent to each other, so having the policy of refusing unfair splits in the Final Offer / Ultimatum game can lead to not being offered unfair splits in the first place.)
2) You could modify the scenario such that the "99 equilibrium" becomes more robust. (EG, suppose the players have a way of paying a bit to punish a specific player a lot. Then you add the norm of turning temperature to 99, the meta-norm of punishing defectors, the meta-meta-norm of punishing those who don't punish defectors, etc. And tadaaaa, you have a pretty robust hell. This is a part of how society actually works, except that those norms typically enforce pro-social behaviour.)
You may be confusing the questions "starting from a blank slate, would you expect players to go here?" and "given that players are (somehow) already here, would they stay here?"  Saying that something is a Nash equilibrium only implies the second thing. You'd punish a player for setting their dial lower because you expect that this will actually make the temperature higher (on average, in the long-run).  And you expect that it will make the temperature higher because you expect everyone to punish them for it.  This is self-referential, but it's internally-consistent.  It's probably not what a new player would come up with on their own if you suddenly dropped them into this game with no explanation, but if everyone already believes it then it's true. (If you can't immediately see how it's true given that belief, try imagining that you are the only human player in this game and the other 99 players are robots who are programmed to follow this strategy and cannot do otherwise.  Then, what is your best strategy?)
Seth Herd
You're correct, I was confused in exactly that way. Once that confusion was cleared up by replies, I became confused as to why the hell (ha?) we were talking about this example at all. I am currently of the belief that it's just a bad example and we should talk about a different one, since there have got to be better examples of counterintuitive bad outcomes from more reasonable-sounding punishment strategies.

I think ideas like Nash equilibrium get their importance from predictive power: do they correctly predict what will happen in the real world situation which is modeled by the game. For example, the biological situations that settle on game-theoretic equilibria even though the "players" aren't thinking at all.

In your particular game, saying "Nash equilibrium" doesn't really narrow down what will happen, as there are equilibria for all temperatures from 30 to 99.3. The 99 equilibrium in particular seems pretty brittle: if Alice breaks it unilaterally on round 1, then Bob notices that and joins in on round 2, neither of them end up punished and they get 98.6 from then on.

More generally, in any game like this where everyone's interests are perfectly aligned, I'd expect cooperation to happen. The nastiness of game theory really comes from the fact that some players can benefit by screwing over others. The game in your post doesn't have that, so any nastiness in such a game is probably an analysis artifact.

Jacob Watts
This seems to be phrased like a disagreement, but I think you're mostly saying things that are addressed in the original post. It is totally fair to say that things wouldn't go down like this if you stuck 100 actual prisoners or mathematicians or whatever into this scenario. I don't believe OP was trying to claim that it would. The point is just that sometimes bad equilibria can form from everyone following simple, seemingly innocuous rules. It is a faithful execution of certain simple strategic approaches, but it is a bad strategy in situations like this because it fails to account for things like modeling the preferences/behavior of other agents. To address your scenario: Ya, sure this could happen "in real life", but the important part is that this solution assumes that Alice breaking the equilibrium on round 1 is evidence that she'll break it on round 2. This is exactly why the character Rowan asks: "If you've just seen someone else violate the equilibrium, though, shouldn't you rationally expect that they might defect from the equilibrium in the future?" and it yields the response that "This is a limitation of Nash equilibrium as an analysis tool". This is followed by discussion of how we might add mathematical elements to account for predicting the behavior of other agents. Humans predict the behavior of other agents automatically and would not be likely to get stuck in this particular bad equilibrium. That said, I still think this is an interesting toy example because it's kind of similar to some bad equilibria which humans DO get stuck in (see these comments for example). It would be interesting to learn more about the mathematics and try to pinpoint what makes these failure modes more/less likely to occur.

Reminds me of this from Scott Alexander's Meditations on Moloch:

Imagine a country with two rules: first, every person must spend eight hours a day giving themselves strong electric shocks. Second, if anyone fails to follow a rule (including this one), or speaks out against it, or fails to enforce it, all citizens must unite to kill that person. Suppose these rules were well-enough established by tradition that everyone expected them to be enforced.

Scott Aaronson's

Scott Alexander's

Mo Putera
I remember when Scott Alexander wrote: "People sometimes confuse me with Scott Aaronson because of our similar-sounding names. I encourage this, because Scott Aaronson is awesome and it can only improve my reputation to be confused with him."
Keenan Pepper
Hilarious... I fixed my error
Indeed, sounds relevant. Though note that from a technical point of view,  Scott's example arguably fails the "individual rationality" condition, since some people would prefer to die over 8h/day of shocks. (Though presumably you can figure out ways of modifying that thought example to "remedy" that.)

I don't understand the motivation to preserve the min-max value, and perhaps that's why it's a folk theorem rather than an actual theorem.  Each participant knows that they can't unilaterally do better than 99.3, which they get by choosing 30 while the other players all choose 100.  But a player's maxing (of utility; min temperature) doesn't oblige them to correct or reduce their utility (by raising the temperature) just because the opponents fail to minimize the player's utility (by raising the temperature).

There is no debt created anywhere in the model or description of the players.  Everyone min-maxes as a strategy, picking the value that maximizes each player's utility assuming all opponents minimize that player's utility.  But the other players aren't REQUIRED to play maximum cruelty - they're doing the same min-max strategy, but for their own utility, leading everyone to set their dial to 30.  

I believe many of the theorems have known proofs (e.g. this paper). Here's an explanation of the debt mechanic: Debt is initially 0. Equilibrium temperature is 99 if debt is 0, otherwise 100. For everyone who sets the temperature less than equilibrium in a round, debt increases by 1. Debt decreases by 1 per round naturally, unless it was already 0.
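A minimal sketch of this debt rule. The comment does not say in what order decay and new debt are applied within a round, so the ordering below (existing debt decays first, then new deviations are added) is an assumption, as is the function name `run_debt_rule`.

```python
def run_debt_rule(deviators_per_round):
    """Return the equilibrium temperature in each round.

    deviators_per_round[t] is the number of players who set their dial
    below the equilibrium temperature in round t.
    """
    debt = 0
    temps = []
    for n_deviators in deviators_per_round:
        temps.append(99 if debt == 0 else 100)
        # Assumed ordering: debt carried into this round decays by 1,
        # then this round's deviations are added on top.
        if debt > 0:
            debt -= 1
        debt += n_deviators
    return temps


print(run_debt_rule([0, 1, 0, 0]))  # [99, 99, 100, 99]
print(run_debt_rule([2, 0, 0, 0]))  # [99, 100, 100, 99]
```

Under this ordering, each deviation buys exactly one later round at 100, and simultaneous deviations queue up additional punishment rounds.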
Hmm.  A quick reading of that paper talks about punishment for defection, not punishment for unexpected cooperation.   Can you point to the section that discusses the reason for the "debt" concept as applied to deviations that benefit the player in question? Note that I'm going to have to spend a bit more time on the paper, because I'm fascinated by the introduction of a discount rate to make the punishment non-infinite.  I do not expect to find that it mandates punishment for unexpected gifts of utility.
I only skimmed the paper, it was linked from Wikipedia as a citation for one of the folk theorems. But, it's easy to explain the debt-based equilibrium assuming no discount rate. If you're considering setting the temperature lower than the equilibrium, you will immediately get some utility by setting the temperature lower, but you will increase the debt by 1. That means the equilibrium temperature will be 1 degree higher in 1 future round, more than compensating for the lower temperature in this round. So there is no incentive to set the temperature lower than the equilibrium (which is itself determined by debt). The problem with using this as an equilibrium in an iterated game with a discount rate is that if the debt is high enough, the higher temperature due to higher debt might come very late, so the agents care less about it. I'm not sure how to fix this but I strongly believe it's possible.
I understand the debt calculation as group-enforced punishment for defection.  It's a measure of how much punishment is due to bring the average utility of an opponent back down to expectation, after that opponent "steals" utility by defecting when they "should" cooperate.  It's not an actual debt, and not symmetrical around that average.  In fact, in the temperature example, it should be considered NEGATIVE debt for someone to unilaterally set their temperature lower.
Ah, here's a short proof of a folk theorem: * But it doesn't show that it's a subgame perfect equilibrium. This paper claims to prove it for subgame perfect equilibria, although I haven't checked it in detail.
I rewrote part of the post to give an equilibrium that works with a discount rate as well. "The way it works is that, in each round, there's an equilibrium temperature, which starts out at 99. If anyone puts the dial less than the equilibrium temperature in a round, the equilibrium temperature in the next round is 100. Otherwise, the equilibrium temperature in the next round is 99 again. This is a Nash equilibrium because it is never worth deviating from. In the Nash equilibrium, everyone else selects the equilibrium temperature, so by selecting a lower temperature, you cause an increase of the equilibrium temperature in the next round. While you decrease the temperature in this round, it's never worth it, since the higher equilibrium temperature in the next round more than compensates for this decrease."
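Here is a rough sketch of the one-shot deviation check for this rule, assuming 100 prisoners, utility equal to minus the average dial setting, and a per-round discount factor `delta` (all assumptions for illustration; `deviation_gain` is a hypothetical helper). It compares the immediate gain from undercutting against the discounted cost of the next round being 100 instead of 99.

```python
N = 100  # assumed number of prisoners


def avg_temp(my_dial, others_dial, n=N):
    """Room temperature when I play my_dial and the other n - 1 players play others_dial."""
    return (my_dial + (n - 1) * others_dial) / n


def deviation_gain(target, delta, my_dial=30):
    """Discounted utility change from undercutting `target` once (utility = -temperature)."""
    # This round: the temperature drops a little because my dial is below the target.
    gain_now = avg_temp(target, target) - avg_temp(my_dial, target)
    # Next round: everyone plays 100 instead of 99, which costs me 1 degree, discounted.
    loss_next = delta * (100 - 99)
    return gain_now - loss_next


print(deviation_gain(99, 0.9))   # negative: deviating from 99 does not pay
print(deviation_gain(100, 0.9))  # negative: deviating during a punishment round does not pay
print(deviation_gain(99, 0.5))   # positive: a very impatient player would deviate
```

In this sketch the immediate gain is at most 0.7 degrees, so the deviation is unprofitable whenever `delta` is above roughly 0.7. With sufficiently patient players the claim in the quoted passage goes through; with heavy discounting it would not.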

I am confused. Why does everyone else select the equilibrium temperature? Why would they push it to 100 in the next round? You never explain this.

I understand you may be starting from a theorem that I don’t know. To me the obvious course of action would be something like: the temperature is way too high, so I’ll lower the temperature. Wouldn’t others appreciate that the temperature is dropping and getting closer to their own preference of 30 degrees?

Are you saying what you’re describing makes sense, or are you saying that what you’re describing is a weird (and meaningless?) consequence of Nash theorem?

I'm saying it's a Nash equilibrium, not that it's particularly realistic. They push it to 100 because they expect everyone else to do so, and they expect that if anyone sets it to less than 100, the equilibrium temperature in the round after that will be 100 instead of 99. If everyone else is going to select 100, it's futile to individually deviate and set the temperature to 30, because that means in the next round everyone but you will set it to 100 again, and that's not worth being able to individually set it to 30 in this round.
Gotcha. Thanks for clarifying! 
After giving it some thought, I do see a lot of real-life situations where you get to such a place. For instance, I was recently watching The Vow, the documentary about the NXIVM cult ("nexium"). In very broad strokes, one of the core fucked up things the leader does is to gaslight the members into thinking that pain is good. If you resist him, don't like what he says, etc., there is something deficient in you. After a while, even when he's not in the picture, so it would make sense for everyone to suffer less and get some slack, people punish each other for being deficient or weak. And now that I've written this about NXIVM, I imagine those dynamics are actually commonplace in everyday society too.
I don't really know about the theorem, but I think there's something real here. I think the theorem is in spirit something like: "bad for everyone" equilibria can be enforced, as long as there's a worse possible history. As long as there's a worse possible history that can be enforced by everyone but you, regardless of what you do, then everyone but you can incentivize you to do any particular thing. Like suppose you wake up and everyone tells you: we've all decided we're going to torture everyone forever if you don't pinch everyone you meet; but if you pinch everyone you meet, we won't do that. So then your individually Nash-rational response is to pinch everyone you meet. Ok so that gets one person. But this could apply to everyone. Suppose everyone woke up one day with some weird brain damage such that they have all the same values as before, except that they have a very specific and strong intention that if any single person fails to pinch everyone they meet, then everyone else will coordinate to torture everyone forever. Then everyone's Nash-rational response is to pinch everyone. But how can this be an equilibrium? Why wouldn't everyone just decide to not do this torture thing? Isn't that strictly better? If we're just talking about Nash equilibria, the issue is that it counts as an equilibrium as long as what each player actually does is responding correctly given that everyone else's policy is whatever it is. So even though it's weird for players to harm everyone, it still counts as a Nash equilibrium, as long as everyone actually goes around pinching everyone in response. Pinch everyone is Nash-correct if everyone else would punish you, and indeed everyone else would punish you. Since everyone pinches everyone, no one has to actually torture everyone (which would be Nash-irrational, but that doesn't matter). But why would people have all this Nash-irrational behavior outside of what actually happens? It's not actually necessary. https://en.wikipedia.org
Per Wikipedia, it's called a "folk theorem" because there was a substantial period of time when most people in the field were aware of it but it hadn't been formally published.  It's still an "actual" theorem. It doesn't say this outcome WILL or SHOULD happen; it just says there exists some Nash equilibrium where it happens.

The Wikipedia article has an example that is easier to understand:

Anthropology: in a community where all behavior is well known, and where members of the community know that they will continue to have to deal with each other, then any pattern of behavior (traditions, taboos, etc.) may be sustained by social norms so long as the individuals of the community are better off remaining in the community than they would be leaving the community (the minimax condition).

I'd argue that the tie from "exit vs suffering" to "minimax strategy, mathematically modeled" is fairly tenuous, but there are elements in there that make sense.  This discussion reminds me of https://principiadiscordia.com/book/45.php - the math MUST include a mechanism by which the punishers actually benefit from the equilibrium they're enforcing, and it's a puzzle to solve if there is no such reason but real humans do it anyway.
That mostly works for neutral-ish behaviours. "Never say this word" or "wear this funny hat" or even some more onerous undertakings are not a big cost compared to the advantages of being in a community. The prisoner example is flawed because it replaces the neutral tradition with active torture and the community with a bunch of isolated people that are forced to stay in their cells anyway. It's not like they gain any benefits from sticking with the community - nor like they could leave anyway.

Correct me if I'm wrong:

The equilibrium where everyone follows "set dial to equilibrium temperature" (i.e. "don't violate the taboo, and punish taboo violators") is only a weak Nash equilibrium.

If one person instead follows "set dial to 99" (i.e. "don't violate the taboo unless someone else does, but don't punish taboo violators") then they will do just as well, because the equilibrium temp will still always be 99. That's enough to show that it's only a weak Nash equilibrium.

Note that this is also true if an arbitrary number of people deviate to this strat...
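A small sketch of that observation, under the assumptions used elsewhere in the thread (100 players, temperature equal to the average dial, next-round target of 100 if anyone undercut the current target and 99 otherwise). The strategy names are made up for illustration.

```python
def simulate(strategies, rounds=20):
    """Run the 99/100 rule and return the list of room temperatures."""
    target, temps = 99, []
    for _ in range(rounds):
        dials = [strategy(target) for strategy in strategies]
        temps.append(sum(dials) / len(dials))
        # Anyone undercutting the current target triggers a 100 round next.
        target = 100 if any(d < target for d in dials) else 99
    return temps


def follow_target(target):
    """The full strategy: always play the current equilibrium temperature."""
    return target


def always_99(target):
    """The lazy variant: never punish, just play 99."""
    return 99


all_follow = simulate([follow_target] * 100)
one_lazy = simulate([follow_target] * 99 + [always_99])
all_lazy = simulate([always_99] * 100)
print(all_follow == one_lazy == all_lazy)
```

On the equilibrium path nobody ever undercuts, so the target never leaves 99 and the three populations produce identical temperature histories. That is the sense in which the lazy strategy does exactly as well.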

Nice provocative post :-)

It's good to note that Nash equilibrium is only one game-theoretic solution concept.  It's popular in part because under most circumstances at least one is guaranteed to exist, but folk theorems can cause there to be a lot of them.  In contexts with lots of Nash equilibria, game theorists like to study refinements of Nash equilibrium, i.e., concepts that rule out some of the Nash equilibria.  One relevant refinement for this example is that of strong Nash equilibrium, where no subset of players can beneficially devia...

I have a suspicion that the viscerally unpleasant nature of this example is making it harder for readers to engage with the math.

I suspect that, in this particular example, it is more about reasoning about subgame-perfection being unintuitive (and the absence of a mechanism for punishing people who don't punish "defectors").
Per my reading of the OP, if someone sets their dial to 99 during a round when you're supposed to set it to 100, then everyone sets it to 100 again the following round.  Does this not count as a mechanism for punishing non-punishers?
Oh, I missed that --- I thought they set it to 100 forever. In that case, I was wrong, and this indeed works as a mechanism for punishing non-punishers, at least from the mathematical point of view. Mathematics aside, I still think the example would be clearer if there were explicit mechanisms for punishing individuals. As it is, the exact mechanism critically relies on details of the example, and on mathematical nitpicks which are unintuitive. If you instead had explicit norms, meta-norms, etc, you would avoid this. (EG, suppose anybody can punish anybody else by 1, for free. And the default is that you don't do it, except that there is the rule for punishing rule-breakers (incl. for this rule).)
I thought the purpose of the example was to demonstrate that you can have a Nash equilibrium that is very close to the worst possible outcome.  What did you think the purpose was, that would be better served by that stuff you listed?
It was partially to demonstrate that bad Nash equilibria even affect common-payoff games, there don't even need to be dynamics of some agents singling out other agents to reward and punish.
I think the purpose is the same thing that you say it is, an example of an equilibrium that is "very close" to the worst possible outcome. But I would additionally prefer if the example did not invoke the reaction that it critically relies on quirky mathematical details. (And I would be fine if this additional requirement came at the cost of the equilibrium being "90% of the way towards worst possible outcome", rather than 99% of the way.)
The cost I'd be concerned about is making the example significantly more complicated. I'm also not sure the unintuitiveness is actually bad in this case.  I think there's value in understanding examples where your intuitions don't work, and I wouldn't want someone to walk away with the mistaken impression that the folk theorems only predict intuitive things.

Example origin scenario of this Nash equilibrium from GPT-4:

In this hypothetical scenario, let's imagine that the prisoners are all part of a research experiment on group dynamics and cooperation. Prisoners come from different factions that have a history of rivalry and distrust. 

Initially, each prisoner sets their dial to 30 degrees Celsius, creating a comfortable environment. However, due to the existing distrust and rivalry, some prisoners suspect that deviations from the norm—whether upward or downward—could be a secret signal from one faction to ...

This makes a lot of sense, but it also gives sufficient motivation to enforce a standard that the modeling of utility and mention of Nash equilibria becomes secondary.   Trying to formalize it by identifying the sizes of the coalitions, the utility of preventing this communication/conspiracy channel among the rivals, and the actual communications which let them "decide to adopt a strategy of punishing any deviations from the new 99 degrees Celsius temperature" will likely break it again.  In fact, if there is any coalition of at least 2, they can get a minmax result of 98.6, so presumably won't participate in nor enforce 99 as an equilibrium.

Two (related) morals of the story:

  1. A really, really stupid strategy can still meet the requirements for being a Nash equilibrium, because a strategy being a Nash equilibrium only requires that no one player can get a better result for themselves when only that player is allowed to change strategies.

  2. A game can have more than one Nash equilibrium, and a game with more than one Nash equilibrium can have one that's arbitrarily worse than another.
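Both morals can be seen in a toy two-player game. The sketch below enumerates pure-strategy Nash equilibria by brute force; the payoff numbers and action names are invented purely for illustration and are not from the post.

```python
from itertools import product


def pure_nash_equilibria(payoffs):
    """payoffs[(a, b)] = (u_row, u_col). Return action pairs with no profitable unilateral deviation."""
    rows = sorted({a for a, _ in payoffs})
    cols = sorted({b for _, b in payoffs})
    equilibria = []
    for a, b in product(rows, cols):
        u_row, u_col = payoffs[(a, b)]
        # Only ONE player at a time is allowed to change strategy.
        row_ok = all(payoffs[(a2, b)][0] <= u_row for a2 in rows)
        col_ok = all(payoffs[(a, b2)][1] <= u_col for b2 in cols)
        if row_ok and col_ok:
            equilibria.append((a, b))
    return equilibria


# Hypothetical coordination game: mismatching is worse than either matched outcome.
game = {
    ("good", "good"): (0, 0),
    ("bad", "bad"): (-10, -10),
    ("good", "bad"): (-20, -20),
    ("bad", "good"): (-20, -20),
}
print(pure_nash_equilibria(game))
```

Both ("good", "good") and ("bad", "bad") pass the check, even though the second is worse for everyone. Replacing -10 with any number above -20 keeps it an equilibrium, so the gap between the two can be made as large as the mismatch penalty allows.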

A copy of my comment from the other thread:

The only thing I learned from this post is that, if you use mathematically precise axioms of behavior, then you can derive weird conclusions from game theory. This part is obvious and seems rather uninteresting.

The strong claim, namely the hell scenario, comes from then back-porting the conclusions from this mathematical rigor to our intuitions about a suggested non-rigorous scenario.

But this you cannot do unless you've confirmed that there's a proper correspondence from your axioms to the scenario.

For example, th...

Martin Randall
I think this particular assumption (axiom?) isn't required. The agents' utilities can be linear with the temperature for any reason. If an agent is concerned only with their own suffering then U=-T. If an agent is equally concerned with the suffering of all agents then U=-100T. The same set of strategies are Nash equilibria in both cases. This seems right. Also, humans aren't rational. Also, humans don't live forever.

I don't think the 'strategy' used here (set to 99 degrees unless someone defects, then set to 100) satisfies the "individual rationality condition". Sure, when everyone is setting it to 99 degrees, it beats the minmax strategy of choosing 30. But once someone chooses 30, the minmax for everyone else is now to also choose 30 - there's no further punishment that will or could be given. So the behavior described here, where everyone punishes the 30, is worse than minmaxing. At the very least, it would be an unstable equilibrium that would have broken down in the situation described - and knowing that would give everyone an incentive to 'defect' immediately.

After someone chooses 30 once, they still get to choose something different in future rounds. In the strategy profile I claim is a Nash equilibrium, they'll set it to 100 next round like everyone else. If anyone individually deviates from setting it to 100, then the equilibrium temperature in the next round will also be 100. That simply isn't worth it, if you expect to be the only person setting it less than 100. Since in the strategy profile I am constructing everyone does set it to 100, that's the condition we need to check to check whether it's a Nash equilibrium.
I guess the unstated assumption is that the prisoners can only see the temperatures of others from the previous round and/or can only change their temperature at the start of a round (though one tried to do otherwise in the story). Even with that it seems like an awfully precarious equilibrium since if I unilaterally start choosing 30 repeatedly, you'd have to be stupid to not also start choosing 30, and the cost to me is really quite tiny even while no one else ever 'defects' alongside me. It seems to be too weak a definition of 'equilibrium' if it's that easy to break - maybe there's a more realistic definition that excludes this case?
The other thing that could happen is silent deviations, where some players aren't doing "punish any defection from 99" - they are just doing "play 99" to avoid punishments. The one brave soul doesn't know how many of each there are, but can find out when they suddenly go for 30.
Eric Chen
The 'individual rationality condition' is about the payoffs in equilibrium, not about the strategies. It says that the equilibrium payoff profile must yield to each player at least their minmax payoff. Here, the minmax payoff for a given player is -99.3 (which comes from the player best responding with 30 forever to everyone else setting their dials to 100 forever).  The equilibrium payoff is -99 (which comes from everyone setting their dials to 99 forever). Since -99 > -99.3, the individual rationality condition of the Folk Theorem is satisfied. 
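The arithmetic behind those two numbers can be checked in a few lines, assuming 100 players and a per-round utility of minus the average dial setting (both inferred from the figures quoted in this thread; `minmax_payoff` is a hypothetical helper name).

```python
N = 100  # assumed number of prisoners


def minmax_payoff(n=N, best_reply=30, punish=100):
    """Payoff to one player who best-responds while everyone else plays `punish` forever."""
    return -((n - 1) * punish + best_reply) / n


equilibrium_payoff = -99.0  # everyone plays 99 forever
individually_rational = equilibrium_payoff >= minmax_payoff()
print(minmax_payoff(), individually_rational)
```

The margin is only 0.3 degrees, which is why the equilibrium can sit so close to the worst feasible outcome while still satisfying the condition.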
I think the "at least" is an important part of this.  If it yields more than their minimax payoff, either because the opponents are making mistakes, or have different payoffs than you think, or are just cruelly trying to break your model, there's no debt created because there's no cost to recoup. The minimax expectation is 99.3 (the player sets to 30 and everyone else to 100).  One possible bargaining/long-term repeated equilibrium is 99, where everyone chooses 99, and punishes anyone who sets to 100 by setting themselves to 100 for some time.  But it would be just as valid to expect the long-term equilibrium to be 30, and punish anyone who sets to 31 or higher.  I couldn't tell from the paper how much communication was allowed between players, but it seems to assume some mutual knowledge of each other's utility and what a given level of "punishment" achieves. In no case do you need to punish someone who's unilaterally giving you BETTER than your long-term equilibrium expectation.
Eric Chen
Oh yeah, the Folk Theorem is totally consistent with the Nash equilibrium of the repeated game here being 'everyone plays 30 forever', since the payoff profile '-30 for everyone' is feasible and individually-rational. In fact, this is the unique NE of the stage game and also the unique subgame-perfect NE of any finitely repeated version of the game. To sustain '-30 for everyone forever', I don't even need a punishment for off-equilibrium deviations. The strategy for everyone can just be 'unconditionally play 30 forever' and there is no profitable unilateral deviation for anyone here. The relevant Folk Theorem here just says that any feasible and individually-rational payoff profile in the stage game (i.e. setting dials at a given time) is a Nash equilibrium payoff profile in the infinitely repeated game. Here, that's everything in the interval [-99.3, -30] for a given player. The theorem itself doesn't really help constrain our expectations about which of the possible Nash equilibria will in fact be played in the game. 

In a realistic setting agents will be highly incentivized to seek other forms of punishment besides turning dial. But nice toy hell.


Curated. As always, I'm fond of a good dialog. Also usually (though comes out less often), I'm fond of getting taught interesting game theory results, particularly outside the range of games most discussed, in this case, more focusing on iterated stuff which can get weirder. Kudos, I really like having posts like these on LW.

If the game theory here is correct, I'm sad that it can't be simply explained as such; if the game theory here isn't correct, I'm sad it's curated.
I'm unsure what you mean by "simply explained" and how it's different from this post.  Do you mean you wanted to see a formal proof of the theorem instead of an illustrative example?  Or that you dislike the dialog format?
I wrote what I believe to be a simpler explanation of this post here. Things I tried to do differently:  1. More clearly explaining what Nash equilibrium means for infinitely repeated games -- it's a little subtle, and if you go into it just with intuition, it's not clear why the "everyone puts 99" situation can be a Nash equilibrium  2. Noting that just because something is a Nash equilibrium doesn't mean it's what the game is going to converge to  3. Less emphasis on minimax stuff (it's just boilerplate, not really the main point of folk theorems)

by min-maxing (that is, picking their strategy assuming other players make things as bad as possible for them, even at their own expense) [bold and italics added]

Why would anyone assume this and make decisions based on it?  I have never understood this aspect of the Nash equilibrium. (Edit: never mind, I thought it was claimed that this was part of how Nash equilibria worked, and assumed this was the thing I remembered not understanding about Nash equilibria, but that seems wrong)

It's not. The original Nash construction is that player N picks a strategy that maximizes their utility, assuming all other players get to know what N picked, and then pick a strategy that maximizes their own utility given that. Minimax as a goal is only valid for atomic game actions, not complex strategies, specifically because of this "trap".
Ok, let's see.  Wikipedia: This is sensible. Then... from the Twitter thread: This seems incorrect.  The Wiki definition of Nash equilibrium posits a scenario where the other players' strategies are fixed, and player N chooses the strategy that yields his best payoff given that; not a scenario where, if player N alters his strategy, everyone else responds by changing their strategy to "hurt player N as hard as possible".  The Wiki definition of Nash equilibrium doesn't seem to mention minimax at all, in fact (except in "see also"). In this case, it seems that everyone's starting strategy is in fact something like "play 99, and if anyone plays differently, hurt them as hard as possible".  So something resembling minimax is part of the setup, but isn't part of what defines a Nash equilibrium.  (Right?) Looking more at the definitions... The "individual rationality" criterion seems properly understood as "one very weak criterion that obviously any sane equilibrium must satisfy" (the logic being "If it is the case that I can do better with another strategy even if everyone else then retaliates by going berserk and hurting me as hard as possible, then super-obviously this is not a sane equilibrium"). It is not a definition of what is rational for an individual to do.  It's a necessary but nowhere near sufficient condition; if your decisionmaking process passes this particular test, then congratulations, you're maybe 0.1% (metaphorically speaking) on the way towards proving yourself "rational" by any reasonable sense of the word. Does that seem correct?
This is specifically for Nash equilibria of iterated games. See the folk theorems Wikipedia article.

Any Nash Equilibrium can be a local optimum. This example merely demonstrates that not all local optima are desirable if you are able to view the game from a broader context. Incidentally, evolution has provided us with some means to try and get out of these local optima. Usually by breaking the rules of the game or leaving the game or seemingly not acting rationally from the perspective of the local optimum.

Just to clarify, the complete equilibrium strategy alluded to here is:

"Play 99 and, if anyone deviates from any part of the strategy, play 100 to punish them until they give in"

Importantly, this includes deviations from the punishment. If you don't join the punishment, you'll get punished. That makes it rational to play 99 and punish deviators.

The point of the Folk Theorems is that the Nash Equilibrium notion has limited predictive power in repeated games like this, because essentially any payoff could be implemented as a similar Nash equilibrium. That do...

I don’t think this works. Here is my strategy:

  1. I set the dial to 30 and don’t change it no matter what.
  2. In the second round, temperature lowers to 98.3.
  3. In the third round all the silly people except me push the dial to 100 and thus we get 99.3.
  4. I don’t deviate from 30, no matter how many rounds of 99.3 be there.
  5. At some point someone figures out that I am not going to deviate and they defect too, now we have 98.6.
  6. The avalanche has started. It won’t take long to get to 30.
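The temperatures in these steps can be checked with a short sketch, assuming 100 players and a room temperature equal to the average of the dials (assumptions inferred from the thread; `average_temperature` is a hypothetical helper).

```python
def average_temperature(n_at_30, n_total=100, others=100):
    """Room temperature when n_at_30 players hold 30 and the rest play `others`."""
    return (n_at_30 * 30 + (n_total - n_at_30) * others) / n_total


print(average_temperature(1, others=99))  # one holdout while the rest still play 99
print(average_temperature(1))             # one holdout, everyone else punishing at 100
print(average_temperature(2))             # a second player joins the holdout
print(average_temperature(100))           # the avalanche has finished
```

Each extra defector lowers the average by 0.7 degrees while the rest stay at 100, so two holdouts against 98 punishers gives 98.6.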

Now, this was a perfectly rational course of action for me. I knew that I will suffer temporarily, but in exchange I got a comfortable temperature for eternity.

Prove me wrong.

Looking ahead multiple moves seems sufficient to break the equilibrium, but for the stated assumption that the other players also have deeply flawed models of your behavior that assume you're using a different strategy - the shared one including punishment. There does seem to be something fishy/circular about baking an assumption about other players' strategy into the player's own strategy and omitting any ability to update.
Count me in : ) If we assume that there are at least two not-completely-irrational agents, you are right. And in case there aren't, I don't think the scenario qualifies as a "game" theory. It's just a boring personal hell with 99 unconscious zombies. But given the negligible effect of punishment, I'd rather keep my dial at 30 just to keep the hope alive, than surrendering to the "policy".
Philip Niewold
Of course it is perfectly rational to do so, but only from a wider context. From the context of the equilibrium it isn't. The rationality of your example arises because you are able to take your whole lifetime into account, while the game is given in 10-second intervals. Suppose you don't know how long you have to live, or, in fact, know that you only have 30 seconds more to live. What would you choose? This information is not given by the game, even though it impacts the decision, since the given game does rely on real-world equivalency to give it weight and impact.
Samuel Hapák
I am quite confused about what the statement actually is. I don’t buy the argument about the game ending in 30 seconds. The article quite clearly implies that it will last forever. If we are not playing a repeated game here, then none of this makes sense and all the (rational) players would turn the knob immediately to 30. You can induct backward from the last move to prove that. If we are playing a finite game that has a probability p of ending in any given turn, it shouldn’t change much either. I also don’t understand the argument about “context of equilibrium”. I guess it would be helpful to formalize the statement you are trying to make.

I can see why feasibility + individual rationality makes a payoff profile more likely than any profile missing one of these conditions, but I can’t see why I should consider every profile satisfying these conditions as likely enough to be worth worrying about


This is about as convincing as a scarecrow. Maybe I'm committing some kind of taboo by not handling a strawman like a real person, but to me the mental exercise of trying faulty thought experiments is damaging when you can just point to the fallacy and move on.

I'd be interested in someone trying to create the proposed simulation without the presumptive biases. Would AI models given only pain and relief with such a dial to set themselves come to such equilibrium? I think obviously not. I don't hear anyone arguing that's wrong just that it's a misundersta...

Quick mod note: I generally prefer not to "tone police" or similar and if a user is saying something substantive, forgive a little tartness in how they speak. That said, I'm leery of that when it's off the bat in a user's first submissions. Consider this me frowning a little at you. (Raemon notes that his recollection is that the author of this post in particular dislikes tone policing (not exactly the right term, but good enough), but I'm still leaving this comment to maintain norms for the site as a whole, even if Jessicata might not mind.)
Enjoy the echo chamber. I won't bother you people; I'll just unsubscribe from the newsletter, which hasn't been very good lately. I've been a subscriber for years, and this was the first post bad enough to get me to respond. If you don't like that I'm "tart" then you're soft, and soft is not the position to be in for dealing with reason and logic. It's a very hard thing, and often you come to conclusions you don't like.
the gears to ascension
This is a forum. People just post stuff. Sometimes it's good. Sometimes it's written like your comments. Are you mostly objecting to discussion of abstract game theory? Or are you objecting to discussion of abstract game theory without making explicit that abstract game theory is not exactly the same as real life game theory? that's a common issue I've seen criticized and agree has been a problem for anyone who interacts with game theory, and the criticisms I've seen sound like your comment - taking issue with the abstractness. If so, it's fair to point out this is abstract and not fully grounded. Game theory often uses mathematical objects we call "games" which are formalized, isolated versions of things that happen in real life, and which do indeed have strange divergences from real life as a result of the way they get isolated; properly mapping out how the abstract game differs is often in fact a chore. it seems reasonable to request that they be more clearly marked. But none of that really changes that you were, in fact, rather rude. Shrug. It ain't an echo chamber if your criticism isn't going to be deleted :)
To be explicitly clear, I referenced a straw man because I am taking fault with a logical fallacy. I'm taking fault with the mode/structure of presentation more than the mode of reasoning [be it theoretical or practical]. It was presented as a conversation, feeling very much like someone had just created a straw man to do battle with so that they could communicate something that they had already preconceived... Now if it was just a post here on the forum somewhere off in space that I didn't have to think about, I would never have cared, but I am a newsletter subscriber and it was the newsletter for that cycle. That is the only content I get from this website and I expected it to be well curated. It was not, and now I'm seriously considering unsubscribing from the newsletter. I am only here to feel out the community and determine whether or not it is worth my time to continue reading a newsletter that has gone down in quality reliably in the last months.
the gears to ascension
Alright, fair enough. I'm curious what other resources you subscribe to - are there any you'd recommend as being higher quality? I doubt the mods will penalize you for giving other useful ai newsletters or paper feeds or etc. I'm curious what sort of research you typically seek out.
The number one resource I could recommend at this exact moment is LinkedIn. Six to eight months ago that was not true; today it is, and I don't know how long it will stay true. If you get "Linked In" to companies that are dealing with whatever STEM field (or probably anything else) you're interested in, the employees of those companies tend to share all kinds of interesting material. Obviously some sources are much better than others, but it's like a source of sources, just as this forum is a source of sources. That's probably the ideal thing to look for going forward, should time, as it does, change circumstances.