From the title, I thought you were going to talk about thermodynamic game theory, where an agent's policy is a softmax of their reward. At absolute zero, agents can get stuck in hellish Nash equilibria, but a milder temperature can help them escape to a better equilibrium. Actually, Ellison showed that with lower temperatures you spend exponentially more time in the better equilibria; it just also takes exponentially longer to reach them in the first place.
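A minimal sketch of those dynamics, assuming logit (softmax) responses to the opponent's last move in a toy two-player coordination game; the payoffs and temperatures here are my own illustration, not from the post:

```python
import numpy as np

# Coordination game: action 0 = "bad" equilibrium (payoff 1 if matched),
# action 1 = "good" equilibrium (payoff 2 if matched), 0 if mismatched.
PAYOFF = np.array([[1.0, 0.0],
                   [0.0, 2.0]])

def softmax_policy(expected_payoffs, temperature):
    """Logit response: softmax over expected payoffs, argmax in the T = 0 limit."""
    if temperature == 0:
        policy = np.zeros_like(expected_payoffs)
        policy[np.argmax(expected_payoffs)] = 1.0
        return policy
    z = np.exp((expected_payoffs - expected_payoffs.max()) / temperature)
    return z / z.sum()

def rounds_until_good(temperature, rng, max_rounds=10_000):
    """Start both agents in the bad equilibrium; count rounds until both play 'good'."""
    actions = [0, 0]
    for t in range(max_rounds):
        if actions == [1, 1]:
            return t
        # Both agents respond simultaneously to last round's actions.
        actions = [
            rng.choice(2, p=softmax_policy(PAYOFF[:, actions[1 - i]], temperature))
            for i in (0, 1)
        ]
    return max_rounds  # never escaped

rng = np.random.default_rng(0)
for T in (0.0, 0.2, 0.5):
    times = [rounds_until_good(T, rng) for _ in range(20)]
    print(f"T={T}: median rounds to escape the bad equilibrium = {np.median(times):.0f}")
```

At T = 0 the agents never leave the bad equilibrium; at a mild temperature they escape it, and the good equilibrium is stickier because deviating from it costs more.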
What do the thermodynamics of hell look like? Let me lay out the problem in my own notation. Suppose:
If a previously cooperative soul defects, and not enough others defect, then the temperature increases by
more likely to cooperate than defect. The lower the inverse-temperature, the more likely each one is to defect. The probability that enough souls defect for the system to collapse is
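Concretely, assuming each soul defects independently with a softmax probability over a cooperate/defect reward gap (one natural instantiation; the exact form is my assumption), the collapse probability is a binomial tail:

$$p_{\text{defect}}(\beta) = \frac{e^{\beta r_D}}{e^{\beta r_C} + e^{\beta r_D}}, \qquad P(\text{collapse}) = \sum_{j \ge k} \binom{N}{j}\, p_{\text{defect}}^{\,j}\, (1 - p_{\text{defect}})^{N - j},$$

with $r_C > r_D$ the rewards for cooperating and defecting, $N$ souls, and $k$ the number of defections the system can absorb; as the inverse-temperature $\beta$ drops to zero, $p_{\text{defect}} \to 1/2$.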
The social/political ramifications are:
This also seems to be the primary philosophy of conservatism. "The equilibrium is/was already good, and even if it was not, we want to conserve the current system because high-temperature actions are radical and scary." I should also add that the second clause gains more credence when you realize that the free energy
gets smaller as the action-temperature increases, so you lose many of the best equilibria. Who is to say that doesn't also include the current one?
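To make that precise, with the usual softmax free energy over equilibrium energies $E_a$ (my formalization of the claim):

$$F(T) = -T \log \sum_a e^{-E_a/T} = \langle E \rangle_p - T\,S(p), \qquad p_a \propto e^{-E_a/T}, \qquad \frac{dF}{dT} = -S(p) \le 0,$$

so $F$ is monotonically decreasing in the temperature: at $T \to 0$ it picks out the single best equilibrium $\min_a E_a$, while at high $T$ the entropy term dominates and the depth of the best equilibria stops mattering.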
Milder temperature makes a hell stable
The hell of “Hell is Game Theory Folk Theorems” is not robust.
To recap: in an iterated game, 100 agents each choose a number between 30 and 100, and for the next 10 seconds they all experience a temperature equal to the average of the chosen numbers in Celsius (without getting damaged). Now it is declared that everyone is to pick 99, and that if anyone ever picks anything else, everyone will pick 100 from then on.
Since all the other agents seem to follow the equilibrium, it's not in the interest of any individual agent to set its dial lower than 99. Even if it sets it to 30, the others will set theirs to 100, and it'll end up with a temperature of 99.3°C. Worse than if it had picked 99.
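Spelling out the arithmetic:

$$\frac{99 \cdot 99 + 30}{100} = 98.31 \;\; \text{(the round you deviate)}, \qquad \frac{99 \cdot 100 + 30}{100} = 99.3 \;\; \text{(every round after)},$$

so a deviator buys one round at 98.31°C at the price of 99.3°C > 99°C forever after.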
But suppose an agent decides to just set 30 and disregard whatever the other agents are doing. Now the penalty is saturated for all the other agents too. So each of them could set the punishment value of 100, and the temperature would be 99.3°C in the next round. Or any one of them could set 30 instead and… they won't get punished any more than if they had set 100. But if they set 30, they get a lower temperature from their own choice. So all agents pick 30. And everyone is merely uncomfortably hot instead of boiling. Much better!
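A small sketch of that cascade; the strategy names and the "saturation-aware" response are my own framing, not from the original post:

```python
def average_temp(dials):
    return sum(dials) / len(dials)

def grim_trigger(deviation_seen):
    """The declared equilibrium: play 99, punish any past deviation with 100 forever."""
    return 100 if deviation_seen else 99

def saturation_aware(deviation_seen):
    """One step more reasoning: once the penalty is saturated, the other dials are
    stuck at 100 no matter what I do, so my best response is the minimum, 30."""
    return 30 if deviation_seen else 99

def simulate(strategy, n_agents=100, rounds=3):
    """One unconditional defector (always 30) among n_agents - 1 copies of `strategy`."""
    deviation_seen = False
    for r in range(rounds):
        dials = [30] + [strategy(deviation_seen) for _ in range(n_agents - 1)]
        print(f"round {r}: {average_temp(dials):.2f} C")
        deviation_seen = True  # everyone saw a sub-99 dial this round

simulate(grim_trigger)      # 98.31, then 99.30 forever: the hell holds
simulate(saturation_aware)  # 98.31, then 30.00 forever: the hell collapses
```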
So we can fix this particular hell with some more reasoning within game theory.
However, it's possible to set up a more robust hell at the “cost” of it being milder in its robust state. The original one breaks because a single agent can saturate the penalty; once that happens, the other agents are free to make the “prosocial” choice.
This suggests a solution. You can make your hell robust to m < 30[1] agents deciding to set 30 anyway by declaring something like the following: everyone picks 70, and if k agents picked below that in the previous round, everyone else picks min(70 + k, 100) this round.
This way the penalty doesn’t get saturated until at least m agents decide to pick 30 whatever everyone else is doing.
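A sketch of why this deters small coalitions, assuming the graded rule above (base 70, plus 1 per defector, capped at 100; that exact parameterization is my reconstruction from the footnote's numbers):

```python
def temperature(num_defectors, base=70, n_agents=100):
    """Average dial when `num_defectors` play 30 and everyone else plays the
    graded punishment level min(base + num_defectors, 100)."""
    punishment = min(base + num_defectors, 100)
    return (num_defectors * 30 + (n_agents - num_defectors) * punishment) / n_agents

# Marginal effect on the temperature of one more agent joining the defectors:
for m in (0, 1, 10, 29, 30, 40):
    delta = temperature(m + 1) - temperature(m)
    print(f"{m:2d} -> {m + 1:2d} defectors: {delta:+.2f} C")
```

Joining the defectors raises the temperature, so defection is individually deterred, all the way up to m = 30; there the penalty saturates, the marginal deterrent hits zero, and past it defecting cools things down again, which is where the footnote says the construction breaks.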
This is harder to escape. But I suspect it can still be done with some decision theory (though that requires more knowledge about the other agents).
[1] If we have only 70 agents cooperating, then them turning the dial up by 1 moves the average temperature up by 0.7°C, which exactly balances a single agent changing its dial from 100 to 30 (a change of (30 - 100)/100 = -0.7°C in the average). So somewhere around m = 30 this breaks, and you need to start introducing penalties in bigger steps. Since this is just an illustration, I'm skipping working this out exactly.