This is an experiment in short-form content on LW2.0. I'll be using the comment section of this post as a repository of short, sometimes-half-baked posts that either:
I ask people not to create top-level comments here, but feel free to reply to comments as you would to a FB post.
In my experience, constant-sum games are considered to provide "maximally unaligned" incentives, and common-payoff games are considered to provide "maximally aligned" incentives. How do we quantitatively interpolate between these two extremes? That is, given an arbitrary payoff table representing a two-player normal-form game (like Prisoner's Dilemma), what extra information do we need in order to produce a real number quantifying agent alignment?
If this question is ill-posed, why is it ill-posed? And if it's not, we should probably understand how to quantify such a basic aspect of multi-agent interactions, if we want to reason about complicated multi-agent situations whose outcomes determine the value of humanity's future. (I started considering this question with Jacob Stavrianos over the last few months, while supervising his SERI project.)
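To make the question concrete, here is one crude candidate metric — my illustration, not something the post proposes: the Pearson correlation between the two players' payoffs across all outcomes of the table. It recovers the two extremes, though it ignores which outcomes the agents' policies actually reach, which is part of why the question is hard.

```python
import numpy as np

# Payoff tables for two-player normal-form games: entry [i][j] = (p1, p2).
# A Prisoner's Dilemma, a constant-sum game (matching pennies),
# and a common-payoff coordination game.
prisoners_dilemma = [[(3, 3), (0, 5)], [(5, 0), (1, 1)]]
matching_pennies  = [[(1, -1), (-1, 1)], [(-1, 1), (1, -1)]]
common_payoff     = [[(2, 2), (0, 0)], [(0, 0), (1, 1)]]

def payoff_correlation(table):
    """Pearson correlation between the players' payoffs over all outcomes.

    -1 for constant-sum games, +1 for common-payoff games, in between
    otherwise. A crude candidate: it weights every cell equally, with no
    regard for the agents' strategies.
    """
    p1 = [a for row in table for (a, _) in row]
    p2 = [b for row in table for (_, b) in row]
    return float(np.corrcoef(p1, p2)[0, 1])

print(payoff_correlation(matching_pennies))    # ≈ -1 (maximally unaligned)
print(payoff_correlation(common_payoff))       # ≈ +1 (maximally aligned)
print(payoff_correlation(prisoners_dilemma))   # strictly between -1 and 0
```

That the Prisoner's Dilemma lands strictly between the extremes is suggestive, but a payoff-only statistic like this cannot be the whole story — which is one way of restating the question of what extra information is needed.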
You’re a deity, tasked with designing a bird brain. You want the bird to get good at singing, as judged by a black-box hardcoded song-assessing algorithm that you already built into the brain last week. The bird chooses actions based in part on within-lifetime reinforcement learning involving dopamine. What reward signal do you use?
Well, we want to train the bird to sing the song correctly. So it’s easy: the bird practices singing, and it listens to its own song using the song-assessing black box, and it does RL using the rule:
The better the song sounds, the higher the reward.
Oh wait. The bird is also deciding how much time to spend practicing singing, versus foraging...
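The failure mode can be made concrete with a toy model — my illustration, with all the parameters made up: the bird repeatedly chooses between practicing and foraging, and the reward rule is exactly "the better the song sounds, the higher the reward", with foraging earning nothing.

```python
import random

def train(steps=1000, lr=0.1, eps=0.1, seed=0):
    """Toy within-lifetime RL: epsilon-greedy choice between two actions,
    with each action's preference updated toward the reward it earns."""
    rng = random.Random(seed)
    prefs = {"practice": 0.0, "forage": 0.0}  # learned action preferences
    quality = 0.0                             # song quality, improves with practice
    for _ in range(steps):
        if rng.random() < eps:
            action = rng.choice(["practice", "forage"])
        else:
            action = max(prefs, key=prefs.get)
        if action == "practice":
            quality = min(1.0, quality + 0.01)
        # The deity's naive rule: reward tracks song quality, nothing else.
        reward = quality if action == "practice" else 0.0
        prefs[action] += lr * (reward - prefs[action])
    return prefs

prefs = train()
print(prefs)  # "practice" dominates; foraging is invisible to this signal
```

Under this reward signal the bird learns to practice essentially all the time: the need to forage never enters the reward, so nothing pushes back. That is the design problem the deity faces.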
Tracey Davis went to the second-floor girls' lavatory for her regular practice. Tracey drew her Gibson Flying V2 out of its case. Tracey inserted her Gibson Flying V2 back in its case. What was the point?
If you looked through the bars of the wrought iron cage by Luna Lovegood's bed you wouldn't see anything inside. The cage's door was sealed with a heavy padlock. Luna didn't remember locking it.
Tracey liked that there was a staircase from the Slytherin dormitories to Ravenclaw. It meant she didn't have to answer the raven's riddles. Tracey could answer the raven's riddles. If she wanted to. (Not that she had ever tried.) It was just demeaning.
The circular Ravenclaw common room was brightly lit by the tall windows all around it. A...
"Your fingers are all wrong. So is your posture. And how you hold it," said Myrtle.
Tracey tried again. Her fingers hurt.
"Still wrong. Sit like this," Myrtle demonstrated.
Tracey mirrored Myrtle.
"No. Sit literally right here where I'm sitting," said Myrtle.
Tracey reached into Myrtle. It felt like ice water. Myrtle sat still. Tracey gritted her teeth, took a deep breath and superimposed herself. Tracey's skin rippled grey where bits of Myrtle protruded.
"You feel cold," said Tracey.
"Ghosts can't feel temperature. We can't smell. We can't taste. We see in shades of grey. But we can hear," said Myrtle.
Myrtle moved her hands into position. Tracey followed.
Nearly Headless Nick's deathday anniversary was October 31st. All the Hogwarts ghosts had attended. Many wore formal white sheets.
Myrtle was already on stage. Tracey had transfigured herself...
A friend of mine has been bootstrapping a business-to-business software-as-a-service startup that's seeing serious growth. It needs someone who can put dedicated effort into scaling it, but my friend is near the end of their career and looking to retire. What do people do in this situation?
More details: they were running a traditional labor-limited small business and they automated some of the work. This automation was a huge improvement and they realized it could be useful to other companies. In early 2019 they had a web app ready and started taking external customers. In late 2020 they started to see serious growth, which has continued. They let me share some numbers:
This is a run rate of ~$340k/y, up from ~$100k/y a quarter ago, ~$52k/y a quarter before that, ~$12k/y a quarter before...
As the world knows, the FDA approved Biogen’s anti-amyloid antibody today, surely the first marketed drug whose Phase III trial was stopped for futility. I think this is one of the worst FDA decisions I have ever seen, because – like the advisory committee that reviewed the application, and like the FDA’s own statisticians – I don’t believe that Biogen really demonstrated efficacy. No problem, apparently. The agency seems to have approved it based on its demonstrated ability to clear beta-amyloid, and is asking Biogen to run a confirmatory trial to show efficacy.
So the FDA has, for expediency’s sake, bought into the amyloid hypothesis although every single attempt to translate that into a beneficial clinical effect has failed. I really, really don’t like the precedent that this
(originally posted at Secretum Secretorum)
I think there is something fascinating and useful about many of the observations, adages, and aphorisms that we (often sarcastically) designate as eponymous laws, effects, or principles in the same way we might for a scientific law. They are often funny (and memorable because of it), but many of them do speak to very fundamental aspects of human psychology and the human condition more generally. Murphy’s Law is probably the most well known example.
Murphy’s Law – “Anything that can go wrong will go wrong”
There are also a few lesser-known corollaries.
Murphy's Second Law – “Nothing is as easy as it looks”
Murphy's Third Law – “Everything takes longer than you think it will (even when you account for Murphy’s Third Law)”1
Murphy's Fourth Law – “If there is...