Epistemic Status: I think this argument is robust as long as “the conditions” are true. However, I may have miscalculated and left out a condition. I may also be missing something else entirely. Feedback on the post would be helpful in either case.
Introduction
If you want to survive, pause from behind. This can also be described as “Losing Heroically”.
Imagine a simplified model of a race to ASI between two actors. The two actors can each choose to continue racing ahead, or pause and obtain some X percentage-point reduction in takeover risk. Assume that once the actor in front crosses some finish line, they either achieve complete control over the world, or their ASI goes rogue and kills everyone, with the probability of the second outcome decreasing with pause length.
Given certain (possibly attainable) conditions which I list below, the game-theoretic optimal move is for the laggard in the race to “pause from behind” (Lose Heroically): that is, to stop racing to ASI and credibly signal the pause to the leader. Under another set of (possibly attainable) conditions, the optimal response of the leader is then to pause as well, until either the laggard resumes racing or takeover risk has been reduced as much as is perceived to be possible.
The intuition for why Losing Heroically is the optimal move for the laggard is this: if the laggard cannot win the race to ASI no matter what, they might as well lose with a lower chance of takeover. The optimal response from the leader is then symmetric: if they can win no matter what, they might as well win with a lower chance of takeover.
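To make this concrete, here is a minimal sketch of the model in Python. All of the numbers (baseline takeover risk, risk reduction per month of pausing, payoff values) are illustrative assumptions of mine, not claims from the argument; the point is only that, when the conditions hold, both actors’ expected utilities rise with pause length.

```python
def takeover_prob(pause_months: float) -> float:
    """Assumed: takeover risk falls linearly with total pause length."""
    base_risk = 0.30            # illustrative risk with no pause at all
    reduction_per_month = 0.05  # illustrative per-month risk reduction
    return max(0.0, base_risk - reduction_per_month * pause_months)

def laggard_utility(pause_months: float) -> float:
    """Laggard's expected utility. Under the laggard's conditions they lose
    either way (no consolation prize), so only takeover risk varies."""
    leader_wins_value = 1.0  # assumed value of surviving under the leader
    takeover_value = 0.0     # assumed value of a rogue-ASI takeover
    p = takeover_prob(pause_months)
    return (1 - p) * leader_wins_value + p * takeover_value

def leader_utility(pause_months: float) -> float:
    """Leader's expected utility. Under the leader's conditions they win
    either way, so pausing only trades time for lower takeover risk."""
    win_value = 1.0          # assumed value of winning the race
    takeover_value = 0.0
    p = takeover_prob(pause_months)
    return (1 - p) * win_value + p * takeover_value

# Compare "keep racing" (0 months of pause) with a 4-month mutual pause:
for pause in (0, 4):
    print(f"pause={pause}mo  laggard EU={laggard_utility(pause):.2f}  "
          f"leader EU={leader_utility(pause):.2f}")
# Both utilities rise with pause length, so pausing dominates racing for
# both actors, provided the conditions hold and delay itself is costless.
```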
The Conditions
Two notes before I talk about the conditions.
The “laggard’s conditions” for pausing from behind being optimal are as follows:
Taken together, these conditions mean that the choice facing the laggard is to either lose with a high probability of takeover, or lose with a low probability of takeover.
Importantly, these conditions do not include:
Now, under what conditions is it optimal for the leader to pause in response? The “leader’s conditions” are:
Here as well, these conditions, taken together, imply that the choice facing the leader is to either win with a high probability of takeover, or win with a low probability of takeover.
Symmetrically to the laggard’s conditions, these conditions do not include:
The leader’s pause duration (absent the laggard resuming) is basically a function of how many months they are willing to trade away for a perceived X percentage-point reduction in takeover risk[5].
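As a toy illustration of this trade-off (where the numbers and the diminishing-returns curve are my assumptions, not part of the argument), the leader pauses for as long as the next month of delay still buys at least the minimum risk reduction they consider worth a month:

```python
def marginal_risk_reduction(month: int) -> float:
    """Assumed diminishing returns: each extra month of pause reduces
    takeover risk less than the previous one (fractions, so 0.01 = 1 pp)."""
    return 0.05 / (1 + month)

def chosen_pause_length(min_worthwhile_reduction: float) -> int:
    """Keep pausing while a month of delay still buys at least the minimum
    risk reduction the leader considers worth a month of lead time."""
    month = 0
    while marginal_risk_reduction(month) >= min_worthwhile_reduction:
        month += 1
    return month

# A leader who demands at least 1 pp of risk reduction per month of delay:
print(chosen_pause_length(0.01))  # -> 5 months under these assumptions
```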
One can also imagine this with N actors: the N-1 actors that are not in the lead need only all choose to pause from behind. The conditions may be much harder to achieve, however.
Meeting the Conditions / Avoiding False Negatives
These conditions, and the actors’ perceptions of them, can be altered. For one, we can take actions to make the conditions for either actor more true. Another strategy could be debunking false negatives through sharing better information with the actors. This, however, might be indistinguishable (from the actors’ POV) from promoting false positives through lying to the actors[6]. Therefore, the strategy of debunking false negatives must rely on hard-to-fake evidence to work.
As for altering the true probabilities of the conditions themselves, let’s examine what is possible. Laggard Conditions 1 and 2 can be made more true by racing aggressively. Laggard Condition 3 can be made more true by the leader being one who intends to treat everyone kindly if they win. Laggard Condition 4 is essentially a dealt hand, although def/acc might be able to weaken this condition. Overall, attempting to increase the true probabilities of Laggard Conditions 1, 2, and 4 leads to a greater chance of the laggard Losing Heroically, but otherwise leads to a much more dangerous world. However, if the world naturally makes these conditions very true, then increasing the probabilities of the other conditions (Laggard Condition 3 and most of the leader’s conditions) may be a good strategy for reducing takeover risk[7].
Leader Condition 1’s probability can be most readily increased by improving information about the laggard’s activities. This can either be the result of voluntary information offered up by the laggard in combination with them pausing from behind, or it can be the result of better surveillance of the laggard by the leader and its allies. Leader Condition 2’s probability can either be improved by the leader racing more (which makes the world more dangerous) or by handicapping the laggard. The laggard may handicap themselves voluntarily in combination with Losing Heroically, or the leader and its allies may attempt to handicap the laggard through, for example, chip restrictions. Leader Condition 3 is very similar to Leader Condition 2, and its probability can be improved in the same ways: racing, voluntary handicapping, or externally-imposed handicapping.
Overall, I wanted to write this post because I think this is a plausibly good strategy in the more desperate, scary worlds which might occur in the future (which will probably satisfy Laggard Conditions 1, 2, and 4). I don’t recommend pursuing this strategy unless we know that’s the case, as it could make things worse. Further discussion of whether the conditions will likely become true or not would be much appreciated.
My substack: https://enterthewoods.substack.com/
Footnotes
[1] This seems to be a common assumption when people talk about "lead time", although I imagine reality is a bit more chaotic and laggards who are not far behind could catch up with some amount of luck.
[2] You can imagine a case where the laggard thinks that the leader is so evil that AI takeover would be preferable to the leader winning. Therefore, the optimal strategy is to race as much as possible to maximize takeover risk. The laggard, of course, might be correct. Sometimes I fear humans more than ASI.
[3] "A consolation prize" could occur in a paradigm that is some combination of slow-takeoff and defense-favoring. Having a "second-rate" ASI could be extremely beneficial in these cases, so pausing from behind is not an optimal strategy.
[4] Upon pausing, the leader may realize that the safety situation is much more dire than they initially thought, and they might be extremely reluctant to continue even if the laggard (who might remain ignorant) continues. However, as long as the leader does not anticipate this happening, it does not lower the probability that this condition is fulfilled. In fact, the leader cannot coherently anticipate this happening if they are in any way rational, because of Conservation of Expected Evidence.
[5] I think that even if I thought that takeover risk was pretty low (say 1-2%), I would at least be willing to pause for a few months to get that risk much closer to 0%. Perhaps I have a bad mental model of others' psyches, however. I also expect that if takeover risk is very high, pausing for a few months and doing some safety checks might make this fact clear to the leader's people, causing a longer pause.
[6] I doubt that this strategy will actually work, considering that the actors will likely look for hard-to-fake evidence. It may also backfire, decreasing the actors' trust and making it harder to debunk false negatives in the future.
[7] In these worlds, concentration of power risk is already very high (near maximal). Improving some of the leader's conditions increases concentration of power risk. But if this risk is already at its limit, there is not much downside to doing so.