Epistemic Status: I think this argument is robust as long as “the conditions” are true. However, I may have miscalculated and left out a condition. I may also be missing something else entirely. Feedback on the post would be helpful in either case.
Introduction
If you want to survive, pause from behind. This can also be described as “Losing Heroically”.
Imagine a simplified model of a race to ASI between two actors. The two actors can each choose to continue racing ahead, or pause and obtain some X percentage-point reduction in takeover risk. Assume that once the actor in front crosses some finish line, they either achieve complete control over the world, or their ASI goes rogue and kills everyone, with the probability of the second outcome decreasing with pause length.
Given certain (possibly attainable) conditions which I list below, the game-theoretic optimal move is for the laggard in the race to “pause from behind” (Lose Heroically): that is, to stop racing to ASI and credibly signal the pause to the leader. Under another set of (possibly attainable) conditions, the optimal response of the leader is then to pause as well, until either the laggard resumes racing or takeover risk has been reduced as much as is perceived to be possible.
The intuition for why Losing Heroically is the optimal move for the laggard is this: if the laggard cannot win the race to ASI no matter what, they might as well lose with a lower chance of takeover. The optimal response from the leader is then symmetric: if they can win no matter what, they might as well win with a lower chance of takeover.
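To make this concrete, here is a minimal sketch of the model in Python. All of the numbers (baseline takeover risk, risk reduction per month of pausing, payoff values) are illustrative assumptions of mine, not claims from the argument; the point is only that, when the conditions hold, both actors’ expected utilities rise with pause length.

```python
def takeover_prob(pause_months: float) -> float:
    """Assumed: takeover risk falls linearly with total pause length."""
    base_risk = 0.30            # illustrative risk with no pause at all
    reduction_per_month = 0.05  # illustrative per-month risk reduction
    return max(0.0, base_risk - reduction_per_month * pause_months)

def laggard_utility(pause_months: float) -> float:
    """Laggard's expected utility. Under the laggard's conditions they lose
    either way (no consolation prize), so only takeover risk varies."""
    leader_wins_value = 1.0  # assumed value of surviving under the leader
    takeover_value = 0.0     # assumed value of a rogue-ASI takeover
    p = takeover_prob(pause_months)
    return (1 - p) * leader_wins_value + p * takeover_value

def leader_utility(pause_months: float) -> float:
    """Leader's expected utility. Under the leader's conditions they win
    either way, so pausing only trades time for lower takeover risk."""
    win_value = 1.0          # assumed value of winning the race
    takeover_value = 0.0
    p = takeover_prob(pause_months)
    return (1 - p) * win_value + p * takeover_value

# Compare "keep racing" (0 months of pause) with a 4-month mutual pause:
for pause in (0, 4):
    print(f"pause={pause}mo  laggard EU={laggard_utility(pause):.2f}  "
          f"leader EU={leader_utility(pause):.2f}")
# Both utilities rise with pause length, so pausing dominates racing for
# both actors, provided the conditions hold and delay itself is costless.
```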
The Conditions
Two notes before I talk about the conditions.
The “laggard’s conditions” for pausing from behind being optimal are as follows:
Taken together, these conditions mean that the choice facing the laggard is to either lose with a high probability of takeover, or lose with a low probability of takeover.
Importantly, these conditions do not include:
Now, under what conditions is it optimal for the leader to pause in response? The “leader’s conditions” are:
Here as well, these conditions, taken together, imply that the choice facing the leader is to either win with a high probability of takeover, or win with a low probability of takeover.
Symmetrically to the laggard’s conditions, these conditions do not include:
The leader’s pause duration (absent the laggard resuming) is basically a function of how many months they are willing to trade away for a perceived X percentage-point reduction in takeover risk[5].
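As a toy illustration of this trade-off (where the numbers and the diminishing-returns curve are my assumptions, not part of the argument), the leader pauses for as long as the next month of delay still buys at least the minimum risk reduction they consider worth a month:

```python
def marginal_risk_reduction(month: int) -> float:
    """Assumed diminishing returns: each extra month of pause reduces
    takeover risk less than the previous one (fractions, so 0.01 = 1 pp)."""
    return 0.05 / (1 + month)

def chosen_pause_length(min_worthwhile_reduction: float) -> int:
    """Keep pausing while a month of delay still buys at least the minimum
    risk reduction the leader considers worth a month of lead time."""
    month = 0
    while marginal_risk_reduction(month) >= min_worthwhile_reduction:
        month += 1
    return month

# A leader who demands at least 1 pp of risk reduction per month of delay:
print(chosen_pause_length(0.01))  # -> 5 months under these assumptions
```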
One can also imagine this with N actors: the N-1 actors that are not in the lead need only all choose to pause from behind. The conditions may be much harder to achieve, however.
Meeting the Conditions / Avoiding False Negatives
These conditions, and the actors’ perceptions of them, can be altered. For one, we can take actions to make the conditions for either actor more true. Another strategy could be debunking false negatives through sharing better information with the actors. This, however, might be indistinguishable (from the actors’ POV) from promoting false positives through lying to the actors[6]. Therefore, the strategy of debunking false negatives must rely on hard-to-fake evidence to work.
As for altering the true probabilities of the conditions themselves, let’s examine what is possible. Laggard Conditions 1 and 2 can be made more true by racing aggressively. Laggard Condition 3 can be made more true by the leader being one who intends to treat everyone kindly if they win. Laggard Condition 4 is essentially a dealt hand, although def/acc might be able to weaken this condition. Overall, attempting to increase the true probabilities of Laggard Conditions 1, 2, and 4 leads to a greater chance of the laggard Losing Heroically, but otherwise leads to a much more dangerous world. However, if the world naturally makes these conditions very true, then increasing the probabilities of the other conditions (Laggard Condition 3 and most of the leader’s conditions) may be a good strategy for reducing takeover risk[7].
Leader Condition 1’s probability can be most readily increased by improving information about the laggard’s activities. This can either be the result of voluntary information offered up by the laggard in combination with them pausing from behind, or it can be the result of better surveillance of the laggard by the leader and its allies. Leader Condition 2’s probability can either be improved by the leader racing more (which makes the world more dangerous) or by handicapping the laggard. The laggard may handicap themselves voluntarily in combination with Losing Heroically, or the leader and its allies may attempt to handicap the laggard through, for example, chip restrictions. Leader Condition 3 is very similar to Leader Condition 2, and its probability can be improved in the same ways: racing, voluntary handicapping, or externally-imposed handicapping.
Overall, I wanted to write this post because I think this is a plausibly good strategy in the more desperate, scary worlds which might occur in the future (which will probably satisfy Laggard Conditions 1, 2, and 4). I don’t recommend pursuing this strategy unless we know that’s the case, as it could make things worse. Further discussion of whether the conditions will likely become true or not would be much appreciated.
My substack: https://enterthewoods.substack.com/
Footnotes
[1] This seems to be a common assumption when people talk about "lead time", although I imagine reality is a bit more chaotic and laggards who are not far behind could catch up with some amount of luck.
[2] You can imagine a case where the laggard thinks that the leader is so evil that AI takeover would be preferable to the leader winning. Therefore, the optimal strategy is to race as much as possible to maximize takeover risk. The laggard, of course, might be correct. Sometimes I fear humans more than ASI.
[3] "A consolation prize" could occur in a paradigm that is some combination of slow-takeoff and defense-favoring. Having a "second-rate" ASI could be extremely beneficial in these cases, so pausing from behind is not an optimal strategy.
[4] Upon pausing, the leader may realize that the safety situation is much more dire than they initially thought, and they might be extremely reluctant to continue even if the laggard (who might remain ignorant) continues. However, as long as the leader does not anticipate this happening, it does not lower the probability that this condition is fulfilled. In fact, the leader cannot coherently anticipate this happening if they are in any way rational, because of Conservation of Expected Evidence.
[5] I think that even if I thought that takeover risk was pretty low (say 1-2%), I would at least be willing to pause for a few months to get that risk much closer to 0%. Perhaps I have a bad mental model of others' psyches, however. I also expect that if takeover risk is very high, pausing for a few months and doing some safety checks might make this fact clear to the leader's people, causing a longer pause.
[6] I doubt that this strategy will actually work, considering that the actors will likely look for hard-to-fake evidence. It may also backfire, decreasing the actors' trust and making it harder to debunk false negatives in the future.
[7] In these worlds, concentration of power risk is already very high (near maximal). Improving some of the leader's conditions increases concentration of power risk. But if this risk is already at its limit, there is not much downside to doing so.