The math here seems pretty clear and simple. This was something I nerd-sniped myself into writing over a relatively short period of time; it could be more detailed, and there is room for more in-depth analysis building on it, but I just want to get the basic point written down. The main way I think this may not be relevantly true is if the probability of a lock-in ending in any particular period of time is so low that a lock-in is likely to last until the end of the universe's habitability. I am probably not the first person to come up with the ideas in this post, but I haven't seen them written down explicitly anywhere. I am not knowledgeable about astronomy/astrophysics/cosmology, so there may be factors from those fields that would meaningfully affect these ideas.
One might implicitly assume that if an artificial general intelligence “locks in”[1] some state of affairs (whether good or bad[2]), almost all of humanity’s “cosmic endowment” will be dedicated to this state.[3] However, if a powerful agent had such a goal, how much of the future light-cone would it actually dedicate to this in expectation? In this post, I sketch out a rough model for thinking about this and outline some considerations relevant to this question.[4]
One straightforward reason to anticipate that the extent of a locked-in state may be orders of magnitude smaller than the future light-cone is that over any period of time, there will be at least a small likelihood that the state will end. This may seem like a triviality, because this likelihood may be minuscule over mere mortal time-scales, but it adds up over the time-scales at hand, meaning that even a very stable lock-in could be much shorter than the universe’s habitable lifespan. In particular, the expected number of years that a lock-in will persist, Y, conditional on it being locked in in the first place,[5] is given by:

Y = Σ_{n=1}^{m} (1-k)^n
Here, 0<k<1 is the average likelihood that the state will end in any particular year. Some potential reasons for the state ending include but are not limited to:[6]
If the locked-in state is merely an instrumental goal for the AI, or is only one of several terminal goals, and the AI decides that it can more effectively satisfy its utility function by pursuing other goals instead.[7]
m>>1 is the upper limit of the sum (see footnote[8]). It ~does not really matter what we set this to, because even if the limit is infinity, the sum converges to a finite value, namely (1/k)-1.[9] And for sufficiently high values of m, the sum will be pretty close to this asymptote.
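For concreteness, here is a minimal Python sketch that evaluates the truncated sum directly and compares it to the (1/k)-1 asymptote; the particular value of k is purely illustrative.

```python
# Expected lock-in duration Y = sum_{n=1}^{m} (1-k)^n, which approaches (1/k) - 1
# as m grows. The value of k below is an arbitrary example, not an estimate.

def expected_duration(k: float, m: int) -> float:
    """Expected number of years a lock-in persists, truncating the sum at m years."""
    survival = 1.0
    total = 0.0
    for _ in range(m):
        survival *= 1.0 - k   # probability the state has not yet ended
        total += survival
    return total

k = 1e-6
print(expected_duration(k, m=10_000_000))  # close to the asymptote
print(1 / k - 1)                           # the (1/k) - 1 asymptote (~999,999 years)
```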
So, the question that arises is: What is k? One need not appeal to Cromwell’s law or to truisms about impermanence to suggest that k might be large enough to meaningfully limit the duration of a locked-in state.
On the one hand, we might reasonably assume that a maximally-smart AI would have low k. Indeed, there are some sources of “k from catastrophe” that it would try to minimize, such as 1, 2, and 4 above. But even a technologically mature superintelligence will still face irreducible risk due to factors that it cannot predict.[10]
And even beyond this: some of the sources of “k from flexibility” are potentially not things that the AI would want to minimize (such as 3 and 5 above). If the AI has other terminal goals from which it can obtain utility more efficiently, or the locked-in state is merely a means to some other, more terminal goal, it may in fact become more willing to end this particular state as it gets smarter. A perhaps relevant analogy here is with humans and factory farming; it may end because technology renders it unnecessary or because humans decide it is against our values. So a maximally smart AI does not necessarily mean a maximally low k.[11]
Suppose it is the case that, in expectation, a lock-in will continue for “only” 100 billion years. This is still quite scary -- for longer than the universe has existed so far, some AGI will be tiling the universe with a state that humans would consider undesirable -- but it is much shorter than the deep cosmic time the universe may have left. What are the consequences of this for how we weigh risks?
Reduces emphasis on tail risks: Suppose we are comparing some very bad but unlikely risk, “Tail,” to some less bad risk that is more probable, “Likely”. Suppose that before considering k, the expected disvalue of the former is greater than that of the latter, P_Tail × D_Tail > P_Likely × D_Likely (where P represents the probability of each risk and D represents the amount of disvalue that will be incurred if it is realized), but depending on what value we calculate for Y for Tail, this potentially caps D_Tail such that Likely may now be the larger source of expected disvalue. The accumulating chance of lock-in ending also matters for D_Likely, but not necessarily equally. Because of the shape of the curve, it puts more of a cap on states that we might otherwise have expected to last for very long times, and it may be the case that the disvalue of Tail mostly exists because we would otherwise have expected it to last for an arbitrarily long time, whereas Likely may not rely on long time-scales. This may matter, for example, for prioritizing between incidental and agential suffering-risks.[12]
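As a toy illustration of this capping effect (all numbers are invented, and I assume for simplicity that each risk's harm accrues per year of lock-in and that both share the same k):

```python
# Toy comparison of two risks when the lock-in duration is capped at Y = (1/k) - 1.
# All parameters are invented purely for illustration.

def expected_disvalue(p: float, disvalue_per_year: float,
                      naive_years: float, k: float) -> float:
    """p: probability the risk is realized; naive_years: how long we would
    otherwise have assumed the locked-in state lasts."""
    y = 1.0 / k - 1.0                       # expected lock-in duration
    return p * disvalue_per_year * min(naive_years, y)

k = 1e-7  # shared per-year probability that the lock-in ends
tail = expected_disvalue(p=1e-5, disvalue_per_year=1e8, naive_years=1e14, k=k)
likely = expected_disvalue(p=0.3, disvalue_per_year=1e5, naive_years=1e6, k=k)
print(tail, likely)  # without the cap Tail dominates; with it, Likely comes out larger
```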
Increases importance of low-k scenarios: It also may mean that scenarios that are likely to have a low k (and in which the locked-in state is undesirable) are more serious than other comparable risks. Some examples of this may be:
If the state was locked in in a way that is robust but also in some sense irrational (according to the AI’s own goals), so that the AI would not be able to end the lock-in even though ending it would be best for its overall utility.[13]
Of course, this is partially in tension with the prior point about making tail risks matter less, since low-k scenarios may themselves be tail risks. I would guess that this point matters more than the prior one, since scenarios where taking k into consideration makes Likely more disvaluable than Tail when it otherwise would not have been seem more like edge cases than these low-k scenarios do.
Infinite (dis)value in finite time: The biggest way I can think of in which this analysis would fail is if an infinite amount of computation can be created in finite time.[14] This would mean that even if the expected duration of a particular state is finite, its expected utility or disutility is still infinite. However, to paraphrase Obi-Wan Kenobi, I am not brave enough for infinities, so I will not attempt to resolve this complication here.[15]
Physical growth: It is probably not the case that the expected (dis)value from a locked-in state (conditional on it continuing to exist) is uniform over time. In fact, it will probably increase as the AI spreads throughout space. Fortunately, the expected limit in time will also put a limit on this, because the AI will be able to colonize a region no more than Y light-years in radius (assuming no faster-than-light travel). Therefore, the expected (dis)value of a state is ~proportional to the following:

V = Σ_{n=1}^{m} k(1-k)^(n-1) (4/3)π(sn)^3

This gives the expected volume V of a sphere controlled by an agent, in cubic light-years, where 0<s<1 represents the average fraction of the speed of light at which the agent is expanding.
Fortunately, V also converges to a finite value as m goes to infinity. One could plug this into some equation to determine the amount of compute that could be created or the amount of experience that could be simulated in expectation[16] for a given k, but I leave this to future work.[17]
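A quick numerical sketch of this model (with arbitrary example values for k, s, and m): the state ends in year n with probability k(1-k)^(n-1), at which point the agent controls a sphere of radius sn light-years.

```python
import math

def expected_volume(k: float, s: float, m: int) -> float:
    """Expected volume (in cubic light-years) controlled when the state ends,
    truncating the sum at m years."""
    total = 0.0
    p_end = k                                # probability the state ends in year 1
    for n in range(1, m + 1):
        radius = s * n                       # light-years colonized by year n
        total += p_end * (4.0 / 3.0) * math.pi * radius ** 3
        p_end *= 1.0 - k                     # probability it ends in year n + 1 instead
    return total

# For small k this approaches roughly (4/3) * pi * s**3 * 6 / k**3.
print(expected_volume(k=1e-4, s=0.5, m=200_000))
```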
Changes in k: k will not be uniform over time. There will probably be a relatively high k at the beginning, when the AI is relatively weak and may have competitors on Earth; then, as it establishes a singleton and expands into space, there will be a relatively low k. If an AI does stably expand into space, I expect Earth-bound multipolarity to be a negligible portion of the whole time it exists, so the overall k will mostly be determined by the level of stability during unipolar expansion; this variable will decrease until the AI reaches technological maturity and start increasing when it runs into other civilizations.[18]
One factor that may push k upward over time is that larger systems may be harder to coordinate over large distances in space, may have more points of failure, and may find it harder to maintain cohesive goals among their components as they grow. However, a smart enough AI should know this and can choose not to expand if the costs outweigh the benefits, so while this factor may increase k with respect to the AI’s physical size, it should not necessarily do so monotonically in proportion to the length of its existence.[19]
Over very long periods of time, it will become harder to obtain resources, and cosmological changes will make continued existence more difficult. It is possible that if an agent exists for this long, most of the k will come from this factor, such that it would be ~equivalent to model the lock-in as having a hard stop at this time, but there is also value in thinking about things in the way I have done in this post, to take into account the accumulating possibility of an earlier demise. Also, it is not known whether or how future technologies might extend habitability, even if this seems unlikely based on our current understanding of the universe.[20]
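The basic formula generalizes straightforwardly to a time-varying k: the expected duration becomes the sum, over years, of the product of the yearly survival probabilities up to that year. A minimal sketch, assuming a purely hypothetical k schedule along the lines described above:

```python
# Expected duration with a time-varying per-year ending probability:
# Y = sum over years n of the product of (1 - k_i) for i = 1..n.
# The schedule below is invented solely to illustrate the calculation.

def expected_duration_varying_k(k_schedule) -> float:
    survival = 1.0
    total = 0.0
    for k in k_schedule:
        survival *= 1.0 - k
        total += survival
        if survival < 1e-15:      # remaining terms contribute negligibly
            break
    return total

def hypothetical_schedule(max_years: int = 10**8):
    for year in range(1, max_years + 1):
        if year <= 100:
            yield 1e-2            # fragile early period, competitors on Earth
        elif year <= 10**6:
            yield 1e-6            # stable unipolar expansion
        else:
            yield 1e-4            # late-era difficulties (resources, other civilizations)

print(expected_duration_varying_k(hypothetical_schedule()))
```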
Bounded utility function: I have assumed an unbounded utility function, but if our utility function has a bound below or near the amount of (dis)value that we expect to be created within Y years, the cap from k does not make much difference, since we did not care about the additional (dis)value anyway.
Although we should expect AGIs to be able to lock in states that are very stable, there are disjunctive ways that these states may end, which could add up over long time-scales. Based on this cursory analysis, I updated somewhat towards locked-in states being shorter than I previously thought. However, due to general uncertainty and the fact that lock-ins may have very low k, this update is bounded to only a few orders of magnitude.
I will mostly be looking at this from the perspective of evaluating the badness of locking in states that are negative or meaningless by our lights, although similar considerations could apply to attempts to lock in positive states.
Something like the entirety of our future light-cone up until some point when expansion is no longer tenable for cosmological reasons, or at least as much of this volume as the agent is willing to dedicate to the state in question.
Note that the question is not (a) how far into the future any kind of life might exist in the universe or (b) how long a particular civilization might survive, but rather (c) the proportion of (b) that an agent might maintain in a particular locked-in state.
I admit that there is some ambiguity around what it means to be “locked in in the first place,” but what I mean is something like “an agent that has a reasonable chance of colonizing large portions of the cosmos initiates some state and sets out to maintain it indefinitely, without any plans for ending it at the time of its initiation.” The point of this condition is to set aside reasons that the state might never come to exist in the first place, such as, say, if AGI is never created. I am using the word lock-in in a way that may be somewhat different from how others use it, in the sense that it could allow for relatively weak lock-ins, such as if a powerful AI starts a process running for a long period of time but can later choose to end it. I am also interested in the states that the universe is being tiled with, and, even if an AGI is able to stably maintain and act towards its goals, the states that maximize those goals may change over time, which may be considered a kind of lock-in ending. As a toy example, imagine a paperclip maximizer that is expanding through the lightcone uncontested and at some point creates suffering subroutines to aid in its goals (see Althaus & Gloor, 2016; Bostrom, 2014, pp. 169-171; Tomasik, 2017), but at a later point decides it would be more efficient to create happy subroutines instead, not because its goals changed or because it became less able to fulfill them.
A more detailed analysis might estimate and model the probabilities of each of these sources of potential change and how they might interact.
One could consider the locked-in state to be “Earth-originating civilization being governed by a misaligned AI,” in which case reason 3 is unlikely to apply, but if instead one considers that an AGI may lock in various disvaluable states (e.g. simulations of suffering beings to find out more about the universe) and also instantiate other states that may be more or less disvaluable, these states may change based on context within the lifetime of the AI.
If we are risk-neutral, m=∞, provided that we consider at least some probability that the universe and the locked-in state can persist for infinite time. If we ignore the possibility of infinite time, m is the maximum remaining lifetime of the universe, which in any case is very long. Or if we have some threshold t below which we ignore small probabilities, we can approximate m as ln(t) // ln(1-k), where // represents integer division. (If t>k we would also have to round 1-k up to 1 in the initial terms of the sum for which (1-k)^n>1-t.) Arbitrarily discounting small probabilities may create concerns of exploitable preferences, but there may be rationally consistent reasons to ignore small probabilities due to uncertainty about the background level of utility, according to work by Christian Tarsney, who has estimated a cutoff for discounting low probabilities at about 1 in 1 billion. Some reasonable values of m and k may cut off Y at a value far below convergence, but this is only more to my point that Y may be lower than we might naively assume.
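A quick illustration of this truncation, using the ~1-in-1-billion cutoff as t and an arbitrary example k:

```python
import math

k, t = 1e-6, 1e-9                          # example k; t from the ~1-in-1-billion cutoff
m = int(math.log(t) // math.log(1 - k))    # years until (1-k)^n drops below t
print(m)                                   # roughly 20.7 million years
print((1 - k) * (1 - (1 - k) ** m) / k)    # truncated sum; already very close to 1/k - 1
```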
So, if k is 1 in 1,000,000, Y is 999,999 years.
See, e.g. Lawrence (2016):
“The limit on predictive precision is imposed by the exponential growth in complexity of exact simulation, coupled with the accumulation of error associated with the necessary abstraction of our predictive models. As we predict forward[,] these uncertainties can saturate dominating our predictions. As a result we often only have a very vague notion of what is to come. This limit on our predictive ability places a fundamental limit on our ability to make intelligent decisions.”
Note that although Lawrence raised this issue as an argument for skepticism against AI risk, one can recognize the existence of a fundamental limit of intelligence while also expecting this limit to be high enough to allow for takeover. See also here and Russell (2019, p. 232).
It remains hard to guess at what k might be, but in this footnote I present some vague thoughts.
We might guess that something like 1 in 1 trillion is a reasonable lower bound for k. This is somewhat of a wild guess based on the idea that AIs might be orders of magnitude better at preventing catastrophes than humans. A more robust way to estimate this might involve looking at rates of cosmic events and considering how likely these are to wipe out an AI civilization at various sizes, though this would still be pretty speculative due to unknowns about whether literal galaxy brains will be able to predict/prevent/defend against gamma-ray bursts and the like.
An upper bound might be something like 1 in 100,000. Computer, telephone, and electrical systems aim for 99.999% reliability, although this is measured by the amount of time that the system will be down temporarily in a year, not the relevant apples-to-apples comparison, which is the likelihood that, in a given year, it will “permanently” stop working. (One would guess that under normal circumstances, this is much lower than 1 in 100,000, but that does not show that telephone systems could survive the kinds of changes that would happen over 100,000 years -- and probably they won't. I intend this just as a ballpark for the vague “level of reliability” that we know is achievable, though maybe day-to-day reliability is not a relevant comparison for long-term survival.) Surely if humans can achieve this kind of stability, a superintelligent AI can be more reliable, but it will also be operating in very different circumstances.
As it is said, it is hard to make predictions, especially about the future, so these upper and lower bounds are something like a 50% confidence interval.
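Treating these guessed bounds as point values (which glosses over the distribution in between), the implied expected durations would be roughly:

```python
# Expected duration Y = 1/k - 1 implied by the guessed bounds on k above.
for label, k in [("lower-bound k (1 in 1 trillion)", 1e-12),
                 ("upper-bound k (1 in 100,000)", 1e-5)]:
    print(f"{label}: Y ~ {1 / k - 1:.3g} years")
```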
This is all based on the assumption that the worst risks are those that we would originally have expected to last for a very long time, and that these risks are also tail risks.
It is possible that agents that irrationally lock in a state that they later disprefer may also not be as good at locking in states robustly, however.
But see generally, e.g., Bostrom (2011).
E.g. Bostrom (2013, pp. 101-103).
This estimate would be imperfect because the density of space is not uniform. However, over long enough timescales, it should average out. Another factor this model does not account for is that the state may not end all at once; rather, the AI could devote fewer resources to it, or the AI could stop expanding. There is also the question of cosmic expansion, which will render distant galaxies unreachable after a certain point in time, although agents could continue to persist within a given spatial bound. See generally Ord (2021). A related complication is that cosmological expansion or other changes may mean that locking in a bad state now may be particularly bad due to opportunity costs that may be unlikely to be recouped in the future (for example, from some moral perspectives, a lock-in that consumes ~all of the time during which new civilizations could have evolved may be worse than one that leaves some time for a new civilization to evolve, in ways that are not proportionate to the sheer amount of time), but I will leave this to future work.
I make no claim as to whether technological maturity or extraterrestrial contact will happen first, but see Cook (2022) for estimates of the latter.
Of course, the fact that it will tend to be successful at maximizing its utility function does not necessarily mean it will tend to maximize its lifespan (suppose the AI is a risk-neutral expected-paperclip maximizer and sees an opportunity that risks its existence with 10% probability but gives a 0.1% chance at 10^100 paperclips, the expected value of which is 10,000 times larger than the number of paperclips it would make by continuing to exist for certain), but we should expect a correlation here due to the whole instrumental convergence thing.
See Dyson (1979): "Supposing that we discover the universe to be naturally closed and doomed to collapse, is it conceivable that by intelligent intervention, converting matter into radiation and causing energy to flow purposefully on a cosmic scale, we could break open a closed universe and change the topology of space-time so that only a part of it would collapse and another part of it would expand forever? I do not know the answer to this question. If it turns out that the universe is closed, we shall still have about 10^10 years to explore the possibility of a technological fix that would burst it open."