(3) the agents who do spend energy fighting it are systematically outcompeted by those who do not, which means the system's ability to fight it degrades over time even if some agents start out fighting it.
Civilization consists of some mixture of local myopic competition, and global organized agency.
Maximally local competition is bacteria. Each individual cell fighting for resources. And no cell can do anything else, or it will be outcompeted.
Maximally global organization is a singleton. An ASI or world government that can easily squash any subsystem that gets out of line.
The world is at neither extreme, and fragments of both patterns can be found.
Look at the global moratorium on CFCs. That's one of those large-scale, long-term global problems with a not-that-expensive solution. And it was met with a large-scale coordinated response.
Combining these yields a conclusion: we should expect to live in a minimally friendly universe, not a maximally or even an average-friendly one. The anthropic principle guarantees that our universe clears the bar for producing observers. The Copernican principle says we are typical among observer-containing universes. Since there are vastly more ways for a universe to barely clear the bar than to be friendly for humans on all levels and in all parts of the configuration space, typical means "barely clearing the bar."
Except, industrial civilization. The set of universes in which an industrial high-tech civilization can propagate itself seems to be larger than the set of universes in which hominids can evolve. The set of universes in which self-replicating nanotech can spread through the lightcone is FAR larger than the set of universes where the hunter-gatherers have access to the right kind of flint and tasty large herbivores.
Also, the idea that there are "vastly more ways to barely clear the bar than to be friendly" sounds like a complicated and nontrivial assumption. It might be true, but there is no obvious-to-me reason why it MUST be true.
fundamental physical constants (but this is an illustration, I do not want to create impression that the argument is only about fundamental physics parameters fine-tuning)
I am not convinced by fine tuning. I don't think we know enough physics to know whether or not nuclear-physics life exists on the surface of neutron stars in our universe. Let alone in some other universe. I expect that most of the universes with different constants have a complexity similar to that of this universe.
in a multi-dimensional parameter space where the viable region is a tiny sliver, most of the volume of that sliver is near its boundaries, not deep in its interior.
As a property of high-dimensional spaces, yes. However, being at the edge of that sliver doesn't automatically mean we are "barely surviving" and risk extinction. We could be in a situation where, if the strong force were just marginally weaker, stars couldn't exist. (And suppose for a moment that life without stars is impossible.) That puts us near the boundaries of survivable-parameter space, but it doesn't mean we are risking extinction.
Also, I'm not actually convinced that the viable region is a tiny sliver.
Making everything go right is hard. Making something go wrong is easy. This is also, at root, an observation about the relative sizes of state spaces: the states in which a complex system continues to function are vastly outnumbered by the states in which it doesn't, in the same way that the configurations of a watch that tell time are vastly outnumbered by the configurations that don't.
The thing is, mosquitos are also a complex system. Applying this logic, it should be really really easy to wipe them all out.
The question for civilizational survival is therefore: is there a specific, powerful mechanism that keeps civilization within the narrow band of survival-compatible states?
The ability of humans to self-replicate and rebuild. Active adaptations, both evolved and intelligent. Decisions and actions, both individual and coordinated.
The problem is that for existential threats, the feedback loops are, generally, not tight at all.
Every human starving to death because we just randomly decide we don't want to eat, despite having food available. This is, in a sense, an existential threat. Unless a large fraction of humanity does some fairly specific actions of unwrapping, cooking and eating food, humanity goes extinct. But this isn't on your list of x-risks, because this is an example where the feedback loop is tight.
make survival-oriented behavior a winning strategy in the competitive landscape.
Again, you seem to be assuming a world of perfect competition and zero foresight and planning. Also, the groups that prepare against pandemics get fewer pandemics.
Think of how organisms evolved immune systems: pathogens are frequent, individual infections are survivable, and organisms with better defenses reliably outcompete those without them. The feedback loop is tight, the disaster distribution is right, and survival-competence gets selected for.
But notice how specific the required conditions are:
So long as there is at least one source of disasters like that, this selects against anything too myopically competitive.
You have one path through time, and extinction is an absorbing state: once you enter it, you do not leave.
Granted. A high-competence singleton is also somewhat of an absorbing state.
And a civilization that has maxed out the tech tree will probably have a low ongoing risk of extinction.
A world with smarter humans is not a world where smart survival-oriented humans dominate, but a world where smart survival-oriented humans compete against equally smart growth-oriented, power-oriented, and profit-oriented humans, and lose, for exactly the same structural reasons they lose now, just at a higher cognitive level.
I don't think that's true. If x-risk reduction gets a constant fraction of resources, a richer civilization has more resources to throw at the problem.
You are taking a situation where two utility functions are mostly uncorrelated, and using "resources" to claim that the game is zero-sum. Uncorrelated != zero-sum. Two agents with uncorrelated utility functions might find a way to achieve near-maximum on both functions.
Generally you keep assuming near perfect competition, but also everyone has an end-the-universe button. This is quite an odd thing to assume. A world of perfect unrestricted military competition is one where various sides routinely throw nukes and bioweapons at each other. In this world, everyone has nuclear bunkers and bioweapon defenses.
It's possible there is something that can destroy the whole world but cannot be targeted to destroy only your enemies, but that's a rather specific kind of thing.
It is quite possible that the techs necessary to come up with a solution to deal with the threat could end up having substantial spinoff applications, thus paying for itself quite comfortably.
Otherwise the society in question would have to be, among other things, a post-scarcity one where capitalism has been transcended and money no longer exists.
This is a good write up of an interesting, if pessimistic argument. I'm not sold that this happens on a timescale that falls within ordinary human planning timelines of a century or two, but I'm not totally convinced that it doesn't, either.
I've actually seen a somewhat different argument about the dangers of optimization. This was made by Vernor Vinge in his fantastic novel, A Deepness in the Sky.
The central idea was that optimization gave you more resources to use, but that sufficient optimization also destroyed "slack", your margin for dealing with emergencies. For example, a highly optimized "just in time" manufacturing system is more profitable than idle warehouses full of inventory. But if things went wrong, you had very little buffer to draw upon. And over generational time, if nothing else killed you, there was a temptation to optimize right up to the limits of your environment. This would mean that even small environmental shifts might cause cascading failures.
I'm not sure if this poses a true risk of civilizational collapse and large-scale disaster. But it has made me appreciate the idea of slack and redundancy in systems. I recall that Netflix, for example, used to run across three AWS regions but only needed two of them, which meant they could lose a region and keep operating.
And this is certainly a risk that ops people know: Running too close to 100% capacity for too long means that failures tend to cascade rapidly and dramatically.
It might be interesting to spend some time trying to construct systems that wouldn't kill themselves because of the dynamics you describe, or that would kill themselves only extremely slowly (maybe it's good to think of this in terms of how much stuff you can get done or how much tech development you can do without killing yourself, so that we don't consider just slowing down the pace of everything uniformly a win). [1] In particular, I think it is interesting to consider configurations of the following form:
My guess is that we can identify an initial configuration in this space such that the system probably doesn't kill itself for a long time (like let's say for at least a thousand years' worth of getting stuff done or technological development at the 2025 pace [2] ).
Also see Yudkowsky's world.
like for now i mean: "construct" conceptually, like "construct" in the mathematician's sense, not in practice. though constructing such a system in practice is of course also very interesting and important ↩︎
note that this is a decent amount of development/[doing stuff] despite corresponding to only 1000 years — plausibly more than the sum total of development/[doing stuff] in our galaxy's history so far, given how much things have sped up ↩︎
This is a great reframing. The concept of humanity likely being "minimally fit" for its niche is one I need to reflect more on.
"A civilization needs to become smart enough to internalize the instrumental value of survival before it becomes powerful enough to alter its own local environment to a lethal state" - this reminds me of Nick Land's deterritorilisation argument.
Every dollar and every unit of political capital spent on Cthulhu-proofing is a dollar and a unit of political capital not spent on things that yield competitive advantage right now. Every leader who commits their country's resources to the Cthulhu project is outcompeted by a leader who instead commits those resources to economic growth, military strength, or popular welfare programs. Every researcher who works on Cthulhu defense could be working on something that produces more papers, grants, or products in their lifetime. The people who take Cthulhu seriously are, at every level of the competition, at a disadvantage relative to those who don't, or who mouth the right words about taking it seriously while allocating resources elsewhere.
won't a society that reasons this way get "outcompeted" by one that makes better decisions, in the sense that the former society ends up eaten by fish people (or whatever the fate)?
as long as there is an era where the threats are local, selection should have enough feedback to teach the lesson.
My two counterarguments go like this:
You don't really talk about nested structures of cooperation and competition incentivizing structures to be longer lived and avoid going extinct. Like within a tribe, people could be fighting for their own short-term gains over the survival of the whole, but tribes where people do that go extinct. Same with species. And nations, probably. So it's a problem we've faced before, and there is a strong incentive to avoid the extinction thing.
You kind of talk about it here:
> "Now, there is one scenario in which survival-oriented behavior could be reinforced: if existential disasters happen frequently enough and are mild enough that agents who prepare for them consistently outperform agents who don't."
But this makes it seem like this is a rare and unusual thing. But it seems to me like an omnipresent force. Bodies are destroyed, get sick. Nations die out. Species go extinct, etc.
I think this dynamic is true at different scales, not just humanity's overall civilization.
The fundamental problem is that everyone's locked in a prisoner's dilemma with Darwinian evolution tacked on top so that those who win one round get to duplicate and gain an advantage in the next round, so that everyone has to constantly defect to gain power. (This applies even to actors who want to optimize for cooperation in the world - their best strategy is ruthlessly gaining power first to gain the ability to use coercive strategies that force other people to cooperate! Note: in the real world these actors may have a way to avoid "defecting", or doing non-cooperative actions, if they can creatively find a way around most competition and thus avoid the prisoner's dilemma entirely)
As a result you can see examples throughout history of organizations and societies failing to "stop Cthulhu" (as defined by this article) in various different ways. You can replace Cthulhu with climate change, Hitler, or the long-term innovation of companies that are required to stay relevant in a changing market 20 years down the line. People never start cooperating until they realize it's almost too late to stop the issue, and thus that banding together and focusing efforts to solve the issue becomes the only possible good strategy for all actors. And sometimes they realize too late and the society or company never recovers. (On a global civilizational level, this kind of last-minute recovery gets harder and harder as technology improves. How WW2 played out would be almost certainly impossible to replicate if something like it were to happen again. But perhaps that would just push the "last-minute" threshold earlier instead so the world keeps being saved just before the problem gets out of control.)
The only way around these dynamics is a leader or regulator who punishes defectors and forces everyone to cooperate towards solving the long-term problem, and who is themselves benevolent (wants to solve said long-term problem) and beyond the ability of anyone else to compete against. Examples: visionary founders, shareholder-proof leaders of PBCs, national governments (relative to companies), multinational organizations like the EU and UN (to the extent they actually do something), and the US-led post-war world order more generally.
Otherwise the short-term optimizers will always bubble to the top and doom the society or organization as a whole.
The author says as much:
This is, in effect, asking for a global coordination mechanism that persists indefinitely against strong incentives to defect.
At corporate-sized organizational scales, there are obviously feasible solutions to the problem. In the civilization case, it is almost impossible without domination by a single leader, whether human or AI.
(I would say that this line of thinking, if extrapolated to a societal and global level, leads to some very troubling implications on what the best path of future civilization looks like.)
A somewhat orthogonal hypothesis that I was thinking about for some time: if we develop a rigorous definition of intelligence (sounds plausible), it may be possible to prove mathematically that it is unstable. Or maybe to prove that it is stable. And I don't mean just humans going extinct, but any possible intelligence, including ASI.
In other words, even in a maximally friendly universe p(doom) is exactly 1 (or 0, for the opposite result).
The trick is, of course, to create the math necessary to describe the system, beginning with definitions.
The epistemic status thing
Please read this section because it is not a disclaimer for the sake of it.
I think the model I describe in this post can easily be wrong, in its full form. However, I also think the probability of it being substantially correct is non-negligible and even more than that, and that this alone warrants writing it up and thinking about it seriously.
Throughout this post, I will write "X is the case" rather than "it seems plausible that X might be the case, though I am not certain." Every claim should be mentally prefixed with the appropriate hedge. I'm dropping the hedges not because I'm confident, but because a text where every sentence contains "it seems plausible that" becomes unreadable without actually making anyone more calibrated.
I should also flag that I'm writing this from a state of considerable frustration with the current state of affairs, and this frustration may be load-bearing in places where it shouldn't be.
This is not about AI
AI risk is, at most, one illustration of the phenomenon I'm describing here, and not a particularly privileged one. Everything in this post would hold in a world where artificial intelligence had never been conceived of. It would hold in a world where nuclear weapons were never built or where the climate was perfectly stable and no one had ever heard of greenhouse gases.
The claim is about a structural property of civilizations under optimization pressure, not about any particular technology or threat vector. If you finish reading this and your takeaway is "oh, another AI doom argument," then I have failed to communicate the central point, which is that the problem runs much deeper than any specific risk and would persist even if every currently-known risk were magically eliminated tomorrow.
The central claim: extinctive pressure
The core idea: extinction is the default outcome for any civilization, and it is a strong default. Not because of any particular threat, but because of a structural property which I will call extinctive pressure, which operates on every civilization simply by virtue of how optimization, competition, and the universe work together.
I want to unpack what I mean by calling it a "pressure" rather than just "a risk" or "a likely outcome".
Consider a version of an old thought experiment. Suppose you learn that Cthulhu is real. He is sleeping somewhere in the Pacific, and in approximately 200 years he will wake up and destroy all of humanity, unless humanity coordinates to prevent it and spends significant but realistically achievable resources on it. You have excellent evidence for this. The evidence is publicly available and widely accepted as credible.
I claim that, almost certainly, humanity does not successfully fight Cthulhu.
Why? Roughly, because fighting Cthulhu is costly (in whatever sense), and the cost is immediate while the payoff is distant.[1] Every dollar and every unit of political capital spent on Cthulhu-proofing is a dollar and a unit of political capital not spent on things that yield competitive advantage right now. Every leader who commits their country's resources to the Cthulhu project is outcompeted by a leader who instead commits those resources to economic growth, military strength, or popular welfare programs. Every researcher who works on Cthulhu defense could be working on something that produces more papers, grants, or products in their lifetime. The people who take Cthulhu seriously are, at every level of the competition, at a disadvantage relative to those who don't, or who mouth the right words about taking it seriously while allocating resources elsewhere.
This is what I mean by "pressure". I don't mean "extinction is likely" as a passive observation, the way one might say "rain is likely tomorrow." I mean there is an active force that pushes civilizations toward extinction and pushes back against attempts to resist. It is a force in the relevant sense that: (1) counteracting it requires continuous expenditure of energy and resources, (2) in the absence of such expenditure, the system drifts toward extinction by default, and (3) the agents who do spend energy fighting it are systematically outcompeted by those who do not, which means the system's ability to fight it degrades over time even if some agents start out fighting it.
The analogy I find useful is sailing against the wind. It is not impossible and it does not violate any law of physics. Sailboats can and do sail against the wind. But it is costly, it is slower than sailing with the wind, it requires continuous active effort and skill, and the moment you stop actively doing it, you drift back. And critically, in a race, the boat sailing with the wind will generally beat the boat sailing against it.
Now let me explain the entire chain of arguments.
Optimization does not target survival
Agents and civilizations are subject to optimization. Most of these optimization processes don't have survival-of-the-civilization as their objective. They have local objectives: reproductive fitness, profit, political power, memetic spread. These objectives are sometimes loosely correlated with civilizational survival, in the same way that a company's profitability is sometimes loosely correlated with its customers' wellbeing. But "loosely correlated" is doing an enormous amount of work in that sentence, and for our purposes, the correlation is almost certainly insufficient, and for two reasons:
The worst of the best possible worlds
Here I need to introduce a concept I'll call the local environment.
You exist, right now, in an extraordinarily specific set of conditions. The fundamental physical constants of this universe permit complex chemistry, the Sun is a stable main-sequence star in a relatively quiet region of the galaxy, Earth has a magnetic field that deflects solar wind, plate tectonics that regulate the carbon cycle, a large moon that stabilizes axial tilt, oxygen-nitrogen atmosphere held at a temperature range compatible with liquid water, and so on.
All of this is what I mean by "local environment": the specific bubble of conditions, bounded in space, time, and parameter space, within which our continued existence happens to be viable.
Now apply a combination of two principles. The anthropic principle tells us that we necessarily find ourselves in conditions compatible with our existence, so we should not be surprised that our local environment is friendly. The Copernican principle tells us that we are not in a special or privileged location in the space of possibilities; we should expect to be typical among observers.
Combining these yields a conclusion: we should expect to live in a minimally friendly universe, not a maximally or even an average-friendly one. The anthropic principle guarantees that our universe clears the bar for producing observers. The Copernican principle says we are typical among observer-containing universes. Since there are vastly more ways for a universe to barely clear the bar than to be friendly for humans on all levels and in all parts of the configuration space, typical means "barely clearing the bar." We should expect our universe to be friendly enough to produce us and not much more than that.
This can be illustrated with the evidence on the life-permitting regions in the space of fundamental physical constants (but this is an illustration, I do not want to create impression that the argument is only about fundamental physics parameters fine-tuning). Barnes (2012) provides the most comprehensive review: in the space of possible physical laws, parameters and initial conditions, the set that permits the evolution of intelligent life is very small. The key point that matters for the argument here is the geometric fact that makes the "minimally friendly" conclusion follow from the Copernican principle: in a multi-dimensional parameter space where the viable region is a tiny sliver, most of the volume of that sliver is near its boundaries, not deep in its interior. Typical observers are near the edge of viability, not comfortably in the center.
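A quick numerical illustration of that geometric fact (treating the viable region, purely for the sake of the example, as a d-dimensional ball): the fraction of the volume lying within a thin outer shell goes to one as the dimension grows.

```python
# Fraction of a d-dimensional ball's volume within a shell of relative
# thickness eps at the boundary: 1 - (1 - eps)^d, which tends to 1 as d grows.
# This is only an intuition pump; the viable region is not literally a ball.
eps = 0.05
for d in (1, 3, 10, 30, 100):
    near_boundary = 1 - (1 - eps) ** d
    print(f"d={d:>3}: fraction of volume within 5% of the boundary = {near_boundary:.3f}")
```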
In other words, the conditions for civilizational survival are not the default state of reality, they are a razor-thin exception, and we happen to be in the exception right now because that is the only place observers can be. Step outside the local environment, in any direction (deeper into space, further into the future, or into novel physical/technological/social configurations that our environment has not been tested against) and you should by default expect conditions to be lethal.[2]
Why, then, have we survived until now? Probably, because our local environment has been stable, at least on the timescale relevant to biological evolution and civilization development, and because of anthropic selection.
Local environments are not stable, and we are actively destabilizing ours
Even on the natural side, if we wait long enough, the environment will become lethal: the Sun's luminosity will increase, asteroid impacts are stochastic, and eventually comes heat death.
But the more immediate concern is that powerful agents actively change their local environment, often dramatically and often in ways that are poorly understood at the time.
The entropy frame
Making everything go right is hard. Making something go wrong is easy. This is also, at root, an observation about the relative sizes of state spaces: the states in which a complex system continues to function are vastly outnumbered by the states in which it doesn't, in the same way that the configurations of a watch that tell time are vastly outnumbered by the configurations that don't.
The important corollary is that things go right only when there is a specific, powerful mechanism making them go right. Your body maintains homeostasis not by default but because billions of years of evolution built elaborate regulatory systems to keep temperature, pH, blood oxygen, and a thousand other parameters within viable ranges.
Where there is no such mechanism, entropy wins. The system drifts toward the vastly larger space of non-functional states. The question for civilizational survival is therefore: is there a specific, powerful mechanism that keeps civilization within the narrow band of survival-compatible states? Most likely, no.
Feedback loops for survival are not tight
A reader might object to the entropy argument as follows: "Sure, survival states are rare, but we have feedback mechanisms. When things start going wrong, we feel pain, we experience resource shortages, we notice and we correct." This is true in many domains, and it is exactly why many complex systems remain functional despite entropy: they have tight feedback loops that detect deviation from the functional state and apply corrective pressure.
The problem is that for existential threats, the feedback loops are, generally, not tight at all.
Consider what a tight feedback loop for survival would look like. For every existential threat, there would need to be some proportionate preliminary signal: pain, resource loss, political instability, something that registers in the preferences and incentive structures of the agents who could actually do something about it (so not only in some "empirical signal"). The signal would need to be (a) early enough to allow corrective action, (b) strong enough to motivate costly corrective action, and (c) reliably connected to the actual severity of the threat rather than to some proxy that can be Goodharted. Which is unrealistic to expect.
Survival conflicts with optimization
The states that optimization actually pushes us toward are not merely uncorrelated with survival, they are actively in tension with it.
At every level of competition, agents face resource allocation decisions. Some resources can be directed toward survival-relevant activities. The same resources can alternatively be directed toward activities that improve the agent's competitive fitness right now.[3]
There is always a margin at which you can reallocate from the first category to the second. And crucially, there is no strong feedback loop running in the reverse direction: investing in civilizational survival does not, in general, make you more competitive. Sometimes there is a weak correlation (a society that prepares for pandemics may have a healthier workforce), but the correlation is nowhere near tight enough to make survival-oriented behavior a winning strategy in the competitive landscape.
This means survival-oriented behavior has negative fitness in most competitive environments in the long run, because agents who skip it and reallocate those resources to direct competition will, all else being equal, outperform agents who don't.
Now, there is one scenario in which survival-oriented behavior could be reinforced: if existential disasters happen frequently enough and are mild enough that agents who prepare for them consistently outperform agents who don't. In this scenario, you get something like natural selection for survival-competence. Think of how organisms evolved immune systems: pathogens are frequent, individual infections are survivable, and organisms with better defenses reliably outcompete those without them. The feedback loop is tight, the disaster distribution is right, and survival-competence gets selected for.
But notice how specific the required conditions are:
(a) The disaster distribution must be very particular. The existential threats must be frequent enough to provide a training signal, but mild enough that failing to prepare is costly without being instantly terminal. If disasters are too rare, the agents who prepare for them waste resources that their competitors spend on winning the current round. If disasters are too severe, they simply wipe out everyone and there is no differential selection at all. The sweet spot, frequent mild existential threats that reward preparation, is a very narrow band, and there is no reason to expect our actual threat distribution to fall in it. In fact, the threats we actually face are characterized precisely by being rare and catastrophic rather than frequent and mild.
(b) Even if the distribution were right, calibration to local threats does not generalize. Suppose a civilization does develop good responses to the existential threats in its local environment. This gives you no guarantee whatsoever that those responses transfer to novel threats outside the training distribution. The "training distribution" of past existential near-misses is not representative of the full distribution of possible existential threats.
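To make condition (a) concrete, here is a deliberately crude two-lineage toy model; the growth rates, the mitigation factor, and the disaster grid are all my own illustrative choices, not numbers from this post. Prepared agents pay a growth tax every generation and only pull ahead when disasters are frequent enough to punish the unprepared and mild enough to leave anyone alive.

```python
import random

def prepared_share(p_disaster, severity, generations=500, seed=7):
    """Toy competition between a 'prepared' and an 'unprepared' lineage.

    Illustrative assumptions: unprepared agents grow 5% per generation,
    prepared agents only 2% (the survival tax); a disaster multiplies the
    unprepared by (1 - severity) while preparation blocks 80% of the damage;
    severity >= 1 is a terminal event that wipes out both lineages.
    """
    rng = random.Random(seed)
    prepared, unprepared = 1.0, 1.0
    for _ in range(generations):
        prepared *= 1.02
        unprepared *= 1.05
        if rng.random() < p_disaster:
            if severity >= 1.0:
                return float("nan")  # everyone is dead: no differential selection
            unprepared *= 1.0 - severity
            prepared *= 1.0 - 0.2 * severity
    return prepared / (prepared + unprepared)

# Sweep frequency and severity: preparation wins only in a narrow band.
for p in (0.002, 0.05, 0.3):
    for s in (0.1, 0.6, 1.0):
        print(f"P(disaster)={p:<5}  severity={s:<4}  prepared share = {prepared_share(p, s):.3f}")
```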
The ability to foresee landscape changes does not reliably help
Suppose some agents are smart enough and can predict how the competitive landscape will shift. Can they use this foresight to prepare? In principle, yes. In practice, they are competing against agents who use their resources to win in the current landscape rather than preparing for the next one. An agent who correctly predicts that AI will reshape the competitive landscape in 10 years and diverts resources to prepare for that transition is, for the next 9 years, outcompeted by agents who use those resources to dominate in the present. By the time the foresighted agent's predictions are vindicated, the shortsighted agents may have already accumulated enough power and resources to be the ones who determine how the transition goes. This is, incidentally, a recognizable dynamic in financial markets, where being right too early is operationally indistinguishable from being wrong.
Requirements for survival under optimization
We can now state fairly precisely what the optimization landscape would need to look like for a civilization to survive under optimization pressure:
Requirement 1: The current fitness level must be sufficient. Either the civilization's present position in the competitive landscape must already be within the survival-compatible region, or the optimization dynamics must be carrying it toward the survival-compatible region fast enough that it arrives before an existential catastrophe occurs.
Requirement 2: The landscape must be sufficiently static. The competitive environment must not change so fast that adaptations become obsolete before they can accumulate. If the landscape shifts faster than the civilization can adjust, then even a civilization that is currently well-adapted is on borrowed time, because its adaptations are being invalidated faster than new ones can be developed.
So: how likely is it that both requirements are satisfied simultaneously?
The conjunction of these two requirements is not logically impossible. But it requires a very specific, very lucky configuration of the optimization landscape, and there is no known mechanism that selects for or maintains such a configuration.
The civilization path is not ergodic
You do not get to play multiple times. There is no ensemble of civilizations over which your results average out. You have one path through time, and extinction is an absorbing state: once you enter it, you do not leave.
Any per-period probability of extinction that is not infinitesimally small converges to certainty over a long enough time horizon. If you face a 1% probability of extinction per century, your probability of surviving 1,000 years is about 0.99^10 ≈ 0.90. Survivable, maybe. Your probability of surviving 10,000 years is about 0.99^100 ≈ 0.37. Uncomfortable. Your probability of surviving 100,000 years is about 0.99^1000 ≈ 0.00004. Effectively zero. And 100,000 years is nothing on a cosmological timescale.
To survive in the long run, you do not need to survive "on average" or "in expectation." You need to survive almost surely, almost in the technical measure-theoretic sense: the probability of the extinction event must be driven so close to zero, in every period, that the infinite product of survival probabilities converges to something positive. This is an astronomically more demanding requirement than "keep the per-period risk reasonably low".
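To restate that requirement numerically (with made-up numbers): an infinite product of survival probabilities stays positive only if the per-period risks shrink toward zero fast enough that their sum converges. A constant 1% per century never qualifies, however small it feels.

```python
import math

def survival_probability(per_period_risks):
    """Probability of never hitting the absorbing extinction state, assuming
    independent per-period extinction probabilities."""
    return math.prod(1.0 - p for p in per_period_risks)

horizon = 10_000  # periods, e.g. centuries

# Constant 1% risk per period: the product decays geometrically toward zero.
constant_risk = [0.01] * horizon
# Risk forced down like 1/t^2: the sum of risks converges, so the product
# stays bounded away from zero even over arbitrarily long horizons.
decaying_risk = [0.01 / (t * t) for t in range(1, horizon + 1)]

print(f"constant 1% per period: P(survive {horizon} periods) = {survival_probability(constant_risk):.2e}")
print(f"risk decaying as 1/t^2: P(survive {horizon} periods) = {survival_probability(decaying_risk):.4f}")
```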
Survival does not scale
A corollary: survival competence at one scale does not compose into survival at larger scales, neither in time nor in space.
Consider the temporal dimension. Suppose a society manages, through heroic effort, to maintain a serious focus on existential risk mitigation for the duration of one generation, roughly 30 to 50 years. Suppose every generation is like this: each inherits the commitment, maintains the institutions, and keeps the probability of extinction very low for its own period. Does this chain of responsible generations add up to long-term survival?
Not necessarily, and probably not. Per-period probabilities that look totally fine over a single generation's lifetime will multiply and imply extinction in the long run.
The spatial dimension is, if anything, worse. Suppose one country manages to implement strong pro-survival policies. This country's survival still depends on what every other country does. Survival under existential risk is a weakest-link problem, more precisely a weakest-link problem across the entire globe and across all of future time, and the probability of every link holding in every period is the product of all the individual holding probabilities.
The process is fat-tailed
The non-ergodicity argument above assumed, generously, that the per-period extinction probability is roughly constant over time. The actual situation is almost certainly worse.
The stochastic process representing civilizational outcomes is very likely fat-tailed. That is, the distribution of possible shocks to the system has tails that are much heavier than a Gaussian or other thin-tailed distribution would suggest. Extreme events, especially technological ones, are not exponentially rare the way they would be under a thin-tailed model. They are polynomially rare at best, which means they are much more likely to occur on civilizational timescales than naive extrapolation from recent calm periods would suggest.
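For a sense of the quantitative difference (with a tail index of 2, chosen only for illustration): at ten "standard-sized" units out, a Gaussian tail is roughly twenty orders of magnitude thinner than a power-law tail.

```python
from math import erfc, sqrt

# Compare tail probabilities at k "typical-sized" units: the Gaussian tail decays
# like exp(-k^2/2), a power-law tail with index alpha only like k^(-alpha).
# alpha = 2 is an arbitrary illustrative choice, not a claim about the real threat distribution.
alpha = 2.0
for k in (3, 5, 10):
    gaussian_tail = erfc(k / sqrt(2))   # P(|Z| > k) for a standard normal
    power_law_tail = k ** -alpha        # P(|X| > k) under the power law
    print(f"k={k:>2}: Gaussian tail {gaussian_tail:.2e}   power-law tail {power_law_tail:.2e}")
```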
Under fat tails, the non-ergodicity problem gets dramatically worse. In a thin-tailed world, you can at least estimate the per-period risk reasonably well and plan around it. In a fat-tailed world, the per-period risk is dominated by events that are outside your current model, events that you cannot estimate because you have never observed anything like them and your historical data gives you almost no information about their probability. The "1% per century" estimate I used above was illustrative; in a fat-tailed world, the honest estimate is something closer to "we don't know and we may be unable to know, because the tail events that dominate the risk have not happened yet and will look nothing like anything that has."
However, there is even more to it.
In the absence of sufficiently strong corrective mechanisms, the multiplicative fat-tailed process drives the system to zero
Consider a multiplicative process: a system whose state at each time step is multiplied by some random factor (rather than having some random amount added). Civilizational "health" is plausibly multiplicative in this sense: a sufficiently bad shock doesn't subtract a fixed amount from your prospects, it multiplies them by something close to zero. And extinction is the absorbing state at zero.
For the simplest case, geometric Brownian motion (a multiplicative process with Gaussian shocks, drift μ and volatility σ), there is an exact and famous result: the time-average growth rate of a single path is not μ but μ − ½σ². This is the gap between the ensemble average and the time average, between what happens "on average across many parallel worlds" and what happens "in the one world you actually live in." The correction term −½σ² is always negative, which means the single-path growth rate is always lower than the expected growth rate, and when σ² is large enough, it goes negative even if μ is positive. Your expected value grows, but you, on your one path, go to zero.
This is already an important result. It tells you that high volatility is not just unpleasant but existentially dangerous for any system that cannot restart after hitting zero: a single path through a sufficiently volatile multiplicative process will almost surely be destroyed even if the average outcome across all possible paths looks fine.
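A quick numerical check of the μ − ½σ² gap, with purely illustrative parameters (drift of 5% and volatility of 40% per period, over 25 periods), sampling the terminal log-value of each path directly:

```python
import math, random

rng = random.Random(0)
mu, sigma, T, n_paths = 0.05, 0.40, 25.0, 200_000

# Terminal log-value of GBM: log X_T = (mu - sigma^2 / 2) * T + sigma * sqrt(T) * Z
log_terminal = [(mu - 0.5 * sigma ** 2) * T + sigma * math.sqrt(T) * rng.gauss(0.0, 1.0)
                for _ in range(n_paths)]

# The ensemble average grows at roughly mu; the typical (median) path at mu - sigma^2/2.
ensemble_rate = math.log(sum(math.exp(x) for x in log_terminal) / n_paths) / T
median_rate = sorted(log_terminal)[n_paths // 2] / T

print(f"growth rate of the ensemble average: {ensemble_rate:+.3f}  (mu = {mu:+.3f})")
print(f"growth rate of the median path:      {median_rate:+.3f}  (mu - sigma^2/2 = {mu - 0.5 * sigma ** 2:+.3f})")
```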
Now consider what happens when the process is fat-tailed. Replace the Gaussian shocks with shocks drawn from an α-stable distribution with stability index α < 2. For α < 2, the theoretical variance is infinite (or undefined). The exact formula μ − ½σ² is derived for the Gaussian case specifically (via Itô's lemma), and does not formally carry over to Lévy processes with infinite variance; the correction terms take a different and more complex form involving the characteristic exponent of the process. However, there is an empirical regularity: if you compute the running sample variance up to each time point (which is always finite, being computed from finitely many observations) and plug it into the Gaussian formula μ − ½(σ_empirical)², you get a quantity that tracks the actual realized growth rate of the process quite well. And what you observe is that the running sample variance grows over time, punctuated by ever-larger spikes as new extreme observations arrive, because the sample variance of an infinite-variance distribution does not stabilize. Each new tail event revises the effective σ² upward, and the effective growth rate μ − ½(σ_empirical)² is dragged further and further down.[4]
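A minimal sketch of the first half of that observation, using scipy's levy_stable sampler with made-up parameters: the running sample variance of α-stable shocks keeps ratcheting upward as new tail events arrive, dragging the Gaussian-style effective rate μ − ½(σ_empirical)² downward. (This only sets up the running-variance computation; the comparison against the realized growth of an actual multiplicative path is not reproduced here.)

```python
import numpy as np
from scipy.stats import levy_stable

# Illustrative parameters, not values from the post.
alpha, mu, scale, T = 1.5, 0.05, 0.10, 100_000

# Symmetric alpha-stable shocks with alpha < 2: the theoretical variance is infinite.
shocks = levy_stable.rvs(alpha, 0.0, loc=0.0, scale=scale, size=T, random_state=42)

t = np.arange(1, T + 1)
running_var = np.cumsum(shocks ** 2) / t   # sample second moment up to time t; does not stabilize
effective_rate = mu - 0.5 * running_var    # plugged into the Gaussian single-path formula

for checkpoint in (100, 1_000, 10_000, 100_000):
    i = checkpoint - 1
    print(f"t={checkpoint:>7}  sigma_hat^2 = {running_var[i]:.4f}  "
          f"mu - 0.5*sigma_hat^2 = {effective_rate[i]:+.4f}")
```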
The implication, stated loosely: for a multiplicative process with fat-tailed shocks, the effective volatility penalty grows without bound over time, eventually overwhelming any positive drift. The single path goes to zero not just with high probability, but with a kind of inevitability, as the accumulating tail events ratchet the effective growth rate ever more deeply negative.
Now, this argument, taken at face value, seems to prove too much. If fat-tailed multiplicative dynamics inevitably drive systems to zero and if they really describe reality precisely enough, then every organism, every species, every civilization is doomed regardless of anything it does. Which is probably too strong an argument even for this speculative post.
The resolution, I think, is that the real systems that persist do so because they are not undergoing unconstrained multiplicative random walks: either their walks are not really multiplicative, or not really fat-tailed.
So the mathematical argument does not prove that everything dies no matter what. What it shows is something like this: in the absence of sufficiently strong corrective mechanisms, the multiplicative fat-tailed dynamics dominate and the system goes to zero. Survival requires a corrective mechanism strong enough to counteract the ever-growing volatility penalty that fat-tailed shocks impose on single-path dynamics.
Instrumental convergence: the strongest counterargument, and why it probably fails in practice
Ok, but what about instrumental convergence?
We surely know: almost any goal you might have requires you to be alive to pursue it. A sufficiently intelligent agent, regardless of its terminal goals, should converge on self-preservation (and by extension, civilization-preservation, if the agent depends on civilization) as an instrumental subgoal.
I think this argument is correct in principle, and if it works in practice, it defeats the lethal reality hypothesis. If civilizations reliably produce agents (whether individual humans, institutions, or AI systems) that are smart enough to internalize the instrumental value of long-term survival and eventually act on that understanding effectively, then there is a mechanism that counteracts extinctive pressure, and the central thesis of this post is wrong.
However, the key word is "eventually". The question is not whether this realization eventually occurs but whether it occurs soon enough, with sufficient force, in enough agents, to actually counteract the extinctive pressure before it is too late.
More specifically:
"Sufficiently smart" is a high bar, and we have not cleared it. Instrumental convergence is a theoretical property of sufficiently intelligent agents. Humans are intelligent, but apparently not sufficiently intelligent in the relevant sense. Humans understand, in the abstract, that they need to be alive to pursue their goals, but there are many cases when humans do not act accordingly, including on civilization level.
Theoretical acceptance is not behavioral compliance. Even among agents who understand and accept the instrumental convergence argument, there is a further gap between intellectual acceptance and actual behavior. Understanding that survival is instrumentally necessary does not automatically reconfigure your incentive structure, your discount rate, your competitive environment, or the institutions you operate within. This is, in a sense, just the core mechanism of extinctive pressure restated: even agents who understand the argument are outcompeted by agents who understand it equally well but choose to defect.
The race between understanding and environmental shift. A civilization needs to become smart enough to internalize the instrumental value of survival before it becomes powerful enough to alter its own local environment to a lethal state.
Competitive exclusion of those who understand. Suppose some agents in the civilization do clear the bar. These agents are now competing against equally intelligent agents who also understand the argument but who, for whatever reason, do not act on it. The agents who don't act on it have more resources available for immediate competition, because they are not paying the survival tax. The agents who understood and acted are marginalized, not because they were wrong, but because being right was expensive and being wrong was free.
So it is not enough to be smart enough eventually. A civilization must be smart enough here and now, which is hard because:
Every emerging civilization is around the dumbest possible civilization
A civilization arises when a species crosses some threshold of cognitive and social capability. Below this threshold, you don't get civilization at all. Above it, you do. The question is: how far above the threshold should we expect a newly-emerged civilization to be?
The answer, by a straightforward selection argument, is: barely above it. Civilizations emerge as soon as they can, because the components that give rise to civilization (intelligence, social complexity, tool use) are under positive selection pressure for reasons that have nothing to do with building civilizations per se. They are selected because they confer immediate competitive advantage. A species crosses the civilization threshold not because it was aiming for civilization but because the optimization pressures that were pushing intelligence upward for other reasons happened to push it past the threshold. And since it crosses as soon as it can, it crosses with approximately the minimum cognitive endowment required.
This is directly analogous to a point that is familiar in evolutionary biology: organisms tend to be minimally adapted to their niches, not maximally, because evolution is a satisficing process that stops optimizing a trait once it is "good enough" for the current competitive landscape (or more precisely, once the marginal fitness return of further improvement drops below the marginal cost). Civilizations, similarly, emerge at approximately the minimum viable intelligence, because there is no selection pressure that specifically pushes a pre-civilizational species past the minimum, and even if there were, it would require more time than civilization-scale time.[5]
So, we should expect a newly-emerged civilization to be roughly as dumb as a civilization can possibly be while still counting as a civilization.
Intelligence is not the bottleneck
It is true that, all else being equal, greater intelligence leads to better awareness of existential risks and perhaps even greater concern about them.
But "all else being equal" is never the actual situation. What actually happens when the general intelligence level of a civilization increases is that all agents get smarter, including the ones who are optimizing for things other than survival. And the ones optimizing for things other than survival still have a competitive advantage, because they are not paying the survival tax. A world with smarter humans is not a world where smart survival-oriented humans dominate, but a world where smart survival-oriented humans compete against equally smart growth-oriented, power-oriented, and profit-oriented humans, and lose, for exactly the same structural reasons they lose now, just at a higher cognitive level.
The state of affairs where you have high intelligence combined with thorough disregard for existential risk might seem like an unlikely or unnatural configuration. And in a sense it is: if you picked a truly random mind from the space of all possible minds with that intelligence level, it might be unlikely to have this particular pattern of concerns and blind spots. But we are not sampling randomly from mind-space. We are observing the output of an optimization process that specifically selects for competitive fitness, and competitive fitness is improved by intelligence while being unimproved or actively harmed by existential risk concern. The optimization process is, in effect, searching for exactly this "unlikely" configuration.
Caring about survival is not the same as surviving
I do not dispute that many people care about survival, on the psychological level.
However, there are three levels, and only the third matters:
Level 1: Caring about survival in the psychological sense. You feel concerned. You believe existential risk is real and important. You experience anxiety or urgency about it.
Level 2: Executing actions that are intended to be pro-survival. You donate to AI safety organizations. You write papers about existential risk. You lobby for policy changes. You work at an alignment lab. You feel, reasonably, that you are doing something about the problem.
Level 3: Executing actions that actually lead to survival. You are executing actions that causally contribute to the civilization actually not going extinct. This is an absurdly demanding criterion, and that is precisely the point.
The extinctive pressure is a claim about Level 3. It says that the optimization dynamics of civilizations make it extremely unlikely that agents execute Level 3 actions at sufficient scale. It doesn't imply that agents don't execute actions on Levels 1 and 2.
The progress dilemma: there are no good options
Option 1: Full speed ahead. This is approximately the status quo. Technological progress continues at whatever rate the competitive optimization landscape produces, which in practice means as fast as possible, because agents who develop technology faster outcompete those who develop it slower. The problem is obvious and already discussed at length.
Option 2: Halt progress. Stop developing new technology entirely. This avoids the problem of self-inflicted environmental destabilization, but it runs directly into the other horn of the dilemma: the natural environment is not static. On cosmic timescales, it implies certain extinction. Also, beware that competitive dynamics push towards resuming progress.
Option 3: Slow, cautious, survival-directed progress. This is the option that seems like it should work, and the reasons it doesn't are the most instructive. The idea is: progress slowly and carefully, directing research and development specifically toward technologies that enhance civilizational resilience, while carefully managing the risks introduced by each new technology before proceeding to the next one.
The problem is that "slow, cautious, survival-directed progress" is not a natural attractor of any known optimization process. It is simply slow progress. A world that progresses slowly is not, by default, a world that directs its slow progress toward survival. It is a world where the same competitive dynamics operate at a slower rate. The agents who would need to redirect progress toward survival still face the same structural disadvantages.
For this to work, you would need a mechanism that not only slows progress but redirects it, that specifically channels civilizational resources toward survival-relevant capabilities rather than competitive-advantage-relevant capabilities. And this mechanism would need to be self-sustaining: it would need to maintain itself against the continuous pressure of competitive dynamics that reward defection from the survival-oriented program. This is, in effect, asking for a global coordination mechanism that persists indefinitely against strong incentives to defect.
Indirect supporting arguments
There is a variety of classical doomsday arguments in the philosophical literature. I do not build on them, nor are they required for the current line of reasoning. But the consistency between them is worth noting.
The most relevant is the Doomsday Argument proper, which in its simplest form observes: if humanity will eventually number trillions of people spread across millennia or galaxies, then your birth rank (roughly the 100-billionth human ever born) places you extraordinarily early in the species' history, which is a priori unlikely. You would be in the first 0.001% or less of all humans who will ever live. Under a uniform prior over birth rank, this is much less likely than the alternative: that the total number of humans who will ever live is not astronomically large, which implies that humanity does not have a long or expansive future. The argument is, to a degree, controversial and there are well-known objections (the Self-Sampling Assumption vs. Self-Indication Assumption debate, the reference class problem), and I do not want to relitigate them here. The point is simply that if you take the Doomsday Argument seriously, it is telling you roughly the same thing as extinctive pressure: the far future with billions of years of flourishing human civilization is not the expected outcome.
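For concreteness, here is the toy Bayes update behind the simplest version of the argument, with entirely made-up hypotheses and a fifty-fifty prior (and ignoring the SSA/SIA and reference-class subtleties mentioned above):

```python
# Doomsday-style update: treat your birth rank as a uniform draw from all humans
# who will ever live, under each hypothesis about that total. Numbers are illustrative.
birth_rank = 1.0e11  # roughly the 100-billionth human

hypotheses = {
    "short future, 2e11 humans total": 2.0e11,
    "long future, 1e16 humans total": 1.0e16,
}
prior = 0.5  # equal prior credence in each, purely for illustration
assert all(birth_rank <= total for total in hypotheses.values())

# Under each hypothesis, the likelihood of any particular rank <= N_total is 1/N_total.
unnormalized = {name: prior / total for name, total in hypotheses.items()}
z = sum(unnormalized.values())
for name, weight in unnormalized.items():
    print(f"{name}: posterior = {weight / z:.6f}")
```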
The Great Filter argument is perhaps even more directly relevant. For all I know, it may very well be that the Great Filter is real, and the growing biological evidence, in my view, shifts the estimate of its location from the past towards the future. At the same time, I acknowledge that ASI can't really be the Great Filter, because in that case the Universe around us would have been already conquered by some ASI. So there is a real tension here, in my view.[6]
In any case, the extinctive pressure hypothesis offers a specific mechanism for a filter that is ahead.
The observation selection effects literature, generally, supplies the tools for reasoning carefully about what we should expect to observe given that we are observers, and many of the conclusions in that literature point in the same direction. We should not be surprised that we find ourselves in a universe that permits observers, but we should also not take the fact that we are here as strong evidence that the future is likely safe.
And of course, there is a lot written on technogenic risks. Every major technology is, from the extinctive pressure perspective, a step outside the local environment into an untested region of parameter space. On the object level, we can trivially think about things like, for example, ASI, nuclear war, global warming, bioweapons, or grey goo.
Summary: optimization-wise, you are punished for caring about survival
Agents who divert resources from competition toward survival pay an immediate cost and receive no commensurate competitive benefit, which means they are systematically outperformed by agents who do not pay this cost.
This is what I mean by calling it a "pressure". When a system tries to fight extinction, there is a restoring force that pushes the system back toward the state of not fighting extinction, unless large resources are applied in the opposite direction.
The sailors who turn and run with the wind will always move faster, unless an additional, stronger source of power is applied by their competitors.
Which is not to say you shouldn't care about survival.
This reads like an argument about preferences/utility functions over time, but it is not; let me be clear on that. I make this note because that sentence may create the wrong impression that the problem is in time discounting, which it is not, as the next sections make clear.
This is very much in line with the Fragile World Hypothesis. See it for more discussion of this topic.
This line of reasoning, recurring throughout the entire post, relates a lot to Inadequate Equilibria. I think it is correct to formulate what I write here in terms of the inadequate equilibria framework. It is just that "adequate equilibria" are very rare and are optimized against.
This probably sounds too technical, and is not something I, as the author, should expect to be smoothly understood by the audience. For clarity, I suggest either asking AIs about alpha-stable processes, or taking my book on that, or asking AIs to summarize the results from my book on the empirics of alpha-stable processes.
Strictly speaking, there are plausible scenarios where the same pressures that produced civilization-level intelligence continue operating after the threshold is crossed (sexual selection for intelligence, for instance, doesn't stop just because you've invented agriculture). The point, however, still holds in a weaker form: we should expect to be near the minimum (especially in the short timescales), not necessarily at it, and "near the minimum" is sufficient for everything that follows.
Obviously, I think ASI is powerful enough to escape extinctive pressure. So the absence of ASI in this region of the Universe looks like an argument for the Great Filter being behind us, but I am not sure. Anyway, it is a separate big topic.