Nick Bostrom has posted a new working paper, The Vulnerable World Hypothesis, on his personal site (the first in two years?).

I don't think I have time to read it all, but I'd be interested to see people comment with some choice quotes from the paper, and also read people's opinions on the ideas within it.

To get the basics, below I've written down the headings into a table of contents, copied in a few definitions I found when skimming, and also copied over the conclusion (which seemed to me more readable and useful than the abstract).


  • Is there a black ball in the urn of possible inventions?
  • A thought experiment: easy nukes
  • The vulnerable world hypothesis
    • VWH: If technological development continues then a set of capabilities will at some point be attained that make the devastation of civilization extremely likely, unless civilization sufficiently exits the semi-anarchic default condition.
  • Typology of vulnerabilities
    • Type-1 (“easy nukes”)
      • Type-1 vulnerability: There is some technology which is so destructive and so easy to use that, given the semi-anarchic default condition, the actions of actors in the apocalyptic residual make civilizational devastation extremely likely.
    • Type-2a (“safe first strike”)
      • Type-2a vulnerability: There is some level of technology at which powerful actors have the ability to produce civilization-devastating harms and, in the semi-anarchic default condition, face incentives to use that ability.
    • Type-2b (“worse global warming”)
      • Type-2b vulnerability: There is some level of technology at which, in the semi-anarchic default condition, a great many actors face incentives to take some slightly damaging action such that the combined effect of those actions is civilizational devastation.
    • Type-0 (“surprising strangelets”)
      • Type-0 vulnerability: There is some technology that carries a hidden risk such that the default outcome when it is discovered is inadvertent civilizational devastation.
  • Achieving stabilization
    • Technological relinquishment
      • Principle of Differential Technological Development. Retard the development of dangerous and harmful technologies, especially ones that raise the level of existential risk; and accelerate the development of beneficial technologies, especially those that reduce the existential risks posed by nature or by other technologies.
    • Preference modification
    • Some specific countermeasures and their limitations
    • Governance gaps
  • Preventive policing
  • Global governance
  • Discussion
  • Conclusion


This paper has introduced a perspective from which we can more easily see how civilization is vulnerable to certain types of possible outcomes of our technological creativity—our drawing a metaphorical black ball from the urn of inventions, which we have the power to extract but not to put back in. We developed a typology of such potential vulnerabilities, and showed how some of them result from destruction becoming too easy, others from pernicious changes in the incentives facing a few powerful state actors or a large number of weak actors.
We also examined a variety of possible responses and their limitations. We traced the root cause of our civilizational exposure to two structural properties of the contemporary world order: on the one hand, the lack of preventive policing capacity to block, with extremely high reliability, individuals or small groups from carrying out actions that are highly illegal; and, on the other hand, the lack of global governance capacity to reliably solve the gravest international coordination problems even when vital national interests by default incentivize states to defect. General stabilization against potential civilizational vulnerabilities—in a world where technological innovation is occurring rapidly along a wide frontier, and in which there are large numbers of actors with a diverse set of human-recognizable motivations—would require that both of these governance gaps be eliminated. Until such a time as this is accomplished, humanity will remain vulnerable to drawing a technological black ball.
Clearly, these reflections provide a pro tanto reason to support strengthening surveillance capabilities and preventive policing systems and for favoring a global governance regime that is capable of decisive action (whether based on unilateral hegemonic strength or powerful multilateral institutions). However, we have not settled whether these things would be desirable all-things-considered, since doing so would require analyzing a number of other strong considerations that lie outside the scope of this paper.
Because our main goal has been to put some signposts up in the macrostrategic landscape, we have focused our discussion at a fairly abstract level, developing concepts that can help us orient ourselves (with respect to long-term outcomes and global desirabilities) somewhat independently of the details of our varying local contexts.
In practice, were one to undertake an effort to stabilize our civilization against potential black balls, one might find it prudent to focus initially on partial solutions and low-hanging fruits. Thus, rather than directly trying to bring about extremely effective preventive policing or strong global governance, one might attempt to patch up particular domains where black balls seem most likely to appear. One could, for example, strengthen oversight of biotechnology-related activities by developing better ways to track key materials and equipment, and to monitor activities within labs. One could also tighten know-your-customer regulations in the biotech supply sector, and expand the use of background checks for personnel working in certain kinds of labs or involved with certain kinds of experiment. One can improve whistleblower systems, and try to raise biosecurity standards globally. One could also pursue differential technological development, for instance by strengthening the biological weapons convention and maintaining the global taboo on biological weapons. Funding bodies and ethical approval committees could be encouraged to take a broader view of the potential consequences of particular lines of work, focusing not only on risks to lab workers, test animals, and human research subjects, but also on ways that the hoped-for findings might lower the competence bar for bioterrorists down the road. Work that is predominantly protective (such as disease outbreak monitoring, public health capacity building, improvement of air filtration devices) could be differentially promoted.
Nevertheless, while pursuing such limited objectives, one should bear in mind that the protection they would offer covers only special subsets of scenarios, and might be temporary. If one finds oneself in a position to influence the macroparameters of preventive policing capacity or global governance capacity, one should consider that fundamental changes in those domains may be the only way to achieve a general ability to stabilize our civilization against emerging technological vulnerabilities.



I'm not sure preventing these risks requires a global Orwellian state as Bostrom says. Manufacture of computers and phones already has some narrow bottlenecks, allowing the US government to set up global surveillance of all research involving computers without letting anyone know. Sprinkle some machine learning to detect dangerous research, and you don't even need a huge staff. Maybe it's already done. (Though on the other hand, maybe not.)

A programmer in a basement writes some code. That code is picked up and sent to you at the computer monitoring station. You read it and can't understand it. Now what? You don't know the nature of intelligence. It might be possible for a team of very smart people to unravel an arbitrary piece of spaghetti code and prove that it's safe, sometimes. (Rice's theorem says no algorithm can decide a non-trivial behavioral property for all programs.) But incompetent coders are producing buckets of the stuff, and expect it to run the moment they press go.

An algorithm that can understand arbitrary code, to the level where it can test for intelligence, and can run in a split second on the dev's laptop (so they don't notice a delay) is well into foom territory. A typical programmer will see little more than suggestively named variables and how many if statements are used, if they have to quickly scan other people's code to see if it's "safe".

One can't understand code, but predicting the goals of the programmer may be a simpler task. If he has read "Superintelligence", googled "self-improving AI" and is an expert in ML, the fact that he locked himself into a basement may be alarming.

Does anyone know how this paper relates to Paul Christiano's blog post titled Handling destructive technology, which seems to preempt some of the key ideas? It's not directly acknowledged in the paper.

It seems to me that this is the crux:

A key concern in the present context is whether the consequences of civilization continuing in the current semi-anarchic default condition are catastrophic enough to outweigh reasonable objections to the drastic developments that would be required to exit this condition. [Emphasis in original]

That only matters if you're in a position to enact the "drastic developments" (and to do so without incurring some equally bad catastrophe in the process). If you're not in a position to make something happen, then it doesn't matter whether it's the right thing to do or not.

Where is there any sign that any person or group has or ever will have the slightest chance of being able to cause the world to exit the "semi-anarchic default condition", or the slightest idea of how to go about doing so? I've never seen any. So what's the point in talking about it?

The mean person has 1 / 7 billionth control over the fate of humanity. There's your slightest chance right there!

Edit: In other words, the world is big but not infinite. We are small but not infinitesimal.

Exiting the "semi-anarchic default condition", if it happens, seems likely to be a slow and distributed process, since no one group can make global decisions until we exit that condition. The state of thought and discussion generally, and opinions of prominent people like Nick Bostrom particularly, around the issue will probably influence the general current of "small" decisions toward or away from an exit. Thus, getting closer to the right answer here may slightly increase our chances in the long run. Not a primary concern, but worth some discussion, I think.

I'm pretty sure that the semi-anarchic default condition is a stable equilibrium. As soon as any power structure started to coalesce, everybody who wasn't a part of it would feel threatened by it and attack it. Once having neutralized the threat, any coalitions that had formed against it would themselves self-destruct in internal mistrust. If it's even possible to leave an equilibrium like that, you definitely can't do it slowly.

On the other hand, the post-semi-anarchic regime is probably fairly unstable... anybody who gets out from under it a little bit can use that to get out from under it more. And many actors have incentives to do so. Maybe you could stay in it, but only if you spent a lot of its enforcement power on the meta-problem of keeping it going.

My views on this may be colored by the fact that Bostrom's vision for the post-semi-anarchic condition in itself sounds like a catastrophic outcome to me, not least because it seems obvious to me that it would immediately be used way, way beyond any kind of catastrophic risk management, to absolutely enforce and entrench any and every social norm that could get 51 percent support, and to absolutely suppress all dissent. YMMV on that part, but anyway I don't think my view of whether it's possible is that strongly determined by my view that it's undesirable.

Hmm. I think you're right. I just realized I don't have any actual models for how we might exit the semi-anarchy without friendly superintelligence (it seemed hard, so I assumed gradualism), and it seems dangerous to try.

Furthermore, in reference to the crux in your original comment, the semi-anarchy doesn't seem dangerous enough for a world government to improve our chances. What we're looking for is global coordination capacity, and we can improve that without building one.

(Comment duplicated from the EA Forum.)

I think the central "drawing balls from an urn" metaphor implies a more deterministic situation than that which we are actually in – that is, it implies that if technological progress continues, if we keep drawing balls from the urn, then at some point we will draw a black ball, and so civilizational devastation is basically inevitable. (Note that Nick Bostrom isn't actually saying this, but it's an easy conclusion to draw from the simplified metaphor). I'm worried that taking this metaphor at face value will turn people towards broadly restricting scientific development more than is necessarily warranted.
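To make the "basically inevitable" reading concrete, here is a rough sketch under an assumption that is not in the paper: each draw is independently black with some probability. The symbols $p$, $p_k$, and $n$ are introduced only for illustration.

```latex
% Assumption (illustrative only): every draw is black with the same
% probability p > 0, independently of all other draws.
\[
  P(\text{no black ball in } n \text{ draws}) = (1-p)^n \longrightarrow 0
  \quad \text{as } n \to \infty .
\]
% If instead the k-th draw is black with probability p_k in [0,1),
% survival is a product, and it stays positive exactly when the
% risks shrink fast enough for their sum to converge:
\[
  P(\text{no black ball ever}) = \prod_{k=1}^{\infty} (1-p_k) > 0
  \iff \sum_{k=1}^{\infty} p_k < \infty .
\]
```

So the metaphor only forces devastation if the per-draw risk never falls; whether it does is exactly what the simple picture leaves out.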

I offer a modification of the metaphor that relates to differential technological development. (In the middle of the paper, Bostrom already proposes a few modifications of the metaphor based on differential technological development, but not the following one). Whenever we draw a ball out of the urn, it affects the color of the other balls remaining in the urn. Importantly, some of the white balls we draw out of the urn (e.g., defensive technologies) lighten the color of any grey/black balls left in the urn. A concrete example of this would be the summation of the advances in medicine over the past century, which have lowered the risk of a human-caused global pandemic. Therefore, continuing to draw balls out of the urn doesn't inevitably lead to civilizational disaster – as long as we can be sufficiently discriminating in favor of those white balls which have a risk-lowering effect.
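As a toy illustration of this modified urn, here is a minimal Python sketch. Everything in it is an assumption made for illustration rather than something taken from the paper: the function name `survival_rate`, the parameters `p_black`, `p_defensive`, and `lighten`, and all the numbers. It compares an urn with a fixed black-ball chance against one where each defensive draw multiplies that chance by a factor below one.

```python
import random


def survival_rate(n_draws, p_black, lighten=1.0, p_defensive=0.0,
                  trials=2000, seed=0):
    """Fraction of simulated histories with no black ball in n_draws.

    p_black:     initial chance that any single draw is a black ball.
    p_defensive: chance that a non-black draw is a defensive technology.
    lighten:     factor (<= 1) applied to the black-ball chance after
                 each defensive draw; 1.0 means the urn never changes.
    All of these are hypothetical knobs, not quantities from the paper.
    """
    rng = random.Random(seed)
    survived = 0
    for _ in range(trials):
        p = p_black
        alive = True
        for _ in range(n_draws):
            if rng.random() < p:           # drew a black ball
                alive = False
                break
            if rng.random() < p_defensive:  # drew a defensive white ball
                p *= lighten                # remaining balls get lighter
        survived += alive
    return survived / trials


fixed = survival_rate(n_draws=500, p_black=0.01)
lightened = survival_rate(n_draws=500, p_black=0.01,
                          lighten=0.9, p_defensive=0.5)
print(f"fixed urn: {fixed:.3f}, lightening urn: {lightened:.3f}")
```

With these made-up numbers the lightening urn survives far more often than the fixed one, but the size of the gap depends entirely on the assumed parameters, which nobody knows for real technologies.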

I discuss a different reformulation in my new paper, "Systemic Fragility as a Vulnerable World", casting this as an explore/exploit tradeoff in a complex space. In the paper, I explicitly discuss the way in which certain subspaces can be safe or beneficial.

"The push to discover new technologies despite risk can be understood as an explore/exploit tradeoff in a potentially dangerous environment. At each stage, the explore action searches the landscape for new technologies, with some probability of a fatal result, and some probability of discovering a highly rewarding new option. The implicit goal in a broad sense is to find a search strategy that maximizes humanity's cosmic endowment - neither so risk-averse that advanced technologies are never explored or developed, nor so risk-accepting that Bostrom's postulated Vulnerable World becomes inevitable. Either of these risks astronomical waste. However, until and unless the distribution of black balls in Bostrom's technological urn is understood, we cannot specify an optimal strategy. The first critical question addressed by Bostrom - 'Is there a black ball in the urn of possible inventions?' - is, to reframe the question, about the existence of negative singularities in the fitness landscape."

A similar concept is the idea of offense-defense balance in international relations. E.g., large stockpiles of nuclear weapons strongly favor “defense” (well, deterrence) because it’s prohibitively costly to develop the capacity to reliably destroy the enemy’s second-strike forces. Note the caveats there: at sufficient resource levels, and given constraints imposed by other technologies (e.g., inability to detect nuclear subs).

Allan Dafoe and Ben Garfinkel have a paper out on how techs tend to favor offense at low investment and defense at high investment. (That is, the resource ratio R at which an attacker with resources RD has an X% chance of defeating a defender with resources D tends to decrease with D down to a local minimum, then increase.)

(On mobile, will link later.)

I think it’s a sad and powerful Overton window demonstration that these days someone can write a paper like this without even mentioning space colonization, which is the obvious alternate endgame if you want a non-global-dictatorship solution.

Some of Bostrom's key papers are primarily about the massive importance of colonising space soon, and other researchers at the institution he founded have written papers trying to do basic modelling of plans to ensure we're able to use all the resources in the universe. It's inaccurate to say that this isn't something that these researchers think about a lot and care about.

But I don't think it affects this paper. There can be technologies that pose such existential threats (e.g. superintelligent AGI) that it doesn't matter how far away you are when you make them (well, I suppose if we leave each other's light cones then that's a bit different, though there are ways to get around that barrier). So I think many of these arguments will go through if you assume we've, say, built Dyson spheres and shot out into the galaxies.

Nick's space papers are largely about how to harvest large amounts of utility from the galaxy, not about how to increase humanity's robustness. And yes, there are some Xrisks (including the one I am focused on) that space colonies do not help with, but the reader may not be convinced of these, so it is surely worth mentioning that some risks would be guarded against with interstellar diversification. If nothing else you should probably argue that space colonization is not an adequate solution for these reasons.

I don't think the Urn of Invention analogy works.

  • There are no white or black balls, there is only gray.
  • Technologies and discoveries don't exist context-free. Unless we're handed a technology by aliens, we understand the precursor discoveries.

We already have ways of creating weapons of mass destruction relatively easily, and we have adjusted regulation and law enforcement to deal with it.

Consider another analogy.

The urn of literature. We have literature which is just interesting but emotionally neutral; these are white balls.

We have literature that causes people to have emotions. These are gray balls because they can move people to good and bad actions.

Maybe there is a magical combination of words that will cause such strong emotions that people will commit suicide or become homicidal maniacs en masse. These are the black balls.

I hope we can agree that this is absurd. It is absurd for the same reason the urn of invention is absurd: that isn't how literature or technology works.

As an extension of Bostrom's ideas, I have written a draft entitled "Systemic Fragility as a Vulnerable World" where I introduce the "Fragile World Hypothesis."


The possibility of social and technological collapse has been the focus of science fiction tropes for decades, but more recent focus has been on specific sources of existential and global catastrophic risk. Because these scenarios are simple to understand and envision, they receive more attention than risks due to complex interplay of failures, or risks that cannot be clearly specified. In this paper, we discuss a new hypothesis that complexity of a certain type can itself function as a source of risk. This "Fragile World Hypothesis" is compared to Bostrom's "Vulnerable World Hypothesis", and the assumptions and potential mitigations are contrasted.