The "Post-Singularity Social Contract" and Bostrom's "Vulnerable World Hypothesis"

by philosophytorres, 25th Nov 2018



I thought it might be worth outlining a few interesting (at least in my view!) parallels between Nick Bostrom’s recent working draft paper on the “vulnerable world hypothesis” (VWH) and my slightly less recent book chapter “Superintelligence and the Future of Governance: On Prioritizing the Control Problem at the End of History.”

In brief, my chapter offers an explicit (and highly speculative) proposal for extricating ourselves from the “semi-anarchic default condition” in which we currently find ourselves, although I don’t use that term. But I do more or less describe the three features of this condition that Bostrom identifies:

(1) The phenomenon of “agential risks”—which Bostrom refers to as the “apocalyptic residual,” a phrase that I think will ultimately introduce confusion and thus ought to be eschewed!—entails that, once the destructive technological means become available, the world will almost certainly face a global-scale catastrophe.

(2) I argue that sousveillance and the “transparent society” model (of David Brin) are both inadequate for preventing global-scale attacks from risky agents with virtually 100 percent reliability. Furthermore, I contend that (the alternative of) asymmetrical invasive global surveillance systems will almost certainly be misused and abused by those ("mere humans") in charge, thus threatening another type of existential hazard: totalitarianism.

(3) Finally, I suggest that a human-controlled singleton (or global governing system) is unlikely to take shape on the relevant timescales. That is, yes, there appears to be some general momentum toward a unipolar configuration (see Bostrom’s “singleton hypothesis,” which seems plausible to me), but unprecedented destructive capabilities will likely be widely distributed among nonstate actors before this happens.

I then argue for a form of algocracy: one way to avoid almost certain doom (due to agential risks and world-destroying technologies) is to design a superintelligent algorithm that coordinates planetary affairs and essentially spies on all citizens in order to preemptively obviate global-scale attacks, some of which could have irreversible consequences. Ultimately, my conclusion is that, given the exponential development of dual-use emerging technologies, we may need to actually accelerate work on both (a) the philosophical control problem, and (b) the technical problem of creating an artificial superintelligence. As I write:

First, everything hangs on our ability to solve the control problem and create a friendly superintelligence capable of wise governance. This challenge is formidable enough given that many AI experts anticipate a human-level AI within this century—meaning that there appears to be a deadline—but the trends outlined in Figure 2 [click here] open up the possibility that we may have even less time to figure out what our “human values” are and how they can be encoded in “the AI’s programming language, and ultimately in primitives such as mathematical operators and addresses pointing to the contents of individual memory registers” (Bostrom 2014). Thus, the present paper offers a novel reason for allocating large amounts of resources for projects that focus on solving the control problem: not only will continued progress in computer science make the control problem probably unavoidable, but the convergence of state and nonstate power could require new forms of global governance—namely, a friendly supersingleton—within the coming decades.

This being said, here are some differences between the two papers:

(i) Bostrom also emphasizes phenomena like non-omnicidal actors who are locked in situations that, for structural reasons, lead them to pursue actions that cause disaster.

(ii) Whereas Bostrom focuses on the possibility of extracting a “black ball”—i.e., a “technology that invariably or by default destroys the civilization that invents it”—out of the tubular urn of innovation, I am essentially suggesting that we may have already extracted such a ball. For example, synthetic biology (more specifically: de-skilling plus the Internet plus lowering costs of lab equipment) will almost certainly place unprecedented destructive power within arm’s reach of a large number of terrorist groups or even single individuals. As I have elsewhere demonstrated—in two articles on agential risks—there are plenty of violent individuals with omnicidal inclinations looming in the shadows of society! This being said, some pretty simple calculations reveal that the probability of a successful global-scale attack by any one group or individual need be only negligible for annihilation to be more or less guaranteed over the course of decades or centuries, due to the fact that probability accumulates across space and time. Both John Sotos (in an article titled “Biotechnology and the Lifetime of Technical Civilizations”) and I (in an article titled “Facing Disaster: The Great Challenges Framework,” although I had similar calculations in my 2017 book) have crunched the numbers to show that, quoting from “Facing Disaster”:

For the sake of illustration, let’s posit that there are 1,000 terror agents in a population of 10 billion and that the probability per decade of any one of these individuals gaining access to world-destroying weapons … is only 1 percent. What overall level of existential risk would this expose the entire population to? It turns out that, given these assumptions, the probability of a doomsday attack per decade would be a staggering 99.995 percent. One gets the same result if the number of terror agents is 10,000 and the probability of access is 0.1 percent, or if the number is 10 million and the probability is 0.000001. Now consider that the probability of access may become far greater than 0.000001—or even 1—percent, given the trend of [the radical democratization of science and technology], and that the number of terror agents could exceed 10 million, which is a mere 0.1 percent of 10 billion. It appears that an existential strike could be more or less inescapable.
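The arithmetic behind this passage can be sketched in a few lines. This is a minimal illustration rather than the calculation actually used in “Facing Disaster”: it assumes that agents act independently, that gaining access is equivalent to a successful attack, and that the third figure (0.000001) is a raw probability rather than a percentage. The function name `prob_at_least_one` is hypothetical.

```python
import math


def prob_at_least_one(n_agents: int, p_access: float) -> float:
    """Probability that at least one of n_agents succeeds in a given decade.

    Assumes (for illustration only) that agents are independent and that
    each has the same per-decade probability p_access of gaining access.
    Computes 1 - (1 - p)^n via log1p/expm1 for numerical stability
    when p is tiny and n is large.
    """
    return -math.expm1(n_agents * math.log1p(-p_access))


# The three scenarios from the quoted passage: (number of agents, probability).
scenarios = [
    (1_000, 0.01),           # 1,000 agents at 1 percent
    (10_000, 0.001),         # 10,000 agents at 0.1 percent
    (10_000_000, 0.000001),  # 10 million agents, read as a raw probability
]

for n, p in scenarios:
    print(f"n={n:>10,}  p={p:<8g}  P(at least one) = {prob_at_least_one(n, p):.6f}")
```

In each scenario the expected number of agents gaining access per decade is about 10, which is why all three land at roughly the same figure, a little above 0.99995 under these assumptions. If the agents were correlated, or if access did not reliably translate into a successful attack, the result would be lower.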

This suggests that a scenario as extreme as the “easy nuke” one that Bostrom outlines in his paper need not be the case for the conclusion of the VWH to obtain. Recall that this hypothesis states:

If technological development continues then a set of capabilities will at some point be attained that make the devastation of civilization extremely likely, unless civilization sufficiently exits the semi-anarchic default condition.

My sense is, again, that these capabilities are already with us today: they are emerging (rapidly) from the exponentially growing field of synthetic biology, although it’s possible that they could arise from atomically-precise manufacturing as well. Indeed, this is a premise of my argument in “Superintelligence and the Future of Governance,” which I refer to as “The Threat of Universal Unilateralism.” I summarize my argument in section 1 of the paper as follows:

(i) The Threat of Universal Unilateralism: Emerging technologies are enabling a rapidly growing number of nonstate actors to unilaterally inflict unprecedented harm on the global village; this trend of mass empowerment is significantly increasing the probability of an existential catastrophe—and could even constitute a Great Filter (Sotos 2017).

(ii) The Preemption Principle: If we wish to obviate an existential catastrophe, then societies will need a way to preemptively avert not just most but all possible attacks with existential consequences, since the consequences of an existential catastrophe are by definition irreversible.

(iii) The Need for a Singleton: The most effective way to preemptively avert attacks is through some regime of mass surveillance that enables governing bodies to monitor the actions, and perhaps even the brain states, of citizens; ultimately, this will require the formation of a singleton.

(iv) The Threat of State Dissolution: The trend of (i) will severely undercut the capacity of governing bodies to effectively monitor their citizens, because the capacity of states to provide security depends upon a sufficiently large “power differential” between themselves and their citizens.

(v) The Limits of Security: If states are unable to effectively monitor their citizens, they will be unable to neutralize the threat posed by (i), thus resulting in a high probability of an existential catastrophe.

There seem, at least to my eyes, to be some significant overlaps here with Bostrom's (fascinating) paper, which suggests a possible convergence of scholars on a single conception of humanity's (near-)future predicament on spaceship Earth. Perhaps the main difference is, once more, that Bostrom’s thesis hinges upon the notion of a not-yet-realized-but-maybe-possible “black ball” technology, whereas the message of my analysis is far more urgent: “Weapons of total destruction” (WTDs) already exist, are in their adolescence (so to speak), and will likely mature in the coming years and decades. Put differently, humanity will need to escape the semi-anarchic default condition in which we currently reside quite soon or else face almost certain annihilation. My solution is algocratic, with everything depending upon success on the control problem.

This post is, admittedly, quite hastily written. Please don’t hesitate to let me know if aspects of it are opaque and thus require clarification. I would also—as always—welcome comments of any sort!!



6 comments

There's a huge, mysterious gap between the possibility of terrorism and actual terrorism. To leave a city without power or water, you only need a handful of ex-military people without any special tech, and the pool of people capable of that today is surely larger than the pool of machine learning enthusiasts. So why isn't that happening all the time? It seems that we are protected in many ways: there are fewer terrorists than we think, they are less serious about it, fewer of them can cooperate with each other, a larger proportion than we think are already being tracked, etc.

So before building a global singleton, I'd look into why current measures work so well, and how they can be expanded. You could probably get >99% of the x-risk protection at <1% of the cost.

there are fewer terrorists than we think, they are less serious about it

Gwern wrote about this in his essay “Terrorism is not about terror”.

Yes, it is interesting. But a few pilots were able to cause almost apocalyptic destruction on 9/11, and it could have been even worse had they been able to hit a nuclear power plant or the White House.

There have also been a few smaller cases of pilots deliberately crashing planes.

I don't think we know enough about human beliefs to say that anarchy (in the form of individual, indexical valuations) isn't a fundamental component of our CEV. We _like_ making individual choices, even when those choices are harmful or risky.

What's the friendly-AI take on removing (important aspects of) humanity in order to further intelligence preservation and expansion?

I don't believe that present-day synthetic biology is anywhere close to being able to create "total destruction" or "almost certain annihilation"... and in fact it may never get there without more-than-human AI.

If you made super-nasty smallpox and spread it all over the place, it would suck, for sure, but it wouldn't kill everybody and it wouldn't destroy "technical civilization", either. Human institutions have survived that sort of thing. The human species has survived much worse. Humans have recovered from really serious population bottlenecks.

Even if it were easy to create any genome you wanted and put it into a functioning organism, nobody knows how to design it. Biology is monstrously complicated. It's not even clear that a human can hold enough of that complexity in mind to ever design a weapon of total destruction. Such a weapon might not even be possible; there are always going to be oddball cases where it doesn't work.

For that matter, you're not even going to be creating super smallpox in your garage, even if you get the synthesis tools. An expert could maybe identify some changes that might make a pathogen worse, but they'd have to test it to be sure. On human subjects. Many of them. Which is conspicuous and expensive and beyond the reach of the garage operator.

I actually can't think of anything already built or specifically projected that you could use to reliably kill everybody or even destroy civilization... except maybe for the AI. Nanotech without AI wouldn't do it. And even the AI involves a lot of unknowns.

I have the following idea for how to solve this conundrum. A global control system capable of finding all dangerous agents could be created using some Narrow AI, not superintelligent agential AI. This may look like ubiquitous surveillance with human face and action recognition capabilities.

Another part of this Narrow AI Nanny is its capability to provide a decisive strategic advantage to its owner and help him quickly take over the world (for example, by leveraging nuclear strategy and military intelligence), which is needed to prevent dangerous agents from appearing in other countries.

Yes, it looks like totalitarianism, and especially its Chinese version. But extinction is worse than totalitarianism. I lived most of my life under totalitarian regimes and, I hate to say it, 90 per cent of the time life is normal under them. So totalitarianism is survivable, and calling it an x-risk is an overestimation.

I wrote more about the idea here: