My forthcoming paper, “Disjunctive Scenarios of Catastrophic AI Risk”, attempts to introduce a number of considerations to the analysis of potential risks from Artificial General Intelligence (AGI). As the paper is long and occasionally makes for somewhat dry reading, I thought that I would briefly highlight a few of the key points raised in the paper.
The main idea here is that most of the discussion about risks of AGI has been framed in terms of a scenario that goes something along the lines of “a research group develops AGI, that AGI develops to become superintelligent, escapes from its creators, and takes over the world”. While that is one scenario that could happen, focusing too much on any single scenario makes us more likely to miss alternative scenarios. It also makes the scenarios susceptible to criticism from people who (correctly!) point out that we are postulating very specific scenarios that have lots of burdensome details.
To address that, I discuss here a number of considerations that suggest disjunctive paths to catastrophic outcomes: paths that are of the form “A or B or C could happen, and any one of them happening could have bad consequences”.
Superintelligence versus Crucial Capabilities
Bostrom’s Superintelligence, as well as a number of other sources, basically make the following argument:
This is an important argument to make and analyze, since superintelligence basically represents an extreme case: if an individual AGI may become as powerful as it gets, how do we prepare for that eventuality? As long as there is a plausible chance for such an extreme case to be realized, it must be taken into account.
However, it is probably a mistake to focus only on the case of superintelligence. Basically, the reason why we are interested in a superintelligence is that, by assumption, it has the cognitive capabilities necessary for a world takeover. But what about an AGI which also had the cognitive capabilities necessary for taking over the world, and only those?
Such an AGI might not count as a superintelligence in the traditional sense, since it would not be superhumanly capable in every domain. Yet, it would still be one that we should be concerned about. If we focus too much on just the superintelligence case, we might miss the emergence of a “dumb” AGI which nevertheless had the crucial capabilities necessary for a world takeover.
That raises the question of what might be such crucial capabilities. I don’t have a comprehensive answer; in my paper, I focus mostly on the kinds of capabilities that could be used to inflict major damage: social manipulation, cyberwarfare, biological warfare. Others no doubt exist.
A possibly useful framing for future investigations might be, “what level of capability would an AGI need to achieve in a crucial capability in order to be dangerous”, where the definition of “dangerous” is free to vary based on how serious a risk we are concerned about. One complication here is that this is a highly contextual question – with a superintelligence we can assume that the AGI may become basically omnipotent, but such a simplifying assumption won’t help us here. For example, the level of offensive biowarfare capability that would pose a major risk depends on the level of the world’s defensive biowarfare capabilities. Also, we know that it’s possible to inflict enormous damage on humanity even with just human-level intelligence: whoever is authorized to control the arsenal of a nuclear power could trigger World War III, no superhuman smarts needed.
Crucial capabilities are a disjunctive consideration because they show that superintelligence isn’t the only level of capability that would pose a major risk: there are many different combinations of various capabilities – including ones that we don’t even know about yet – that could pose the same level of danger as superintelligence.
Incidentally, this shows one reason why the common criticism of “superintelligence isn’t something that we need to worry about because intelligence isn’t unidimensional” is misfounded – the AGI doesn’t need to be superintelligent in every dimension of intelligence, just the ones we care about.
How would the AGI get free and powerful?
In the prototypical AGI risk scenario, we are assuming that the developers of the AGI want to keep it strictly under control, whereas the AGI itself has a motive to break free. This has led to various discussions about the feasibility of “oracle AI” or “AI confinement” – ways to restrict the AGI’s ability to act freely in the world, while still making use of it. This also means that the AGI might have a hard time acquiring the resources that it needs for a world takeover, since it either has to do so while it is under constant supervision by its creators, or while on the run from them.
However, there are also alternative scenarios where the AGI’s creators voluntarily let it free – or even place it in control of e.g. a major corporation, free to use that corporation’s resources as it desires! My chapter discusses several ways by which this could happen: i) economic benefit or competitive pressure, ii) criminal or terrorist reasons, iii) ethical or philosophical reasons, iv) confidence in the AI’s safety, as well as v) desperate circumstances such as being otherwise close to death. See the chapter for more details on each of these. Furthermore, the AGI could remain theoretically confined but be practically in control anyway – such as in a situation where it was officially only giving a corporation advice, but its advice had never been wrong before and nobody wanted to risk their jobs by going against the advice.
Would the Treacherous Turn involve a Decisive Strategic Advantage?
Looking at crucial capabilities in a more fine-grained manner also raises the question of when an AGI would start acting against humanity’s interests. In the typical superintelligence scenario, we assume that it will do so once it is in a position to achieve what Bostrom calls a Decisive Strategic Advantage (DSA): “a level of technological and other advantages sufficient to enable [an AI] to achieve complete world domination”. After all, if you are capable of achieving superintelligence and a DSA, why act any earlier than that?
Even when dealing with superintelligences, however, the case isn’t quite as clear-cut. Suppose that there are two AGI systems, each potentially capable of achieving a DSA if it prepares for long enough. But the longer each one prepares, the more likely it becomes that the other AGI sets its plans in motion first and gains an advantage over it. Thus, if several AGI projects exist, each AGI is incentivized to act at the point that maximizes its overall probability of success, even if its chances of succeeding in the takeover are rather slim, as long as it thinks that waiting longer would make those chances even worse.
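The timing trade-off in the previous paragraph can be illustrated with a toy calculation. This is only a sketch under assumptions that are mine rather than the paper's: the AGI's chance of success if unopposed grows with preparation time as 1 - exp(-growth_rate * t), and rivals preempt it at a constant hazard rate. The function names and parameter values are hypothetical.

```python
import math

def success_probability(t, growth_rate, rival_hazard):
    """Toy model (assumed, not from the paper) of acting at time t."""
    # Assumption: more preparation means a better chance if unopposed.
    capability = 1.0 - math.exp(-growth_rate * t)
    # Assumption: a rival moves first at a constant hazard rate.
    not_preempted = math.exp(-rival_hazard * t)
    return capability * not_preempted

def best_time_to_act(growth_rate, rival_hazard, horizon=50.0, steps=5000):
    """Grid search for the action time with the highest success chance."""
    times = [horizon * i / steps for i in range(1, steps + 1)]
    return max(times, key=lambda t: success_probability(t, growth_rate, rival_hazard))

# Compare a world with weak rival pressure to one with strong pressure.
for hazard in (0.01, 0.5):
    t_star = best_time_to_act(growth_rate=0.1, rival_hazard=hazard)
    p_star = success_probability(t_star, 0.1, hazard)
    print(f"rival hazard {hazard}: act at t={t_star:.2f}, success chance {p_star:.3f}")
```

In this toy model, raising the rival hazard pulls the best action time earlier and lowers the success chance at that time, which matches the qualitative claim that competition can push an AGI to defect while its odds are still slim.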
Indeed, an AGI which defects on its creators may not be going for a world takeover in the first place: it might, for instance, simply be trying to maneuver itself into a position where it can act more autonomously and defeat takeover attempts by other, more powerful AGIs. The threshold for the first treacherous turn could vary quite a bit, depending on the goals and assets of the different AGIs; various considerations are discussed in the paper.
A large reason for analyzing these kinds of scenarios is that, besides caring about existential risks, we also care about catastrophic risks – such as an AGI acting too early and launching a plan which resulted in “merely” hundreds of millions of deaths. My paper introduces the term Major Strategic Advantage, defined as “a level of technological and other advantages sufficient to pose a catastrophic risk to human society”. A catastrophic risk is one that might inflict serious damage to human well-being on a global scale and cause ten million or more fatalities.
“Mere” catastrophic risks could also turn into existential ones, if they contribute to global turbulence (Bostrom et al. 2017), a situation in which existing institutions are challenged, and coordination and long-term planning become more difficult. Global turbulence could then contribute to another out-of-control AI project failing even more catastrophically and causing even more damage.
Summary table and example scenarios
The table below summarizes the various alternatives explored in the paper.
AI’s level of strategic advantage
AI’s capability threshold for non-cooperation
Sources of AI capability
Ways for the AI to achieve autonomy
Number of AIs
And here are some example scenarios formed by different combinations of them:
The classic takeover
(Decisive strategic advantage, high capability threshold, intelligence explosion, escaped AI, single AI)
The “classic” AI takeover scenario: an AI is developed, which eventually becomes better at AI design than its programmers. The AI uses this ability to undergo an intelligence explosion, and eventually escapes to the Internet from its confinement. After acquiring sufficient influence and resources in secret, it carries out a strike against humanity, eliminating humanity as a dominant player on Earth so that it can proceed with its own plans unhindered.
The gradual takeover
(Major strategic advantage, high capability threshold, gradual shift in power, released for economic reasons, multiple AIs)
Many corporations, governments, and individuals voluntarily turn over functions to AIs, until we are dependent on AI systems. These are initially narrow-AI systems, but continued upgrades push some of them to the level of having general intelligence. Gradually, they start making all the decisions. We know that letting them run things is risky, but by now a lot of stuff is built around them, it brings a profit, and they’re really good at giving us nice stuff, for the time being.
The wars of the desperate AIs
(Major strategic advantage, low capability threshold, crucial capabilities, escaped AIs, multiple AIs)
Many different actors develop AI systems. Most of these prototypes are unaligned with human values and not yet enormously capable, but many of these AIs reason that some other prototype might be more capable. As a result, they attempt to defect on humanity despite knowing their chances of success to be low, reasoning that they would have an even lower chance of achieving their goals if they did not defect. Society is hit by various out-of-control systems with crucial capabilities that manage to do catastrophic damage before being contained.
Is humanity feeling lucky?
(Decisive strategic advantage, high capability threshold, crucial capabilities, confined but effectively in control, single AI)
Google begins to make decisions about product launches and strategies as guided by their strategic advisor AI. This allows them to become even more powerful and influential than they already are. Nudged by the strategy AI, they start taking increasingly questionable actions that increase their power; they are too powerful for society to put a stop to them. Hard-to-understand code written by the strategy AI detects and subtly sabotages other people’s AI projects, until Google establishes itself as the dominant world power.
This blog post was written as part of work for the Foundational Research Institute.
Thanks for your work on this. I think having some explicit examples of these kinds of scenarios helps make clearer the broad range of ways things could go badly, especially when it happens slowly and it's not easy to notice until it's too late. I think there's especially a lot of value in calling out specific scenarios which may be very like scenarios people actually find themselves in later since it will help them notice they are matching a dangerous pattern and should consider AI safety more if they have thus far failed to.
This is great - thanks for posting it!
That made me properly realise something that I now feel should've been blindingly obvious to me already: Work to reduce humanity's/civilization's "vulnerabilities" in general may also help with a range of global catastrophic or existential risk scenarios where AI risk is the "trigger".
I imagine I must've already been aware of that in some sense, and I think it's implicit in various other things such as discussions of how AI could interact with e.g. nuclear weapons tech. But I don't think I'd previously thought explicitly about situations in which risk from an agenty AI pursuing its own goals (rather than e.g. AI just making automated launch decisions) could be exacerbated or mitigated by e.g. work on biorisk, because other major risks could be what the AI would harmfully use as "tools".
I'm not sure whether or not this should actually lead to a substantial update in e.g. how valuable I think biorisk work or work to reduce numbers of nuclear weapons is. But I'm glad to at least have that question explicitly in mind now.
Nice work. Is this meant to be persuasive (for those raising concerns of burdensome details), prescriptive (there could be safeguards ineffective against an ASI but effective against a "dumb" AI), or both?
Similarly, absent formal resolution of the alignment problem, do you think there are mitigatory avenues available against an MSA? That is, things to which we'd devote 10% of our safety resources, if we believed that DSA is 90% likely and MSA is 10% likely, conditional on the emergence of general intelligence.
Whether there are mitigatory avenues - I would assume so, but it feels hard to know what exactly they might be. So much depends on the general landscape of both society and technology, and the exact details of how these technologies turn out to work. For instance, if it does turn out that a lot of corporations are starting to employ proto-AGI systems in running their daily business, then maybe you could tackle that with some kind of regulation. But that's assuming a specific scenario, and even within that scenario, there are probably a lot of nuances that you'd need to get right in order to make the regulation effective (most of which I wouldn't know about, since I'm not an expert on regulation).