Agential Risks: A Topic that Almost No One is Talking About

by philosophytorres · 8 min read · 15th Oct 2016 · 31 comments


Existential Risk
Personal Blog

(Happy to get feedback on this! It draws from and expounds ideas in this article:

Consider a seemingly simple question: if the means were available, who exactly would destroy the world? There is surprisingly little discussion of this question within the nascent field of existential risk studies. But it’s an absolutely crucial issue: what sort of agent would either intentionally or accidentally cause an existential catastrophe?

The first step forward is to distinguish between two senses of an existential risk. Nick Bostrom originally defined the term as: “One where an adverse outcome would either annihilate Earth-originating intelligent life or permanently and drastically curtail its potential.” It follows that there are two distinct scenarios, one terminal and the other endurable, that could realize an existential risk. We can call the former an extinction risk and the latter a stagnation risk. The importance of this distinction with respect to both advanced technologies and destructive agents has been previously underappreciated.

So, the question asked above is actually two questions in disguise. Let’s consider each in turn.

Terror: Extinction Risks

First, the categories of agents who might intentionally cause an extinction catastrophe are fewer and smaller than one might think. They include:

(1) Idiosyncratic actors. These are malicious agents who are motivated by idiosyncratic beliefs and/or desires. There are instances of deranged individuals who have simply wanted to kill as many people as possible and then die, such as some school shooters. Idiosyncratic actors are especially worrisome because this category could have a large number of members (token agents). Indeed, the psychologist Martha Stout estimates that about 4 percent of the human population suffers from sociopathy, resulting in about 296 million sociopaths. While not all sociopaths are violent, a disproportionate number of criminals and dictators have (or very likely have) had the condition.

(2) Future ecoterrorists. As the effects of climate change and biodiversity loss (resulting in the sixth mass extinction) become increasingly conspicuous, and as destructive technologies become more powerful, some terrorism scholars have speculated that ecoterrorists could become a major agential risk in the future. The fact is that the climate is changing and the biosphere is wilting, and human activity is almost entirely responsible. It follows that some radical environmentalists in the future could attempt to use technology to cause human extinction, thereby “solving” the environmental crisis. So, we have some reason to believe that this category could become populated with a growing number of token agents in the coming decades.

(3) Negative utilitarians. Those who hold this view believe that the ultimate aim of moral conduct is to minimize misery, or “disutility.” Although some negative utilitarians like David Pearce see existential risks as highly undesirable, others would welcome annihilation because it would entail the elimination of suffering. It follows that if a “strong” negative utilitarian had a button in front of her that, if pressed, would cause human extinction (say, without causing pain), she would very likely press it. Indeed, on her view, doing this would be the morally right action. Fortunately, this version of negative utilitarianism is not a position that many non-academics tend to hold, and even among academic philosophers it is not especially widespread.

(4) Extraterrestrials. Perhaps we are not alone in the universe. Even if the probability of life arising on an Earth-analog is low, the vast number of exoplanets suggests that the probability of life arising somewhere may be quite high. If an alien species were advanced enough to traverse the cosmos and reach Earth, it would very likely have the technological means to destroy humanity. As Stephen Hawking once remarked, “If aliens visit us, the outcome would be much as when Columbus landed in America, which didn’t turn out well for the Native Americans.”

(5) Superintelligence. Homo sapiens is the dominant species on our planet almost entirely because of our intelligence. It follows that if something were to exceed our intelligence, our fate would become inextricably bound up with its will. This is worrisome because recent research shows that even slight misalignments between our values and those motivating a superintelligence could have existentially catastrophic consequences. But figuring out how to upload human values into a machine poses formidable problems, not to mention the issue of figuring out what our values are in the first place.

Making matters worse, a superintelligence could process information about 1 million times faster than our brains, meaning that a minute of time for us would equal approximately 2 years of time for the superintelligence. This would immediately give the superintelligence a profound strategic advantage over us. And if it were able to modify its own code, it could potentially bring about an exponential intelligence explosion, resulting in a mind that’s many orders of magnitude smarter than any human. Thus, we may have only one chance to get everything just right: there’s no turning back once an intelligence explosion is ignited.
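The speed comparison in this paragraph is simple unit conversion, and a minimal sketch can make it explicit. The 1-million-fold speedup is the figure from the text; the function name `subjective_years` and the 365.25-day year are assumptions made only for illustration.

```python
# Assumed speedup factor, taken from the paragraph above.
SPEEDUP = 1_000_000

# Minutes in a year, assuming a 365.25-day year.
MINUTES_PER_YEAR = 60 * 24 * 365.25


def subjective_years(wall_clock_minutes: float, speedup: float = SPEEDUP) -> float:
    """Convert human wall-clock minutes into years of 'subjective' time
    for a hypothetical mind running `speedup` times faster."""
    return wall_clock_minutes * speedup / MINUTES_PER_YEAR


# One human minute comes out a little under two subjective years.
print(f"{subjective_years(1):.2f} years")
```

Under these assumptions a single minute corresponds to slightly less than 2 years, which matches the approximate figure given in the text.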

A superintelligence could cause human extinction for a number of reasons. For example, we might simply be in its way. Few humans worry much if an ant genocide results from building a new house or road. Or the superintelligence could destroy humanity because we happen to be made out of something it could use for other purposes: atoms. Since a superintelligence need not resemble human intelligence in any way — thus, scholars tell us to resist the dual urges of anthropomorphizing and anthropopathizing — it could be motivated by goals that appear to us as utterly irrational, bizarre, or completely inexplicable.

Terror: Stagnation Risks

Now consider the agents who might intentionally try to bring about a scenario that would result in a stagnation catastrophe. This list subsumes most of the list above in that it includes idiosyncratic actors, future ecoterrorists, and superintelligence, but it probably excludes negative utilitarians, since stagnation (as understood above) would likely induce more suffering than the status quo today. The case of extraterrestrials is unclear, given that we can infer almost nothing about an interstellar civilization except that it would be technologically sophisticated.

For example, an idiosyncratic actor could harbor not a death wish for humanity, but a “destruction wish” for civilization. Thus, she or he could strive to destroy civilization without necessarily causing the annihilation of Homo sapiens. Similarly, a future ecoterrorist could hope for humanity to return to the hunter-gatherer lifestyle. This is precisely what motivated Ted Kaczynski: he didn’t want everyone to die, but he did want our technological civilization to crumble. And finally, a superintelligence whose values are misaligned with ours could modify Earth in such a way that our lineage persists, but our prospects for future development are permanently compromised. Other stagnation scenarios could involve the following categories:

(6) Apocalyptic terrorists. History is overflowing with groups that not only believed the world was about to end, but saw themselves as active participants in an apocalyptic narrative that’s unfolding in real time. Many of these groups have been driven by the conviction that “the world must be destroyed to be saved,” although some have turned their activism inward and advocated mass suicide.

Interestingly, no notable historical group has combined both the genocidal and suicidal urges. This is why apocalypticists pose a greater stagnation terror risk than extinction risk: indeed, many see their group’s survival beyond Armageddon as integral to the end-times, or eschatological, beliefs they accept. There are almost certainly fewer than about 2 million active apocalyptic believers in the world today, although emerging environmental, demographic, and societal conditions could cause this number to significantly increase in the future, as I’ve outlined in detail elsewhere (see Section 5 of this paper).

(7) States. Like terrorists motivated by political rather than transcendent goals, states tend to place a high value on their continued survival. It follows that states are unlikely to intentionally cause a human extinction event. But rogue states could induce a stagnation catastrophe. For example, if North Korea were to overcome the world’s superpowers through a sudden preemptive attack and implement a one-world government, the result could be an irreversible decline in our quality of life.

So, there are numerous categories of agents that could attempt to bring about an existential catastrophe. And there appear to be fewer agent types who would specifically try to cause human extinction than to merely dismantle civilization.

Error: Extinction and Stagnation Risks

There are some reasons, though, for thinking that error (rather than terror) could constitute the most significant threat in the future. First, almost every agent capable of causing intentional harm would also be capable of causing accidental harm, whether this results in extinction or stagnation. For example, an apocalyptic cult that wants to bring about Armageddon by releasing a deadly biological agent in a major city could, while preparing for this terrorist act, inadvertently contaminate its environment, leading to a global pandemic.

The same goes for idiosyncratic agents, ecoterrorists, negative utilitarians, states, and perhaps even extraterrestrials. (Indeed, the large disease burden of Europeans was a primary reason Native American populations were decimated. By analogy, perhaps an extraterrestrial destroys humanity by introducing a new type of pathogen that quickly wipes us out.) The case of superintelligence is unclear, since the relationship between intelligence and error-proneness has not been adequately studied.

Second, if powerful future technologies become widely accessible, then virtually everyone could become a potential cause of existential catastrophe, even those with absolutely no inclination toward violence. To illustrate the point, imagine a perfectly peaceful world in which not a single individual has malicious intentions. Further imagine that everyone has access to a doomsday button on her or his phone; if pushed, this button would cause an existential catastrophe. Even under ideal societal conditions (everyone is perfectly “moral”), how long could we expect to survive before someone’s finger slips and the doomsday button gets pressed?

Statistically speaking, a world populated by only 1 billion people would almost certainly self-destruct within a 10-year period if the probability of any individual accidentally pressing a doomsday button were a mere 0.00001 percent per decade, since the expected number of accidental presses would be about 100. Or, alternatively: if only 500 people in the world were to gain access to a doomsday button, and if each of these individuals had a 1 percent chance of accidentally pushing the button per decade, humanity would have a meager chance of roughly 0.7 percent of surviving beyond 10 years. Thus, even if the likelihood of mistakes is infinitesimally small, planetary doom will be virtually guaranteed for sufficiently large populations.
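The back-of-the-envelope figures above can be checked with a short sketch. It assumes that each person's accidental press is an independent event with the same per-decade probability p, so the chance that nobody among N people presses the button is (1 - p)^N. The function name `survival_probability` is hypothetical and used only for illustration.

```python
import math


def survival_probability(population: int, p_press: float) -> float:
    """Chance that nobody presses the button in one decade, assuming
    independent individuals who each press with probability p_press."""
    # (1 - p)^N, computed via log1p so that very small p stays accurate.
    return math.exp(population * math.log1p(-p_press))


# Scenario 1: 1 billion people, 0.00001 percent (1e-7) chance each per decade.
large_world = survival_probability(1_000_000_000, 1e-7)

# Scenario 2: 500 people, 1 percent chance each per decade.
small_group = survival_probability(500, 0.01)

print(f"1 billion people, p = 1e-7: survival = {large_world:.2e}")
print(f"500 people, p = 0.01:       survival = {small_group:.2%}")
```

Under the independence assumption, the first scenario gives a survival chance on the order of e^-100, which is effectively zero, and the second gives about 0.66 percent. If accidental presses were correlated (for example, through a shared software bug), the independence assumption would not hold and these numbers would change.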

The Two Worlds Thought Experiment

The good news is that a focus on agential risks, as I’ve called them, and not just the technological tools that agents might use to cause a catastrophe, suggests additional ways to mitigate existential risk. Consider the following thought-experiment: a possible world A contains thousands of advanced weapons that, if in the wrong hands, could cause the population of A to go extinct. In contrast, a possible world B contains only a single advanced “weapon of total destruction” (WTD). Which world is more dangerous? The answer is obviously world A.

But it would be foolishly premature to end the analysis here. Imagine further that A is populated by compassionate, peace-loving individuals, whereas B is overrun by war-mongering psychopaths. Now which world appears more likely to experience an existential catastrophe? The correct answer is, I would argue, world B.

In other words: agents matter as much as, or perhaps even more than, WTDs. One simply can’t evaluate the degree of risk in a situation without taking into account the various agents who could become coupled to potentially destructive artifacts. And this leads to the crucial point: as soon as agents enter the picture, we have another variable that could be manipulated through targeted interventions to reduce the overall probability of an existential catastrophe.

The options here are numerous and growing. One possibility would involve using “moral bioenhancement” techniques to reduce the threat of terror, given that acts of terror are immoral. But a morally enhanced individual might not be less likely to make a mistake. Thus, we could attempt to use cognitive enhancements to lower the probability of catastrophic errors, on the (tentative) assumption that greater intelligence correlates with fewer blunders.

Furthermore, implementing stricter regulations on CO2 emissions could decrease the probability of extreme ecoterrorism and/or apocalyptic terrorism, since environmental degradation is a “trigger” for both.

Another possibility, most relevant to idiosyncratic agents, is to reduce the prevalence of bullying (including cyberbullying). This is motivated by studies showing that many school shooters have been bullied, and that without this stimulus such individuals would have been less likely to carry out violent rampages. Advanced mind-reading or surveillance technologies could also enable law enforcement to identify perpetrators before mass casualty crimes are committed.

As for superintelligence, efforts to solve the “control problem” and create a friendly AI are of primary concern among many researchers today. If successful, a friendly AI could itself constitute a powerful mitigation strategy for virtually all the categories listed above.

(Note: these strategies should be explicitly distinguished from proposals that target the relevant tools rather than agents. For example, Bostrom’s idea of “differential technological development” aims to neutralize the bad uses of technology by strategically ordering the development of different kinds of technology. Similarly, the idea of police “blue goo” to counter “grey goo” is a technology-based strategy. Space colonization is also a tool intervention because it would effectively reduce the power (or capacity) of technologies to affect the entire human or posthuman population.)

Agent-Tool Couplings

Devising novel interventions and understanding how to maximize the efficacy of known strategies require a careful look at the unique properties of the agents mentioned above. Without an understanding of such properties, this important task will be otiose. We should also prioritize different agential risks based on the likely membership (token agents) of each category. For example, the number of idiosyncratic agents might exceed the number of ecoterrorists in the future, since ecoterrorism is focused on a single issue, whereas idiosyncratic agents could be motivated by a wide range of potential grievances.[1] We should also take seriously the formidable threat posed by error, which could be nontrivially greater than that posed by terror, as the back-of-the-envelope calculations above show.

Such considerations, in combination with technology-based risk mitigation strategies, could lead to a comprehensive, systematic framework for strategically intervening on both sides of the agent-tool coupling. But this will require the field of existential risk studies to become less technocentric than it currently is.

[1] Although, on the other hand, the stimulus of environmental degradation would be experienced by virtually everyone in society, whereas the stimuli that motivate idiosyncratic agents might be situationally unique. It’s precisely issues like these that deserve further scholarly research.


27 comments

In ten years, what's the probability that a CRISPR-competent terrorist group could exterminate mankind? The optimal consequentialist anti-terrorist policies if this answer is >1% should horrify a deontologist.

Extremely low. I have never believed any sort of pathogen could come close to wiping us out. They can be defeated by basic breather and biohazard technology. But the main key is that with improved and more accessible biotechnology, our ability to create vaccines and other defence mechanisms against pathogens is greatly enhanced. I actually think the better biotechnology gets, the less likely any pathogen is to wipe us out, even given the fact that terrorists will be able to misuse it more easily.

I hope you are right.

Remember also that viruses that kill lots of people tend to rapidly mutate into less lethal strains due to evolutionary pressures. This is what happened with the 1918 pandemic.

Yes, but evolutionary pressures wouldn't be shaping bioterrorism-created viruses in the short run. Also, until we can cure the common cold, what's to prevent terrorists (in 10 years, with CRISPR) from making a cold virus that's much more virulent, that stays hidden for a few months, and then kills its host?

Indeed. And since all humans are deontologists by nature, it should horrify everyone, and would.

In the 20th century, most risks were created by superpowers. Should we include them in the list of potential agents?

Also, it seems that some risks are non-agential, as they result from the collective behavior of a group of agents: arms races, capitalism, resource depletion, overpopulation, etc.

Totally agree that some x-risks are non-agential, such as (a) risks from nature, and (b) risks produced by coordination problems, resulting in e.g. climate change and biodiversity loss. As for superpowers, I would classify them as (7). Thoughts? Any further suggestions? :-)

"Rogue country" is an outside, evaluative characterization.

Let's try to define a "rogue country" by characteristics that don't depend on anyone's evaluation: 1) It is a country that fights for world domination. 2) It is a country that is interested in the worldwide promotion of its (crazy) ideology (USSR, communism). 3) It is a country whose survival is threatened by the risk of aggression. 4) It is a country that is ruled by a crazy dictator.

I would say that superpowers are a type of "rogue country," as they sometimes combine several of the properties listed above.

The difference is mainly that we have always had two (or three) superpowers fighting for world domination. Sometimes one of them was in first place and another was challenging its position as world leader. The second superpower is more willing to create global risk, as doing so may raise its "status" or its chances of overpowering the "alpha superpower."

The topic is interesting, and there is a lot that could be said about it, including the current political situation and even the war in Syria. I just read an article today that explained this war from this point of view.

I would also add Doomsday blackmailers. These are rational agents who would create a Doomsday Machine to blackmail the world with the goal of world domination.

Another option worth considering is arrogant scientists who benefit personally from dangerous experiments. One example is CERN, which proceeded with the LHC before its safety was proven. Another is the group of bioscientists who excavated the 1918 pandemic flu, sequenced it, and posted it on the internet. And another scientist deliberately created a new superflu while studying genetic variations that could make bird flu stronger. We could imagine a scientist who would try to increase his personal longevity by gene therapy, even if it posed a 1 percent pandemic risk. And if there are many of them...

Also, there is a possible class of agents who try to create a smaller catastrophe in order to prevent a larger one. The recent movie "Inferno" is about this: a character creates a virus to kill half of humanity in order to save all of humanity later.

I listed all my ideas in my agent map, which is here on Less Wrong

Furthermore, implementing stricter regulations on CO2 emissions could decrease the probability of extreme ecoterrorism and/or apocalyptic terrorism, since environmental degradation is a “trigger” for both.

Disregarding any discussion of legitimate climate concerns, isn't this a really bad decision? Isn't it better to be unblackmailable, to disincentivize blackmail?

What do you mean? How is mitigating climate change related to blackmail?

This discussion was about agential risks; the part I quoted was talking about extreme ecoterrorism as a result of environmental degradation. In other words, the main post was partially about stricter regulations on CO2 as a means of minimizing the risk of a potential doomsday scenario from an anti-global-warming group.

Good post!

While not all sociopaths are violent, a disproportionate number of criminals and dictators have (or very likely have) had the condition.

Luckily sociopaths tend to have poor impulse control.

It follows that some radical environmentalists in the future could attempt to use technology to cause human extinction, thereby “solving” the environmental crisis.

Reminds me of Derrick Jensen. He doesn't talk about human extinction, but he does talk about bringing down civilization.

Fortunately, this version of negative utilitarianism is not a position that many non-academics tend to hold, and even among academic philosophers it is not especially widespread.

For details see

This is worrisome because recent research shows that even slight misalignments between our values and those motivating a superintelligence could have existentially catastrophic consequences.

Citation? This is commonly asserted by AI risk proponents, but I'm not sure I believe it. My best friend's values are slightly misaligned relative to my own, but if my best friend became superintelligent, that seems to me like it'd be a pretty good outcome.

I highly recommend reading this.

I'm familiar with lots of the things Eliezer Yudkowsky has said about AI. That doesn't mean I agree with them. Less Wrong has an unfortunate culture of not discussing topics once the Great Teacher has made a pronouncement.

Plus, I don't think philosophytorres' claim is obvious even if you accept Yudkowsky's arguments.

Fragility of value thesis. Getting a goal system 90% right does not give you 90% of the value, any more than correctly dialing 9 out of 10 digits of my phone number will connect you to somebody who’s 90% similar to Eliezer Yudkowsky. There are multiple dimensions for which eliminating that dimension of value would eliminate almost all value from the future. For example an alien species which shared almost all of human value except that their parameter setting for “boredom” was much lower, might devote most of their computational power to replaying a single peak, optimal experience over and over again with slightly different pixel colors (or the equivalent thereof). Friendly AI is more like a satisficing threshold than something where we’re trying to eke out successive 10% improvements. See: Yudkowsky (2009, 2011).

From here.

OK, so do my best friend's values constitute a 90% match? A 99.9% match? Do they pass the satisficing threshold?

Also, Eliezer's boredom-free scenario sounds like a pretty good outcome to me, all things considered. If an AGI modified me so I could no longer get bored, and then replayed a peak experience for me for millions of years, I'd consider that a positive singularity. Certainly not a "catastrophe" in the sense that an earthquake is a catastrophe. (Well, perhaps a catastrophe of opportunity cost, but basically every outcome is a catastrophe of opportunity cost on a long enough timescale, so that's not a very interesting objection.) The utility function is not up for grabs--I am the expert on my values, not the Great Teacher.

Here's the abstract from his 2011 paper:

A common reaction to first encountering the problem statement of Friendly AI (“Ensure that the creation of a generally intelligent, self-improving, eventually superintelligent system realizes a positive outcome”) is to propose a single moral value which allegedly suffices; or to reject the problem by replying that “constraining” our creations is undesirable or unnecessary. This paper makes the case that a criterion for describing a “positive outcome,” despite the shortness of the English phrase, contains considerable complexity hidden from us by our own thought processes, which only search positive-value parts of the action space, and implicitly think as if code is interpreted by an anthropomorphic ghost-in-the-machine. Abandoning inheritance from human value (at least as a basis for renormalizing to reflective equilibria) will yield futures worthless even from the standpoint of AGI researchers who consider themselves to have cosmopolitan values not tied to the exact forms or desires of humanity.

It sounds to me like Eliezer's point is more about the complexity of values, not the need to prevent slight misalignment. In other words, Eliezer seems to argue here that a naively programmed definition of "positive value" constitutes a gross misalignment, NOT that a slight misalignment constitutes a catastrophic outcome.

Please think critically.

I think that a small error inside a value description could produce a bad result, but this is not so if we have a list of independent values.

In the phone example, if I lose one digit from someone's number, I will not get 90 percent of him; but if I lose one phone number from my phone book, the book will still be 90 percent intact.

Humans tend to have many somewhat independent values: someone may like fishing, snorkeling, girls, clouds, etc. If he loses one of them, it is not a big deal. He is still almost the same person, and this happens all the time with real humans, as their predispositions can change overnight.

Awesome article! I do have a small piece of feedback to offer, though.

Interestingly, no notable historical group has combined both the genocidal and suicidal urges.

No historical group has combined both genocidal and suicidal actions, but that may be because of technological constraints. If we had had nukes widely available for millennia, how many groups do you think would have blown up their own cities?

Without sufficiently destructive technology, it takes a lot more time and effort to completely wipe out large groups of people. Usually some of them survive, and there's a bloody feud for the next 10 generations. It's rare to win sufficiently thoroughly that the group can then commit mass suicide without the culture they attempted genocide against coming back in a generation or two.

There have, of course, been plenty of groups willing to fight to the death. How many of them would have pressed a doomsday button if they could?

I actually think most historical groups wanted to vanquish the enemy, but not destroy either themselves or the environment to the point at which it's no longer livable. This is one of the interesting things that shifts to the foreground when thinking about agents in the context of existential risks. As for people fighting to the death, often this was done for the sake of group survival, where the group is the relevant unit here. (Thoughts?)

I think my language could have been more precise: it's not merely genocidal urges, but humanicidal or omnicidal ones, that we're talking about in the context of x-risks. Also, the Khmer Rouge wasn't suicidal to my knowledge. Am I less right?

(2) is quite different in that it isn't motivated by supernatural eschatologies. Thus, the ideological and psychological profiles of ecoterrorists are quite different from those of apocalyptic terrorists, who are bound together by certain common worldview-related threads.

What do you think about how the number of potentially dangerous agents changes over time?

Great question. I think there are strong reasons for anticipating the total number of apocalyptic terrorists and ecoterrorists to nontrivially increase in the future. I've written two papers on the former, linked below. There's weaker evidence to suggest that environmental instability will exacerbate conflicts in general, and consequently produce more malicious agents with idiosyncratic motives. As for the others -- not sure! I suspect we'll have at least one superintelligence around by the end of the century.

I think that the number of agents will also grow as technologies become more accessible to smaller organisations and even individuals. If a teenager could create a dangerous biological virus as simply as he is now able to write a computer virus to amuse his friends, we are certainly doomed.

If bonobo type civilizations have already been Great Filtered, that suggests helping humans get along better may not be a feasible strategy for subverting the filter ourselves.

For example, if North Korea were to overcome the world’s superpowers through a sudden preemptive attack and implement a one-world government, the result could be an irreversible decline in our quality of life.

I'm not sure what such a scenario would look like or whether it makes sense. A North Korea that has the power to rule other countries would look very different than the one we have.

Natural risk: these folks say there is a 100% probability that the universe is cyclic and that a collapse will come.

"They first used Jacobson's formalism of Einstein's general theory of relativity, where the Einstein equation is basically described as thermodynamical equation. They then studied the corrections to the Einstein equations from quantum effects."

"In fact, we are able to analyze the pre Big Bang state of the Universe. Furthermore, the equations imply that the expansion of the Universe will come to a halt and then will immediately be followed by a contracting phase. When the equations are extrapolated beyond the maximum rate of contraction, a cyclic Universe scenario emerges."

Does this hypothesis rule out the simulation argument?

and another for the natural xrisks map odds