After Eliezer's post, I got to thinking about how to think about AI destroying the world, and how such a fate might be averted. I claim no special expertise, and an epistemic status of great uncertainty.


Humanity has faced (and so far, survived) one potential apocalypse already: that of nuclear war. I refer specifically to the Cold War, during which the United States and the Soviet Union came very close to launching nuclear weapons against each other several times.

It occurred to me that, by the beginning of the Cold War, the world had a concrete example of the kind of destruction that nuclear weapons were capable of: the annihilation of two Japanese cities at the end of World War II, namely Hiroshima and Nagasaki.

In other words, while there were likely some unknowns, there was no confusion or ambiguity about what nuclear weapons were capable of, or that they represented an incredibly dangerous technology. While the technology could be and was harnessed for a variety of purposes, no one was under any illusions about the potential downsides.

If they needed a reminder, they had only to look at the Human Shadow of Death.


Nuclear war was (and continues to be) a legitimate threat to human civilization.[1] And yet it hasn't happened, and we appear to be past the period of greatest risk.

So what lessons might we learn from this, and how might they apply to AI?

The question that motivated this post was simple:

Did the destruction of Hiroshima and Nagasaki make a nuclear war between the United States and the Soviet Union more or less likely?

In other words, if World War II had come to an end some other way, and no nuclear bomb had ever been detonated in anger - would the Cold War have been more likely to turn hot?

I believe so, without having researched the topic extensively. In a world with no vivid demonstrations of the horror of nuclear weaponry, how could it not be easier to imagine using them?

And how could the examples of Hiroshima and Nagasaki not make someone less willing to inflict the same devastation upon other cities, and to risk that devastation being deployed against their own population centers?

I believe, based solely on priors, that the vivid examples of the danger of nuclear warfare helped avert a nuclear apocalypse.


So what about AI?

One of the problems acknowledged by those who seek to align AGI is that the dangers of unaligned intelligences are not salient to politicians, decision-makers, or the populace at large. I see concerns about technological unemployment created by AI, but not concerns about the devastation unaligned intelligences are capable of.

This leads to a question, motivated in part by Eliezer's comment about how lethally difficult alignment is:

if you can get a powerful AGI that carries out some pivotal superhuman engineering task, with a less than fifty percent chance of killing more than one billion people, I'll take it

I'll explicitly state that I am in no way proposing a strategy or policy we should follow.

But I do want to ask: would an (otherwise survivable) AI disaster increase the probability that we survive the next century and the rise of AGI?

In other words, in possible future worlds where humanity survived the creation of AGIs, would those humans be likely to look back and find a salient event in their history, like Hiroshima and Nagasaki for nuclear war, that led to a general understanding of the dangers of the technology in a way that made an apocalypse less likely?

And if so, what is the least bad form of that disaster? A powerful AI deployed for military purposes?

An AI for options trading that crashes the global financial system?

A strawberry picker that rips people's noses off?

I don't have any answers. But thinking about this, I have picked out a single (dim) silver lining: 

If the coming decades include a disaster involving AI - especially one that makes the dangers of the technology salient to worldwide decision- and policy-makers - then I will update, however small an amount, in the direction that we are in one of the possible worlds where humanity survives.

1. ^ Citation needed.

4 comments

Well, it's not clear that actually dropping the bombs has prevented nuclear holocaust. We only know it hasn't happened YET. Causally, it's not even clear whether those bombs were necessary for lasting this long. The development and testing that led up to having and using those bombs is pretty strong evidence that humans are willing to risk a (small and controversial) possibility that they'd ignite the atmosphere and kill everyone, and we don't have much reason to believe we're a lot more cautious now.

I think small-scale AI mishaps are more likely to be taken as encouragement (we just have to make it a little bit better) rather than cautionary (it's utterly doomed to destroy us all).  Any non-catastrophic disaster is evidence that disasters are recoverable.  

Common human reasoning just isn't able to deal with tail risks. I don't expect that any demonstration will convince most people otherwise. Even experts who are worried about it will probably downplay any surmountable failures, and the reactions to them, as not enough to prove things.

I believe (although I admit it is speculation, and the entirety of the argument is too long to have here) that a concrete example - specifically a vivid and violent example - helps ideas stick in the brain, and Hiroshima and Nagasaki were sufficiently vivid and violent to qualify.

As for nuclear war not having happened yet, you're absolutely correct. I do hold, however, that the period of greatest risk is behind us. Humanity has had seven decades with nuclear weapons to think about what happens when they're used in anger, as opposed to the few years we had in the period directly after WWII.

In other words, a new and dangerous technology seems most dangerous to me the newer it is, because people don't understand the risks associated with it yet. It also takes time for the game-theoretic implications (i.e. MAD) to seep into the public mind.

With AI, we may not get a vivid and violent example of AGI gone wrong (because such an example could well kill us all). The technology (AGI specifically) will be brand-new, so we won't have any disasters to point to, and we won't have time to adapt culturally to the risks.

Hence my belief that, without a shocking and horrible incident to point to, people will be far more bullish on AGI than they were on nuclear technology, even though both are potentially apocalyptic.

Interesting (albeit nerve-wracking) question. 

Here's a related post I recalled from recently, just in case you hadn't seen it or others are interested in reading more on this topic: https://www.lesswrong.com/posts/LzKJMx9jqt3bDujij/yes-ai-research-will-be-substantially-curtailed-if-a-lab

(I don't have a strong opinion/take on the idea right now, need to think more about it.)

Thanks, I missed that somehow.

The post discussed the possibility of a disaster in line with Chernobyl or Three Mile Island; I chose to focus on the conscious use of AI for destruction, à la Hiroshima and Nagasaki.

I'll have to think more about the difference between "industrial accident"/"natural phenomenon" and "purposeful attack" when it comes to thinking about the effect catastrophes have on technology.