Context: This is a linkpost for https://aisafety.info/questions/NM3P/9:-Defeat-may-be-irreversibly-catastrophic
This is an article in the new intro to AI safety series from AISafety.info. We'd appreciate any feedback. The most up-to-date version of this article is on our website.
When you imagine a global catastrophe, maybe the kind of event that comes to mind is one that strikes a big blow against human civilization but doesn’t ruin it permanently. Events like pandemics, environmental disasters, and even nuclear wars would cause enormous suffering and millions or billions of deaths, but would probably leave part of human civilization alive to recover. Future generations could still flourish.
An AI takeover catastrophe is not like that. An inherent feature of an AI seizing control for its own long-term purposes is that it retains that control permanently. We'd be facing an adversary that, rather than striking a one-off blow, keeps optimizing the world for its own ends indefinitely.
Such a scenario could result in full human extinction.
More generally, the risk of permanently losing the potential for good outcomes for humanity is called existential risk. Human extinction is the most straightforward case. But in a world controlled by misaligned AI, even if humans stayed alive, our future would likely be much worse than we could have hoped for.
Nor could we expect a misaligned AI itself to build the kind of civilization we could have hoped to build ourselves. It might carry out big, complicated projects, but values like joy and consciousness are unlikely to be instrumentally useful to those projects. It would be a coincidence if the beings in such a world ended up leading what we might consider good lives, since nothing about that world would be rooted in human values.
That needn't mean locking in what humans currently think they want. Humanity has evolved morally in the past, and may find on reflection that some of today's values are actually bad. But that reflection process doesn't automatically fall out of competent optimization; it's something AIs would have to be aligned toward, or deliberately used for.
Existential risk from loss of control is one way in which advanced AI will be a big deal, but there are others, which the next article will discuss.