Context: This is a linkpost for https://aisafety.info/questions/NM3P/9:-Defeat-may-be-irreversibly-catastrophic
This is an article in the new intro to AI safety series from AISafety.info. We'd appreciate any feedback. The most up-to-date version of this article is on our website.
When you imagine a global catastrophe, maybe the kind of event that comes to mind is one that strikes a big blow against human civilization but doesn’t ruin it permanently. Events like pandemics, environmental disasters, and even nuclear wars would cause enormous suffering and millions or billions of deaths, but would probably leave part of human civilization alive to recover. Future generations could still flourish.
An AI takeover catastrophe is not like that. An inherent feature of an AI seizing control for its own long-term purposes is that it retains that control permanently. We'd be facing an adversary that, rather than striking a one-off blow, keeps optimizing the world for its own ends indefinitely.
Such a scenario could result in full human extinction.
More generally, the risk of permanently losing the potential for good outcomes for humanity is called existential risk. Human extinction is the most straightforward case. But in a world controlled by misaligned AI, even if humans stayed alive, our future would likely be much worse than we could have hoped for.
Nor could we expect a misaligned AI itself to build the kind of civilization we could have hoped to build ourselves. It might carry out big, complicated projects, but values like joy and consciousness are unlikely to be instrumentally useful to those projects. It would be a coincidence if the beings in such a world ended up leading what we might consider good lives, since nothing about that world would be rooted in human values.
That needn't mean locking in what humans currently think they want. Humanity has evolved morally in the past, and may find on reflection that some of today's values are actually bad. But that reflection process doesn't automatically fall out of competent optimization; it's something AIs would have to be aligned toward, or deliberately used for.
Existential risk from loss of control is one way in which advanced AI will be a big deal, but there are others, which the next article will discuss.