Hi everyone, Zershaaneh here!
Earlier this year, 80,000 Hours published an article explaining the risks of power-seeking AI.
This post includes some context, the summary from the article, and the table of contents with links to each section. (I thought this would be easier to navigate than if we just reposted the full article here!)
This is meant to be a detailed, introductory resource for understanding how advanced AI could disempower humanity and what people can do to stop it.
It replaces our 2022 article on the existential risks from AI, which highlighted misaligned power-seeking as our main concern but didn't focus entirely on it.
The original article made a mostly theoretical case for power-seeking risk. The new one draws together recent empirical evidence suggesting that AIs might develop goals we wouldn't like, undermine humanity to achieve them, and avoid detection along the way.
We really want people to understand the following:
Preventing future power-seeking AIs from disempowering humanity is one of the most pressing problems of our time. The window for developing effective safeguards may be narrow, and the stakes are extremely high. And we think there are promising research directions and policy approaches that could make the difference between beneficial AI and an existential catastrophe.
In the years since we first encountered these arguments and advised people to work on the problem, the field has changed dramatically. AI has progressed rapidly, we think powerful systems are likely to arrive sooner than we once thought, and the risks are more widely discussed. We think the threat from advanced AI systems remains, and empirical evidence (discussed in this article) has provided some support — though it’s far from definitive — for the concerns about power-seeking AI.
We think this is among the most pressing problems in the world.