Hi everyone, Zershaaneh here!
Earlier this year, 80,000 Hours published an article explaining the risks of power-seeking AI.
This post includes some context, the summary from the article, and the table of contents with links to each section. (I thought this would be easier to navigate than if we just reposted the full article here!)
This is meant to be a detailed, introductory resource for understanding how advanced AI could disempower humanity and what people can do to stop it.
It replaces our 2022 article on the existential risks from AI, which highlighted misaligned power-seeking as our main concern but didn't focus entirely on it.
The original article made a mostly theoretical case for power-seeking risk. The new one draws together recent empirical evidence suggesting that AIs might develop goals we wouldn't like, undermine humanity to achieve them, and avoid detection along the way.
We really want people to understand the following:
Preventing future power-seeking AIs from disempowering humanity is one of the most pressing problems of our time. The window for developing effective safeguards may be narrow, and the stakes are extremely high. And we think there are promising research directions and policy approaches that could make the difference between beneficial AI and an existential catastrophe.
In the years since we first encountered these arguments and advised people to work on the problem, the field has changed dramatically. AI has progressed rapidly, we think powerful systems are likely to arrive sooner than we once thought, and the risks are more widely discussed. We think the threat from advanced AI systems remains, and empirical evidence (discussed in this article) has provided some support — though it’s far from definitive — for the concerns about power-seeking AI.
We think this is among the most pressing problems in the world.