Lukas Finnveden

AI-enabled coups: a small group could use AI to seize power

We’ve written a new report on the threat of AI-enabled coups. I think this is a very serious risk – comparable in importance to AI takeover but much more neglected. In fact, AI-enabled coups and AI takeover have pretty similar threat models. To see this, here’s a very basic threat model for AI takeover: 1. Humanity develops superhuman AI 2. Superhuman AI is misaligned and power-seeking 3. Superhuman AI seizes power for itself And now here’s a closely analogous threat model for AI-enabled coups: 1. Humanity develops superhuman AI 2. Superhuman AI is controlled by a small group 3. Superhuman AI seizes power for the small group While the report focuses on the risk that someone seizes power over a country, I think that similar dynamics could allow someone to take over the world. In fact, if someone wanted to take over the world, their best strategy might well be to first stage an AI-enabled coup in the United States (or whichever country leads on superhuman AI), and then go from there to world domination. A single person taking over the world would be really bad. I’ve previously argued that it might even be worse than AI takeover. [1] The concrete threat models for AI-enabled coups that we discuss largely translate like-for-like over to the risk of AI takeover.[2] Similarly, there’s a lot of overlap in the mitigations that help with AI-enabled coups and AI takeover risk — e.g. alignment audits to ensure no human has made AI secretly loyal to them, transparency about AI capabilities, monitoring AI activities for suspicious behaviour, and infosecurity to prevent insiders from tampering with training. If the world won't slow down AI development based on AI takeover risk (e.g. because there’s isn’t strong evidence for misalignment), then advocating for a slow down based on the risk of AI-enabled coups might be more convincing and achieve many of the same goals. I really want to encourage readers — especially those at labs or governments — to do something

137Apr 16, 2025

Lukas Finnveden

Message

Previously "Lanrian" on here. Research analyst at Redwood Research. Views are my own.

Feel free to DM me, email me at [my last name].[my first name]@gmail.com or send something anonymously to https://www.admonymous.co/lukas-finnveden

3586

554

565

Lukas Finnveden's Shortform

Feb 57

Notes on cooperating with unaligned AIs

These are some research notes on whether we could reduce AI takeover risk by cooperating with unaligned AIs. I think the best and most readable public writing on this topic is “Making deals with early schemers”, so if you haven't read that post, I recommend starting there. These notes were...

Aug 24, 202560

Being honest with AIs

In the future, we might accidentally create AIs with ambitious goals that are misaligned with ours. But just because we don’t have the same goals doesn’t mean we need to be in conflict. We could also cooperate with each other and pursue mutually beneficial deals. For previous discussion of this,...

Aug 21, 202577

AI-enabled coups: a small group could use AI to seize power

Apr 16, 2025137

What's important in "AI for epistemics"?

Summary This post gives my personal take on “AI for epistemics” and how important it might be to work on. Some background context: * AI capabilities are advancing rapidly and I think it’s important to think ahead and prepare for the possible development of AI that could automate almost all...

Aug 24, 202450

Project ideas: Backup plans & Cooperative AI

This is part of a series of lists of projects. The unifying theme is that the projects are not targeted at solving alignment or engineered pandemics but still targeted at worlds where transformative AI is coming in the next 10 years or so. See here for the introductory post. In...

Jan 8, 202418

Project ideas: Sentience and rights of digital minds

Jan 7, 202421

Load More (7/22)

LESSWRONG
LW

LESSWRONG
LW

Lukas Finnveden

Lukas Finnveden

Lukas Finnveden

AI-enabled coups: a small group could use AI to seize power

Extrapolating GPT-N performance

PaLM in "Extrapolating GPT-N performance"

Truthful AI: Developing and governing AI that does not lie

Lukas Finnveden

Lukas Finnveden's Shortform

Notes on cooperating with unaligned AIs

Being honest with AIs

AI-enabled coups: a small group could use AI to seize power

What's important in "AI for epistemics"?

Project ideas: Backup plans & Cooperative AI

Project ideas: Sentience and rights of digital minds

AI-enabled coups: a small group could use AI to seize power

Extrapolating GPT-N performance

PaLM in "Extrapolating GPT-N performance"

Truthful AI: Developing and governing AI that does not lie

Lukas Finnveden's Shortform

Notes on cooperating with unaligned AIs

Being honest with AIs

AI-enabled coups: a small group could use AI to seize power

What's important in "AI for epistemics"?

Project ideas: Backup plans & Cooperative AI

Project ideas: Sentience and rights of digital minds