LESSWRONG

The Causes of Power-seeking and Instrumental Convergence

Jul 05, 2021 by TurnTrout

Instrumental convergence posits that smart goal-directed agents will tend to take certain actions (e.g., gaining resources, staying alive) in order to achieve their goals. These actions seem to involve taking power from humans. Human disempowerment seems like a key part of how AI might go very, very wrong.

But where does instrumental convergence come from? When does it occur, and how strongly? And what does the math look like?

Seeking Power is Often Convergently Instrumental in MDPs (TurnTrout, Logan Riggs)
Power as Easily Exploitable Opportunities (TurnTrout)
The Catastrophic Convergence Conjecture (TurnTrout)
Generalizing POWER to multi-agent games (midco, TurnTrout)
MDP models are determined by the agent architecture and the environmental dynamics (TurnTrout)
Environmental Structure Can Cause Instrumental Convergence (TurnTrout)
A world in which the alignment problem seems lower-stakes (TurnTrout)
The More Power At Stake, The Stronger Instrumental Convergence Gets For Optimal Policies (TurnTrout)
Seeking Power is Convergently Instrumental in a Broad Class of Environments (TurnTrout)
When Most VNM-Coherent Preference Orderings Have Convergent Instrumental Incentives (TurnTrout)
Satisficers Tend To Seek Power: Instrumental Convergence Via Retargetability (TurnTrout)
A Certain Formalization of Corrigibility Is VNM-Incoherent (TurnTrout)
Instrumental Convergence For Realistic Agent Objectives (TurnTrout)
Parametrically retargetable decision-makers tend to seek power (TurnTrout)