
The Causes of Power-seeking and Instrumental Convergence — LessWrong


Jul 05, 2021 by TurnTrout

Instrumental convergence posits that smart goal-directed agents will tend to take certain actions (e.g., gaining resources, staying alive) in order to achieve their goals. These actions seem to involve taking power from humans. Human disempowerment seems like a key part of how AI might go very, very wrong.

But where does instrumental convergence come from? When does it occur, and how strongly? And what does the math look like?

Seeking Power is Often Convergently Instrumental in MDPs (TurnTrout, Logan Riggs)

Power as Easily Exploitable Opportunities

The Catastrophic Convergence Conjecture

Generalizing POWER to multi-agent games (midco, TurnTrout)

MDP models are determined by the agent architecture and the environmental dynamics

Environmental Structure Can Cause Instrumental Convergence

A world in which the alignment problem seems lower-stakes

The More Power At Stake, The Stronger Instrumental Convergence Gets For Optimal Policies

Seeking Power is Convergently Instrumental in a Broad Class of Environments

When Most VNM-Coherent Preference Orderings Have Convergent Instrumental Incentives

Satisficers Tend To Seek Power: Instrumental Convergence Via Retargetability

A Certain Formalization of Corrigibility Is VNM-Incoherent

Instrumental Convergence For Realistic Agent Objectives

Parametrically retargetable decision-makers tend to seek power