
DeepMind Alignment Team on Threat Models and Plans

Nov 25, 2022 by Vika

A collection of posts presenting our understanding of and opinions on alignment threat models and plans.

- DeepMind is hiring for the Scalable Alignment and Alignment Teams, by Rohin Shah and Geoffrey Irving
- DeepMind alignment team opinions on AGI ruin arguments, by Vika
- Will Capabilities Generalise More?, by Ramana Kumar
- Clarifying AI X-risk, by zac_kenton, Rohin Shah, David Lindner, Vikrant Varma, Vika, Mary Phuong, Ramana Kumar, and Elliot Catt
- Threat Model Literature Review, by zac_kenton, Rohin Shah, David Lindner, Vikrant Varma, Vika, Mary Phuong, Ramana Kumar, and Elliot Catt
- Refining the Sharp Left Turn threat model, part 1: claims and mechanisms, by Vika, Vikrant Varma, Ramana Kumar, and Mary Phuong
- Refining the Sharp Left Turn threat model, part 2: applying alignment techniques, by Vika, Vikrant Varma, Ramana Kumar, and Rohin Shah
- Categorizing failures as “outer” or “inner” misalignment is often confused, by Rohin Shah
- Definitions of “objective” should be Probable and Predictive, by Rohin Shah
- Power-seeking can be probable and predictive for trained agents, by Vika and janos
- Paradigms of AI alignment: components and enablers, by Vika
- [Linkpost] Some high-level thoughts on the DeepMind alignment team's strategy, by Vika and Rohin Shah