
DeepMind Alignment Team on Threat Models and Plans

Nov 25, 2022 by Vika

A collection of posts presenting our understanding of and opinions on alignment threat models and plans.

- DeepMind is hiring for the Scalable Alignment and Alignment Teams, by Rohin Shah and Geoffrey Irving
- DeepMind alignment team opinions on AGI ruin arguments, by Vika
- Will Capabilities Generalise More?, by Ramana Kumar
- Clarifying AI X-risk, by zac_kenton, Rohin Shah, David Lindner, Vikrant Varma, Vika, Mary Phuong, Ramana Kumar, and Elliot Catt
- Threat Model Literature Review, by zac_kenton, Rohin Shah, David Lindner, Vikrant Varma, Vika, Mary Phuong, Ramana Kumar, and Elliot Catt
- Refining the Sharp Left Turn threat model, part 1: claims and mechanisms, by Vika, Vikrant Varma, Ramana Kumar, and Mary Phuong
- Refining the Sharp Left Turn threat model, part 2: applying alignment techniques, by Vika, Vikrant Varma, Ramana Kumar, and Rohin Shah
- Categorizing failures as “outer” or “inner” misalignment is often confused, by Rohin Shah
- Definitions of “objective” should be Probable and Predictive, by Rohin Shah
- Power-seeking can be probable and predictive for trained agents, by Vika and janos
- Paradigms of AI alignment: components and enablers, by Vika
- [Linkpost] Some high-level thoughts on the DeepMind alignment team's strategy, by Vika and Rohin Shah