
DeepMind Alignment Team on Threat Models and Plans

Nov 25, 2022 by Vika

A collection of posts presenting our understanding of and opinions on alignment threat models and plans.

1. DeepMind is hiring for the Scalable Alignment and Alignment Teams (Rohin Shah, Geoffrey Irving)
2. DeepMind alignment team opinions on AGI ruin arguments (Vika)
3. Will Capabilities Generalise More? (Ramana Kumar)
4. Clarifying AI X-risk (zac_kenton, Rohin Shah, David Lindner, Vikrant Varma, Vika, Mary Phuong, Ramana Kumar, Elliot Catt)
5. Threat Model Literature Review (zac_kenton, Rohin Shah, David Lindner, Vikrant Varma, Vika, Mary Phuong, Ramana Kumar, Elliot Catt)
6. Refining the Sharp Left Turn threat model, part 1: claims and mechanisms (Vika, Vikrant Varma, Ramana Kumar, Mary Phuong)
7. Refining the Sharp Left Turn threat model, part 2: applying alignment techniques (Vika, Vikrant Varma, Ramana Kumar, Rohin Shah)
8. Categorizing failures as “outer” or “inner” misalignment is often confused (Rohin Shah)
9. Definitions of “objective” should be Probable and Predictive (Rohin Shah)
10. Power-seeking can be probable and predictive for trained agents (Vika, janos)
11. Paradigms of AI alignment: components and enablers (Vika)
12. [Linkpost] Some high-level thoughts on the DeepMind alignment team's strategy (Vika, Rohin Shah)