LESSWRONG

DeepMind Alignment Team on Threat Models and Plans

Nov 25, 2022 by Vika

A collection of posts presenting our understanding of and opinions on alignment threat models and plans.

- DeepMind is hiring for the Scalable Alignment and Alignment Teams, by Rohin Shah, Geoffrey Irving (10mo ago; 146 karma, Ω 35)
- DeepMind alignment team opinions on AGI ruin arguments, by Vika (6mo ago; 370 karma, Ω 36)
- Will Capabilities Generalise More?, by Ramana Kumar (9mo ago; 119 karma, Ω 38)
- Clarifying AI X-risk, by zac_kenton, Rohin Shah, David Lindner, Vikrant Varma, Vika, Mary Phuong, Ramana Kumar, Elliot Catt (5mo ago; 107 karma, Ω 23)
- Threat Model Literature Review, by zac_kenton, Rohin Shah, David Lindner, Vikrant Varma, Vika, Mary Phuong, Ramana Kumar, Elliot Catt (5mo ago; 66 karma, Ω 4)
- Refining the Sharp Left Turn threat model, part 1: claims and mechanisms, by Vika, Vikrant Varma, Ramana Kumar, Mary Phuong (7mo ago; 78 karma, Ω 3)
- Refining the Sharp Left Turn threat model, part 2: applying alignment techniques, by Vika, Vikrant Varma, Ramana Kumar, Rohin Shah (4mo ago; 39 karma, Ω 8)
- Categorizing failures as “outer” or “inner” misalignment is often confused, by Rohin Shah (3mo ago; 82 karma, Ω 21)
- Definitions of “objective” should be Probable and Predictive, by Rohin Shah (3mo ago; 35 karma, Ω 27)
- Power-seeking can be probable and predictive for trained agents, by Vika, janos (22d ago; 33 karma, Ω 1)
- Paradigms of AI alignment: components and enablers, by Vika (10mo ago; 53 karma, Ω 4)
- [Linkpost] Some high-level thoughts on the DeepMind alignment team's strategy, by Vika, Rohin Shah (16d ago; 107 karma, Ω 12)