AI Control

Mar 24, 2024 by Fabien Roger

This is a collection of posts about AI Control, an approach to AI safety that focuses on safety measures aimed at preventing powerful AIs from causing unacceptably bad outcomes, even if those AIs are misaligned and intentionally try to subvert the safety measures.

These posts are useful for understanding the AI Control approach, its upsides, and its downsides. They cover only a small fraction of the AI safety work relevant to AI control.

1. The case for ensuring that powerful AIs are controlled (ryan_greenblatt, Buck)
2. AI Control: Improving Safety Despite Intentional Subversion (Buck, Fabien Roger, ryan_greenblatt, Kshitij Sachan)
3. Untrusted smart models and trusted dumb models (Buck)
4. Catching AIs red-handed (ryan_greenblatt, Buck)
5. Would catching your AIs trying to escape convince AI developers to slow down or undeploy? (Buck)
6. AI catastrophes and rogue deployments (Buck)
7. Meta-level adversarial evaluation of oversight techniques might allow robust measurement of their adequacy (Buck, ryan_greenblatt)
8. Auditing failures vs concentrated failures (ryan_greenblatt, Fabien Roger)
9. Protocol evaluations: good analogies vs control (Fabien Roger)
10. How useful is "AI Control" as a framing on AI X-Risk? (habryka, ryan_greenblatt)
11. Fields that I reference when thinking about AI takeover prevention (Buck)
12. New report: Safety Cases for AI (joshc)
13. Notes on control evaluations for safety cases (ryan_greenblatt, Buck, Fabien Roger)
14. Toy models of AI control for concentrated catastrophe prevention (Fabien Roger, Buck)
15. Games for AI Control (charlie_griffin, Buck)