LESSWRONG

Counterfactual Planning

Feb 02, 2021 by Koen.Holtman

Counterfactual planning is a design approach for creating a range of safety mechanisms that can be applied to AGI systems. This sequence introduces the graphical notation used in counterfactual planning, and it defines several safety mechanisms.

Counterfactual Planning in AGI Systems
Graphical World Models, Counterfactuals, and Machine Learning Agents
Creating AGI Safety Interlocks
Disentangling Corrigibility: 2015-2021
Safely controlling the AGI agent reward function