Counterfactual Planning

Feb 02, 2021 by Koen.Holtman

Counterfactual planning is a design approach for creating a range of safety mechanisms that can be applied to AGI systems. This sequence introduces the graphical notation used in counterfactual planning, and it defines several safety mechanisms.

Counterfactual Planning in AGI Systems
Graphical World Models, Counterfactuals, and Machine Learning Agents
Creating AGI Safety Interlocks
Disentangling Corrigibility: 2015-2021
Safely controlling the AGI agent reward function