Towards Causal Foundations of Safe AGI

Jun 09, 2023 by tom4everitt

This sequence will give our take on how causality underpins many critical aspects of safe AGI, including agency, incentives, misspecification, generalisation, fairness, and corrigibility. We summarise past work and point to open questions. 

By the Causal Incentives Working Group

Introduction to Towards Causal Foundations of Safe AGI
tom4everitt, Lewis Hammond, Francis Rhys Ward, RyanCarey, James Fox, mattmacdermott, sbenthall

Causality: A Brief Introduction
tom4everitt, Lewis Hammond, Jonathan Richens, Francis Rhys Ward, RyanCarey, sbenthall, James Fox

Agency from a causal perspective
tom4everitt, mattmacdermott, James Fox, Francis Rhys Ward, Jonathan Richens

Incentives from a causal perspective
tom4everitt, James Fox, RyanCarey, mattmacdermott, sbenthall, Jonathan Richens

Reward Hacking from a Causal Perspective
tom4everitt, Francis Rhys Ward, sbenthall, James Fox, mattmacdermott, RyanCarey