Towards Causal Foundations of Safe AGI

Jun 09, 2023 by tom4everitt

This sequence will give our take on how causality underpins many critical aspects of safe AGI, including agency, incentives, misspecification, generalisation, fairness, and corrigibility. We summarise past work and point to open questions. 

By the Causal Incentives Working Group

Introduction to Towards Causal Foundations of Safe AGI
tom4everitt, Lewis Hammond, Francis Rhys Ward, RyanCarey, James Fox, mattmacdermott, sbenthall

Causality: A Brief Introduction
tom4everitt, Lewis Hammond, Jonathan Richens, Francis Rhys Ward, RyanCarey, sbenthall, James Fox

Agency from a causal perspective
tom4everitt, mattmacdermott, James Fox, Francis Rhys Ward, Jonathan Richens

Incentives from a causal perspective
tom4everitt, James Fox, RyanCarey, mattmacdermott, sbenthall, Jonathan Richens

Reward Hacking from a Causal Perspective
tom4everitt, Francis Rhys Ward, sbenthall, James Fox, mattmacdermott, RyanCarey