Towards Causal Foundations of Safe AGI

This sequence gives our take on how causality underpins many critical aspects of safe AGI, including agency, incentives, misspecification, generalisation, fairness, and corrigibility. We summarise past work and point to open questions.

By the Causal Incentives Working Group