Post 4 of Towards Causal Foundations of Safe AGI, preceded by Post 1: Introduction, Post 2: Causality, and Post 3: Agency.
By Tom Everitt, James Fox, Ryan Carey, Matt MacDermott, Sebastian Benthall, and Jon Richens, representing the Causal Incentives Working Group. Thanks also to Toby Shevlane and Aliya Ahmad.
“Show me the incentive, and I’ll show you the outcome” – Charlie Munger
Predicting behaviour is an important problem when designing and deploying agentic AI systems. Incentives capture some of the key forces that shape agent behaviour,[1] and analysing them doesn't require us to fully understand the internal workings of a system.
This post shows how a causal model of an agent and its environment can reveal what the agent wants to know and what it wants to control, as well as how it will respond to commands and influence its...
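To give a concrete flavour of these questions, here is a minimal value-of-information sketch (the setup and numbers are my own illustrative choices, not taken from the post): an agent "wants to know" a variable exactly when observing it raises the expected utility it can achieve.

```python
# Illustrative one-shot decision problem (all numbers made up):
# state S in {0, 1} with P(S=1) = 0.3; the agent picks D in {0, 1}
# and receives utility 1 if D matches S, else 0.
p_s = {0: 0.7, 1: 0.3}

def utility(s, d):
    return 1.0 if s == d else 0.0

# Best expected utility WITHOUT observing S: commit to one fixed d.
eu_blind = max(sum(p_s[s] * utility(s, d) for s in p_s) for d in (0, 1))

# Best expected utility WHEN S is observed: pick the best d for each s.
eu_informed = sum(p_s[s] * max(utility(s, d) for d in (0, 1)) for s in p_s)

print(f"EU without observing S: {eu_blind:.2f}")                # 0.70
print(f"EU observing S:         {eu_informed:.2f}")             # 1.00
print(f"Value of information:   {eu_informed - eu_blind:.2f}")  # 0.30
```

In the causal-model language of the post, S has positive value of information for the decision; analogous queries identify what the agent wants to control.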
Post 3 of Towards Causal Foundations of Safe AGI, preceded by Post 1: Introduction and Post 2: Causality.
By Matt MacDermott, James Fox, Rhys Ward, Jonathan Richens, and Tom Everitt, representing the Causal Incentives Working Group. Thanks also to Ryan Carey, Toby Shevlane, and Aliya Ahmad.
The purpose of this post is twofold: to lay the foundation for subsequent posts by exploring what agency means from a causal perspective, and to sketch a research program for a deeper understanding of agency.
Agency is a complex concept that has been studied from multiple perspectives, including social science, philosophy, and AI research. Broadly, it refers to a system able to act autonomously. For the purposes of this blog post, we interpret agency as goal-directedness, i.e. acting as if trying to direct the world in some particular...
Yes, I can flip two independent coins a finite number of times and get strings that appear to be correlated. But in the asymptotic limit, the probability that they are the same (or correlated at all) goes to zero. Hence, two causally unrelated things can appear dependent at finite sample sizes, but when we have infinite samples (which is the limit we assume when making statements about probabilities) we get P(a,b) = P(a)P(b).
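To see the finite-sample point numerically, here is a quick simulation (mine, added for illustration): the empirical correlation between two independent fair coins is typically non-zero at small n but shrinks towards zero as n grows.

```python
import numpy as np

rng = np.random.default_rng(0)
for n in (10, 100, 10_000, 1_000_000):
    a = rng.integers(0, 2, size=n)  # coin A, independent of coin B
    b = rng.integers(0, 2, size=n)  # coin B
    r = np.corrcoef(a, b)[0, 1]     # empirical Pearson correlation
    print(f"n = {n:>9}: empirical corr = {r:+.4f}")
# The correlation is noticeably non-zero at n = 10 and ~0 by
# n = 1,000,000, consistent with P(a,b) -> P(a)P(b) in the limit.
```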
Thanks for commenting! This is an interesting question, and answering it requires digging into some of the subtleties of causality. Unfortunately, the time series framing you propose doesn't work, because this time series data is not iid (the variable A = "the next number out of program 1" is not iid), while the distributions P(A), P(B), and P(A,B) you are reasoning with assume iid by definition. We really have to have iid here; otherwise we are trying to infer correlation from a single sample. By treating non-iid variables as iid we can see correlations ...
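To illustrate that failure mode concretely (a sketch I'm adding, not part of the original comment): take two copies of the same deterministic program, log their outputs over time, and naively treat each time step as an iid draw of (A, B). The empirical "correlation" is perfect, even though there is no causal or statistical dependence between the two runs.

```python
import numpy as np

def program(t):
    # The same deterministic program, imagined running at two
    # opposite ends of the universe: outputs 0, 1, 0, 1, ...
    return t % 2

steps = range(1000)
a = np.array([program(t) for t in steps])  # output stream of run 1
b = np.array([program(t) for t in steps])  # output stream of run 2

# Naively pooling time steps as if they were iid samples of (A, B):
print(np.corrcoef(a, b)[0, 1])  # 1.0 -- a spurious "correlation"
# But at each fixed t, A_t and B_t are constants, so there is no
# statistical dependence anywhere; the iid assumption did the work.
```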
In the example of the two programs, we have to be careful about what we mean by statistical correlation vs. the more standard/colloquial use of the term. I'm assuming here, when you say 'the same program running on opposite ends of the universe, and their outputs would be the same', that you are referring to a deterministic program (otherwise, there would be no guarantee that the outputs were the same). But if the output of the two programs is deterministic, then there can be no statistical correlation between them. Let A be the outcome of the first program an...
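Spelling out that last step (my addition): if the programs are deterministic, then A = a and B = b with probability 1, so E[AB] = ab = E[A]E[B], and hence Cov(A, B) = E[AB] − E[A]E[B] = 0. Indeed, since Var(A) = Var(B) = 0, the correlation coefficient Cov(A, B)/(sd(A)·sd(B)) is not even defined. Any apparent correlation must come from pooling different time steps as if they were iid samples, as in the sketch above.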
Post 2 of Towards Causal Foundations of Safe AGI, see also Post 1: Introduction.
By Lewis Hammond, Tom Everitt, Jon Richens, Francis Rhys Ward, Ryan Carey, Sebastian Benthall, and James Fox, representing the Causal Incentives Working Group. Thanks also to Alexis Bellot, Toby Shevlane, and Aliya Ahmad.
Causal models are the foundations of our work. In this post, we provide a succinct but accessible explanation of causal models that can handle interventions, counterfactuals, and agents, which will be the building blocks of future posts in the sequence. Basic familiarity with (conditional) probabilities will be assumed.
What does it mean for the rain to cause the grass to become green? Causality is a philosophically intriguing topic that underlies many other concepts of human importance. In particular, many concepts relevant to safe AGI, like...
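As a taste of the formalism, here is a minimal structural-causal-model sketch (the variables and probabilities are my own illustrative choices) for the rain/grass example, contrasting observing with intervening:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1_000_000

# Illustrative structural causal model:
#   Rain  ~ Bernoulli(0.3)
#   Green := Bernoulli(0.9) if Rain else Bernoulli(0.2)
rain = rng.random(N) < 0.3
green = np.where(rain, rng.random(N) < 0.9, rng.random(N) < 0.2)

# Observational query: condition on Rain = 1.
print("P(green | rain)     ~", green[rain].mean())   # ~0.9

# Interventional query do(Rain = 1): replace Rain's mechanism with
# the constant 1 and resample everything downstream of it.
green_do = rng.random(N) < 0.9
print("P(green | do(rain)) ~", green_do.mean())      # ~0.9

# The two come apart when we act on an *effect*: observing green
# grass is evidence of rain, but making the grass green is not.
print("P(rain | green)     ~", rain[green].mean())   # ~0.66 > 0.3
print("P(rain | do(green)) ~", rain.mean())          # ~0.3
```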
This is such a good deep dive into our paper, which I will be pointing people to in the future. Thanks for writing it!
Agree that conditioning on the intervention is unnatural for agents. One way around this is to note that adapting to an unknown distributional shift given only sensory inputs Pa_D is strictly harder than adapting to a known distributional shift (given Pa_D and sigma). It follows that any agent capable of adapting given only its sensory inputs must have learned a CWM (footnotes, p6).