This post was written during Alex Altair's agent foundations fellowship program, funded by the LTFF. Thanks to Alex Altair, Alfred Harwood, and Daniel C for feedback and comments.
Introduction
The selection theorems agenda aims to prove statements of the following form: "agents selected under criterion X have property Y," where Y is a property such as having world models, general-purpose search, modularity, etc. We're going to focus on world models.
But what is the intuition that makes us expect to be able to prove such things in the first place? Why expect world models?
Because: assuming the world is a Causal Bayesian Network with the agent's actions corresponding to the D (decision) node, if its actions can robustly control the U (utility) node despite various "perturbations" in the world, then intuitively it must have learned the causal structure of how U's parents influence U in order to take them into account in its actions.
The same goes for the causal structure of how U's parents' parents influence U's parents, and so on: by induction, the agent must have learned the causal structure of the entire world upstream of the utility variable.
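To make the intuition concrete, here is a minimal toy sketch (my own illustration, not from the paper): a causal chain X → Z → U where the agent's decision D is also a parent of U. The structural equations (`mechanism`, `utility`) are hypothetical stand-ins; the point is that any zero-regret policy must encode the X → Z mechanism, since the optimal action depends on it.

```python
# Toy causal chain: X -> Z -> U, with decision D also a parent of U.
# Hypothetical structural equations, chosen only for illustration:
#   Z = X + 1                      (the mechanism the agent must learn)
#   U = 1 if D == Z else 0         (utility rewards matching Z)

def mechanism(x):
    return x + 1  # ground-truth causal mechanism Z = X + 1

def utility(d, z):
    return 1 if d == z else 0

# A zero-regret agent must map each observed x to d = mechanism(x);
# a policy that ignores the X -> Z mechanism fails under interventions
# that set x to a new value. The policy below "contains" the mechanism:
policy = {x: mechanism(x) for x in range(5)}

regret = sum(1 - utility(policy[x], mechanism(x)) for x in range(5))
print(regret)  # 0: the low-regret policy encodes the causal structure
```

Reading the mechanism back out of `policy` is the toy analogue of querying the agent to reconstruct its implicit world model.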
This is the intuitive argument that the paper Robust Agents Learn Causal World Models by Jonathan Richens and Tom Everitt formalizes.
Informally, its main theorem can be stated as: if an agent responds to various environment interventions by prescribing policies that overall yield low regret, then it is possible to appropriately query the agent and reconstruct an implicit world model that matches the ground-truth causal structure.
I will refer to this result as the "Causal Good Regulator Theorem". This sequence, Thoughts on the Causal Good Regulator Theorem, will do the following:
* [1] a precise statement of the theorem alongside a high-level argument for its proof, and a discussion of the paper's results in the context of the selection theorems agenda - basically a self-contained distillation of the paper. The rest of the posts in this