Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Author's Notes: This post is fairly technical, with little background and minimal examples; it is not recommended for general consumption. A general understanding of causal models is assumed. This post is probably most useful when read alongside the paper. If your last name is "Beckers" or "Halpern", you might want to skip to the last section.

There’s been a handful of papers in the last few years on abstracting causal models. Beckers and Halpern (B&H) wrote an entire paper on definitions of abstraction on causal models. This post will outline the general framework in which these definitions live, discuss the main two definitions which B&H favor, and wrap up with some discussion of a conjecture from the paper. I'll generally use notation and explanations which I find intuitive, rather than matching the paper on everything.

In general, we’ll follow B&H in progressing from more general to more specific definitions.

General Framework

We have two causal models: one “low-level”, and one “high-level”. There’s a few choices about what sort of “causal model” to use here; the main options are:

  • Structural equations
  • Structural equations with a DAG structure (i.e. no feedback loops)
  • Bayes nets

B&H use the first, presumably because it is the most general. That means that everything here will also apply to the latter two options.

Notation for the causal models:

  • We’ll write $X^H$ for the variables in the high-level model and $X^L$ for variables in the low-level model.
  • We’ll use capital-letter indices to indicate choosing multiple indices at once. For instance, with $S = (1, 2, 3)$, $X_S$ would be $(X_1, X_2, X_3)$.
  • We’ll write interventions as $do(X_S = x_S)$. For instance, with $S = (1, 2, 3)$, $do(X_S = x_S)$ would be equivalent to the three simultaneous interventions $do(X_1 = x_1)$, $do(X_2 = x_2)$, $do(X_3 = x_3)$. Usually both $S$ and $x_S$ will be unspecified, to indicate a generic intervention.
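As a concrete reading of the intervention notation, here is a minimal Python sketch of a structural model with interventions on several variables at once. The three-variable model, the `run` helper and the variable names are illustrative assumptions, not anything from B&H.

```python
from typing import Callable, Dict, Optional

# A structural model is represented as a dict of equations in causal order.
# Each equation computes one variable from the values computed so far.
Equations = Dict[str, Callable[[Dict[str, float]], float]]


def run(equations: Equations, do: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Evaluate the model; variables named in `do` are clamped instead of computed."""
    do = do or {}
    values: Dict[str, float] = {}
    for name, equation in equations.items():
        values[name] = do[name] if name in do else equation(values)
    return values


# Hypothetical toy model: X1 -> X2 -> X3, with X1 also feeding X3.
model: Equations = {
    "X1": lambda v: 1.0,
    "X2": lambda v: 2.0 * v["X1"],
    "X3": lambda v: v["X1"] + v["X2"],
}

observed = run(model)                            # no intervention
single = run(model, do={"X2": 10.0})             # intervene on one variable
joint = run(model, do={"X1": 0.0, "X2": 5.0})    # intervene on an index set at once
print(observed, single, joint)
```

Passing a dict with several keys to `do` plays the role of an intervention on a whole index set: every listed variable is clamped simultaneously, and everything downstream is recomputed.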

Next, we need some connection between the high-level and low-level model, to capture the intuitive notion of “abstraction”. At its most general, this connection has two pieces:

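  • A mapping between values of the variables in the models: $X^H = \tau(X^L)$
  • A mapping between interventions: $\omega: do(X^L_{S^L} = x^L_{S^L}) \mapsto do(X^H_{S^H} = x^H_{S^H})$. Here $\omega$ determines both $S^H$ and $x^H_{S^H}$ as a function of $S^L$ and $x^L_{S^L}$.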

Note that, for true maximum generality, both $\tau$ and $\omega$ could be nondeterministic. However, we’ll generally ignore that possibility within the context of this post.

Finally, the key piece: the high-level and low-level models should yield the same predictions (in cases where they both make a prediction). Formally:

$$P[\tau(X^L) \mid do(X^L_{S^L} = x^L_{S^L})] = P[X^H \mid \omega(do(X^L_{S^L} = x^L_{S^L}))]$$

For the category theorists: this means that we get the same distribution by either (a) performing an intervention on the low-level model and then applying $\tau$ to $X^L$, or (b) first applying $\tau$ to $X^L$, then applying the high-level intervention (found by transforming the low-level intervention via $\omega$).
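To make the "same predictions" condition concrete, here is a small Python sketch that checks it by brute force on a toy pair of models. Everything in it is an assumption made for illustration: the two models, the function `tau` (the mapping between variable values), the function `omega` (the mapping between interventions), and the list of supported interventions.

```python
from collections import Counter
from itertools import product


def low_level(u1, u2, do):
    """Low-level model: two noise bits X1, X2 and their sum X3."""
    x1 = do.get("X1", u1)
    x2 = do.get("X2", u2)
    x3 = do.get("X3", x1 + x2)
    return (x1, x2, x3)


def high_level(u1, u2, do):
    """High-level model: H1 is a total, H2 copies H1."""
    h1 = do.get("H1", u1 + u2)
    h2 = do.get("H2", h1)
    return (h1, h2)


def tau(x):
    """Map low-level values to high-level values."""
    x1, x2, x3 = x
    return (x1 + x2, x3)


def omega(do_low):
    """Map a supported low-level intervention to a high-level intervention."""
    do_high = {}
    if "X1" in do_low and "X2" in do_low:
        do_high["H1"] = do_low["X1"] + do_low["X2"]
    if "X3" in do_low:
        do_high["H2"] = do_low["X3"]
    return do_high


def distribution(sample):
    """Exact outcome counts over the four equally likely noise settings."""
    return Counter(sample(u1, u2) for u1, u2 in product([0, 1], repeat=2))


def commutes(do_low):
    """Intervene low then apply tau, versus map the intervention and run high."""
    via_low = distribution(lambda u1, u2: tau(low_level(u1, u2, do_low)))
    via_high = distribution(lambda u1, u2: high_level(u1, u2, omega(do_low)))
    return via_low == via_high


supported = [{}, {"X1": 1, "X2": 0}, {"X3": 2}, {"X1": 1, "X2": 1, "X3": 0}]
print([commutes(d) for d in supported])
print(commutes({"X1": 1}))  # intervening on X1 alone is not in the supported set
```

In this toy, intervening on `X1` alone has no high-level counterpart, so the check fails for it. That is the kind of case a restricted set of supported interventions is meant to exclude.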

The first definition of “abstraction” examined by B&H is basically just this, plus a little wiggle room: they don’t require all possible interventions to be supported, and instead include in the definition a set of supported interventions. This definition isn’t specific to B&H - it’s an obvious starting point for defining abstraction on causal models as broadly as possible. B&H adopt this maximally-general definition from Rubenstein et al, and dub it “exact transformation”.

B&H then go on to argue that this definition is too general for most purposes. I won’t re-hash their arguments and examples here; the examples in the paper are pretty readable if you’re interested. They also introduce one slightly stronger definition which I will skip altogether; it seems to just be cleaning up a few weird cases, without any major conceptual additions.


τ-Abstraction

The main attraction in B&H is their definition of “τ-abstraction”. The main idea in jumping from the maximally-general framework above to τ-abstraction is that the function $\tau$ mapping low-level variables to high-level variables induces a choice of $\omega$ mapping between interventions; there’s no need to leave the choice of $\omega$ completely open-ended.

In particular, since $X^H = \tau(X^L)$ by definition, it seems like $x^H$ should also somehow relate to $x^L$ in the interventions $do(X^L_{S^L} = x^L_{S^L})$ and $do(X^H_{S^H} = x^H_{S^H})$. The obvious condition is $x^H = \tau(x^L)$. However, the interventions themselves only constrain $x^L$ and $x^H$ at the indices $S^L$ and $S^H$ respectively, whereas $\tau$ may depend on (and determine) the variables at other indices.

One natural condition to impose: each value of $X^H$ consistent with the high-level intervention should correspond to at least one possible value of $X^L$ consistent with the corresponding low-level intervention, and each possible value of $X^L$ consistent with the low-level intervention should produce a value of $X^H$ consistent with the high-level intervention. More formally: if our intervention values are $x^{L*}_{S^L}$ and $x^{H*}_{S^H}$, then we want equality between sets:

$$\{x^H : x^H_{S^H} = x^{H*}_{S^H}\} = \{\tau(x^L) : x^L_{S^L} = x^{L*}_{S^L}\}$$

This is the main criterion B&H use to define the “natural” mapping between interventions $\omega_\tau$. (The exact definition given by B&H is a bit dense, so I won’t walk through the whole thing here.)
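The set-equality criterion can be checked by enumeration on small discrete models. The sketch below is only an illustration under assumed domains and an assumed `tau`; it is not B&H's full definition, which includes further technical conditions.

```python
from itertools import product

# Hypothetical finite domains for the low-level and high-level variables.
LOW_DOMAINS = {"X1": [0, 1], "X2": [0, 1], "X3": [0, 1, 2]}
HIGH_DOMAINS = {"H1": [0, 1, 2], "H2": [0, 1, 2]}


def tau(x):
    """Assumed value mapping: H1 sums X1 and X2, H2 copies X3."""
    return {"H1": x["X1"] + x["X2"], "H2": x["X3"]}


def settings(domains, do):
    """Yield every full assignment that agrees with the intervention `do`."""
    names = list(domains)
    for values in product(*(domains[n] for n in names)):
        x = dict(zip(names, values))
        if all(x[k] == v for k, v in do.items()):
            yield x


def freeze(x):
    """Hashable form of an assignment, so assignments can go into sets."""
    return tuple(sorted(x.items()))


def matches(do_low, do_high):
    """True when the two sets in the criterion are equal."""
    high_side = {freeze(h) for h in settings(HIGH_DOMAINS, do_high)}
    low_side = {freeze(tau(x)) for x in settings(LOW_DOMAINS, do_low)}
    return high_side == low_side


print(matches({"X1": 1, "X2": 0}, {"H1": 1}))  # fixing both summands fixes the sum
print(matches({"X3": 2}, {"H2": 2}))
print(matches({"X1": 1}, {"H1": 1}))           # X1 alone does not pin down H1
```

Intervening on `X1` alone leaves `H1` free to be 1 or 2, so no high-level intervention on `H1` matches it. Under this criterion that low-level intervention simply has no natural high-level counterpart.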

Armed with a natural transformation between low-level and high-level interventions, the next step is of course to define a notion of abstraction: modulo some relatively minor technical conditions, a τ-abstraction is an abstraction consistent with our general framework, and for which $\omega = \omega_\tau$.

One more natural step: A “strong” τ-abstraction is one for which all interventions on the high-level model are allowed.

Constructive τ-Abstraction

In practical examples of abstraction, the high-level variables $X^H$ usually don’t all depend on all the low-level variables $X^L$. Usually, the individual high-level variables $X^H_i$ can each be calculated from non-overlapping subsets of the variables $X^L$. In other words: we can choose a partition $P$ of the low-level variables and break up $\tau$ such that

$$X^H_i = \tau_i(X^L_{P_i})$$


Also including all the conditions required for a strong τ-abstraction, B&H call this a “constructive” τ-abstraction.
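The partition structure is easy to express in code. The following Python sketch builds a value mapping from disjoint blocks of low-level variables, with one function per high-level variable. The helper `make_tau`, the block names and the example functions are all hypothetical, and the sketch covers only the partition requirement, not the other conditions of a strong abstraction.

```python
from typing import Callable, Dict, Tuple

Partition = Dict[str, Tuple[str, ...]]  # high-level name -> block of low-level names


def make_tau(partition: Partition, block_maps: Dict[str, Callable]):
    """Build a value mapping in which each high-level variable reads only its own block."""
    seen = set()
    for block in partition.values():
        if seen & set(block):
            raise ValueError("blocks must not overlap")
        seen |= set(block)

    def tau(x_low: Dict[str, float]) -> Dict[str, float]:
        return {
            high: block_maps[high](*(x_low[name] for name in partition[high]))
            for high in partition
        }

    return tau


# Hypothetical example: H1 summarizes X1 and X2, H2 summarizes X3.
partition = {"H1": ("X1", "X2"), "H2": ("X3",)}
block_maps = {"H1": lambda a, b: a + b, "H2": lambda c: c}
tau = make_tau(partition, block_maps)

print(tau({"X1": 1, "X2": 0, "X3": 2}))
```

The key restriction is that the blocks are fixed ahead of time: which low-level variables feed a given high-level variable cannot depend on the values those variables take.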

The interesting part: B&H conjecture that, modulo some as-yet-unknown minor technical conditions, any strong τ-abstraction is constructive.

I think this conjecture is probably wrong. Main problem: constructive τ-abstraction doesn’t handle ontology shifts.

My go-to example of causal abstraction with an ontology shift is a fluid model (e.g. Navier-Stokes) as an abstraction of a particle model with only local interactions (e.g. lots of billiard balls). In this case, we have two representations of the low-level system:

  • A Lagrangian representation, in which we track the position and momentum of each particle
  • An Eulerian representation, in which we track the mass and momentum densities as a function of position

The two are completely equivalent; each contains the same information. Yet they have very different structure:

  • In the Lagrangian representation, each “variable” (i.e. a particle’s position & momentum at a given time) interacts with all other variables which are nearby in time; we need to check for collisions against every other particle, even those far away in space, since we don’t know ahead of time which will be close by.
  • In the Eulerian representation, each “variable” (i.e. mass & momentum density at a given point in space and time) interacts only with variables which are nearby in both space and time.

In this case, the high-level fluid model is a constructive abstraction of the Eulerian representation, but not of the Lagrangian representation: the high-level model only contains interactions which are local in both time and space.

Conceptually, the problem here is that our graph can have dynamic structure: the values of the variables themselves can determine which other variables they interact with. When that happens, an ontology shift can sometimes make the dynamic structure static, as in the Lagrangian -> Eulerian transformation. But that means that a constructive τ-abstraction on the static structure will not be a constructive τ-abstraction on the dynamic structure (since the partition would depend on the variables themselves), even though the two models are equivalent (and therefore presumably both are τ-abstractions).
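A tiny Python sketch of the "partition depends on the variables" problem: when particles are binned into spatial cells, the set of particles feeding each cell changes as the particles move. The one-dimensional box, the `cell_members` helper and the positions are assumptions chosen purely for illustration.

```python
def cell_members(positions, n_cells, length=1.0):
    """Group particle indices by the spatial cell their position falls in."""
    members = [[] for _ in range(n_cells)]
    for i, x in enumerate(positions):
        cell = min(int(x / length * n_cells), n_cells - 1)
        members[cell].append(i)
    return members


# Three particles in a unit box split into two cells.
before = cell_members([0.1, 0.3, 0.8], n_cells=2)
after = cell_members([0.1, 0.6, 0.8], n_cells=2)  # particle 1 crossed the midpoint

print(before)
print(after)
```

A cell's density is a function of whichever particles currently sit in it, so no fixed grouping of particle variables computes the cell variables for all states. A constructive abstraction requires exactly such a fixed grouping.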

This does leave open the possibility of weakening the definition of a constructive τ-abstraction to allow the partition to depend on $X^L$. Off the top of my head, I don’t know of a counterexample to the conjecture with that modification made.

4 comments

Update from almost 3 years in the future: this stream of work has continued developing in a few different directions, both on the conceptual foundations and in some initial attempts to apply these tools to AI. Two recent works I was especially excited by (and their bibliographies): 'Towards a Grounded Theory of Causation for Embodied AI' (there is also an excellent talk by the author) and 'Faithful, Interpretable Model Explanations via Causal Abstraction'.

I don't know if you've seen it, but there is a follow-up that generalizes the Beckers and Halpern paper to a notion of approximate abstraction, measuring the non-commutativity of the diagram with a distance function and taking expectations. I think the most useful notion that the paper introduces is the idea of a probability distribution over the set of allowed interventions. Intuitively, you don't need your abstraction of temperature to behave nicely w.r.t. freezing half the room and burning the other half such that the average kinetic energy balances out. Thus you can determine the "approximate commutativeness" of the diagram by fixing a high-level intervention and taking an expectation over the low-level interventions that were likely to map to that high-level intervention.

Also, if you are willing to write up your counterexample to the conjecture that Beckers and Halpern make, I am currently researching under Eberhardt and he (and I) would be extremely interested in seeing it. I also initially thought that the conjecture was obviously false, but when I tried to actually construct counterexamples, all of them ended up as either not strong abstractions or not recursive (acyclic) causal models.

Turns out the particles -> fluid example doesn't work; it's not a τ-abstraction (which makes me think the range of applicability of τ-abstraction is considerably narrower than I first thought).

That said, here's a counterexample which I think works. Variables of the low-level model:

  • follow an arbitrary structural model
  • is a random permutation
  • given by

... where U are iid noise terms. So we have some arbitrary structural model, we scramble the variables, and then we compute a function of each. For the high-level model:

  • follow the same model as in the low-level model
  • given by

... so it's the same as the low-level model, but with the variables unscrambled. The mapping between the two is what you'd expect: maps directly, and uses to unscramble : . Then the interventions are similarly simple:

Note that we can pick any we please for the last intervention, but we do need to pick one - we can't just leave it alone.

I'm pretty sure this checks all the boxes for strong τ-abstraction. But it isn't a constructive τ-abstraction, since all of the unscrambled high-level variables depend on the same low-level variable: the permutation. In principle, there could still be some other $\tau$ which makes the high-level model a constructive abstraction (B&H's definition only requires that some $\tau$ exist between the two models), but I doubt it.
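Here is a rough Python sketch of the scramble/unscramble construction as I read it. Every name in it (`x`, `sigma`, `u`, `y`, the function `f`, and the exact indexing convention) is an assumption filled in for illustration, since the original formulas did not survive; treat it as one possible reading rather than the definitive setup.

```python
import random


def f(x, u):
    """Hypothetical downstream function of one variable and one noise term."""
    return 2 * x + u


def low_level(rng, n=3):
    """Arbitrary upstream values, a random permutation, then scrambled outputs."""
    x = [rng.randint(0, 9) for _ in range(n)]   # stand-in for an arbitrary structural model
    sigma = list(range(n))
    rng.shuffle(sigma)                          # the random permutation
    u = [rng.randint(0, 1) for _ in range(n)]   # iid noise terms
    y = [f(x[sigma[i]], u[i]) for i in range(n)]
    return x, sigma, y


def tau(x, sigma, y):
    """Keep x as-is and use sigma to put each output back next to its input."""
    y_high = [None] * len(y)
    for i, s in enumerate(sigma):
        y_high[s] = y[i]
    return x, y_high


rng = random.Random(0)
x, sigma, y = low_level(rng)
x_high, y_high = tau(x, sigma, y)
print(sigma, y, y_high)
```

After unscrambling, each high-level output is `f` of the matching `x` plus noise, as in the unscrambled model. But every entry of `y_high` needed `sigma` to be computed, which is the shared dependence that blocks a partition into non-overlapping blocks.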

Let me know if you guys spot a hole in this setup, or see an elegant way to confirm that there isn't some other $\tau$ that magically makes it constructive.

Dear John Wentworth:

I have a doubt regarding the implications of Beckers' paper on abstractions. I am a lawyer by profession so I'm venturing pretty far afield here, and I hope my question will not be too trivial.

Given that every constructive abstraction is also a τ-abstraction, there must be some surjective function that is compatible with . Hence, for constructive abstractions, must there also be mappings such that, , where is the projection of onto the variables in ? In other words, must there also be a partition of the low-level exogenous variables where each partition is mapped to a distinct high-level variable? I missed in the definition of constructive abstraction.

Thank you!

[This comment is no longer endorsed by its author]