
In a lot of examples of abstract causality, the abstract causal structure generally mirrors the ground-level causal structure. Think about fluid flow, for instance. At the (classical) ground level, we have a bunch of particles whose interactions are local in time and space. Then we glom together a bunch of particles, abstract it all into the Navier-Stokes equations, and we see… interactions which are local in time and space. If we picture the whole thing as a causal DAG, the abstract DAG (Navier-Stokes) is basically just a glommed-together version of the ground DAG (particle interactions), with some information thrown out.

But once the system contains embedded maps and controllers, that all goes out the window. Abstract causal structure can be completely different from ground-level causal structure.

This post is just a few examples of the weird stuff which can happen. We’ll run through four:

  • A model in which an embedded map impersonates another variable
  • A model in which an embedded controller impersonates another variable
  • A thermostat, as an example of feedback control making a system behave as if it follows a different causal model
  • A counterfactual-detector: a node in a causal model which detects whether a counterfactual is running on the model

Map-Impersonator

First, a simple example with an embedded map. We have some background variables B, which are upstream of an “embedded territory” X. We’ll assume that almost all the variation in X is accounted for by B (although B may be high-dimensional, and the X(B) function may be quite complicated).

A map sees all the background information B, and then tries to estimate the value of X (or at least some summary information about X). If the map node has a sufficiently accurate model of the function X(B), then it can act-as-if it were reading from X directly, rather than just inferring X.
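To make this concrete, here’s a minimal Python sketch (not from the original post; the particular tanh mechanism, parameters, and noise level are illustrative assumptions). The map node never reads X, only B, yet its output tracks X almost perfectly:

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.array([1.0, -2.0, 0.5])            # placeholder parameters for X(B)

def x_of_b(B):
    # Ground mechanism: X is almost entirely determined by B,
    # plus a little residual noise.
    return np.tanh(B @ w) + 0.01 * rng.normal(size=len(B))

def map_node(B):
    # Embedded map: reads only B, but carries an accurate internal model
    # of X(B), so its output tracks X without ever reading X.
    return np.tanh(B @ w)

B = rng.normal(size=(1000, 3))            # background variables
X = x_of_b(B)                             # the "embedded territory"
M = map_node(B)                           # the map's estimate of X

# The map acts-as-if it were reading X directly:
print("corr(map, X) =", round(np.corrcoef(M, X)[0, 1], 4))   # ~ 1.0
```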

Out-of-the-box, the abstract causal model is not perfect - it will not correctly predict how the map responds to counterfactual changes in X. However, it will make other counterfactual predictions correctly: counterfactual changes in B or the map, as well as correlational queries, will all produce answers matching the answers on the ground model.

We can make the abstract model support counterfactual surgery on X by mapping it to counterfactual surgery on both X and the map in the ground level model. That seems kind of dumb in this example, but it’s more interesting in the context of a larger model, as we’ll see next.
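Here’s a quick sketch of that surgery mapping, in the same toy setup (again, illustrative assumptions rather than anything from the post):

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.array([1.0, -2.0, 0.5])
b = rng.normal(size=3)

x = np.tanh(b @ w)            # ground mechanism X(B)
m = np.tanh(b @ w)            # embedded map's estimate, computed from B only

# Naive surgery do(X = 0.7) in the ground model: the map still computes
# from B, so its output does not respond -- the out-of-the-box abstract
# model gets this counterfactual wrong.
x_do = 0.7
m_naive = np.tanh(b @ w)      # unchanged

# Mapped surgery: do(X = 0.7) in the abstract model corresponds to surgery
# on both X and the map in the ground model.
m_mapped = x_do               # map forced to agree with the new X

print(m_naive == m, m_mapped == x_do)   # True True
```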

Controller-Impersonator

There are a few different ways to draw the connection between maps and controllers. One is the Good Regulator Theorem, which says that an ideal controller contains a model of the system under control. Another direction is sensing-control duality, which is less of a theorem (as far as I know) and more of a pattern which shows up in control theory. We’re not doing anything formal enough here to need a particular connection; we’ll just note that maps and controllers tend to travel together and leave it at that.

So: what kind of weirdness shows up when there’s a control system embedded in a causal model?

Here’s an example which resembles the setup in the Good Regulator Theorem, and makes the role of an embedded map apparent. It’s exactly the same as our previous example, except now the embedded map is used to control a new variable Y.

We have some background variables B, which are upstream of X. We’ll assume that almost all the variation in X is accounted for by B (although B may be high-dimensional, and the X(B) function may be quite complicated). If our controller has a sufficiently accurate model of the function X(B), then it can “pretend” to only be reading X - so that the whole system behaves-as-if X were upstream of Y.

Again, counterfactuals are all supported, as long as we define them properly - counterfactual surgery on X in the abstract model corresponds to surgery on both X and the controller in the ground model.
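A minimal sketch of the controller version (illustrative Python; the mechanism and the controller’s policy are made-up assumptions): the controller reads only B, yet Y behaves as if it were downstream of X, and the abstract counterfactual do(X = x) only comes out right if we also surgically fix the controller’s reading of X.

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.array([1.0, -2.0, 0.5])   # placeholder parameters for X(B)

def x_of_b(b):
    # Ground mechanism: X is (almost entirely) determined by B.
    return np.tanh(b @ w)

def controller(x_reading):
    # The controller's policy for Y, as a function of (what it takes to be) X.
    return -x_reading

b = rng.normal(size=3)
x = x_of_b(b)

# Ground model: the controller reads only B, but runs its internal copy of
# X(B) to get its "reading" of X -- so Y behaves-as-if X were upstream of Y.
y = controller(np.tanh(b @ w))
print(x, y)                      # y == -x, as if X -> Y

# Naive surgery do(X = 0.9): the controller's reading still comes from B,
# so Y does not respond, and the abstract model's prediction fails.
x_do = 0.9
y_naive = controller(np.tanh(b @ w))
print(y_naive == y)              # True: unchanged

# Mapped surgery: do(X = 0.9) in the abstract model corresponds to surgery
# on both X and the controller's reading in the ground model.
y_mapped = controller(x_do)
print(y_mapped)                  # -0.9, exactly as if X -> Y
```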

Feedback Controller

One (relatively) simple example with a feedback loop is a thermostat. At the abstract level, I turn a knob, and that causes the room temperature to increase/decrease. At a more concrete level, there’s a feedback loop, with back-and-forth causal arrows.

Even though it doesn’t match the concrete causal structure, the abstract model is in some sense correct: it produces correct predictions over a class of queries. If we turn the knob, it correctly predicts that the (long-run) temperature will change. If we hold the temperature at some other value (by external means), it correctly predicts that the knob will not adjust to match it. And of course, absent any counterfactuals, it correctly predicts that the observed knob settings and temperature measurements will correlate tightly.
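Here’s a toy simulation of that feedback loop (the constants, the proportional-control rule, and the linear heat-loss model are all illustrative assumptions, not from the post):

```python
def run_room(knob, steps=500, clamp_temp=None, outside=10.0):
    # Toy thermostat: heater output is proportional feedback on the measured
    # temperature; the room loses heat to the outside. Constants are made up.
    temp = outside
    for _ in range(steps):
        heater = max(0.0, 2.0 * (knob - temp))       # concrete level: heater <- temperature (feedback)
        temp += 0.1 * (heater - 0.5 * (temp - outside))
        if clamp_temp is not None:                   # external intervention holding the temperature fixed
            temp = clamp_temp
    return temp

# Abstract model ("knob causes temperature"), checked against the feedback model.
# Proportional control leaves a steady-state offset, but the knob clearly
# drives the long-run temperature up and down:
for knob in (15.0, 20.0, 25.0):
    print(knob, round(run_room(knob), 2))

# Hold the temperature at 30 by external means: the knob setting is untouched,
# the heater just shuts off, and the long-run temperature is whatever we clamp it to.
print(run_room(knob=20.0, clamp_temp=30.0))
```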

Counterfactual Detector

Finally, a cool example with a slightly different flavor: let’s build a causal model in which one of the nodes tries to detect whether a counterfactual is being run on the model.

A simple version is actually pretty easy:

There’s some hidden variable X (possibly with complicated internal structure of its own), and a bunch of conditionally IID measurements Y_1 … Y_n. The “detector” node simply looks for outliers: it’s 1 if it detects an outlier, 0 if not.

Interestingly, the outlier detector can also detect many counterfactuals.

In normal operation, assuming the measurement noise isn’t too long-tailed and the detector isn’t too sensitive, there will almost never be any outliers. But if we counterfactually set the value of one or several of the Y’s, then the values we set very likely will be outliers - unless X just happens to match the value we set.

As an extreme example, consider the case with zero noise. X is some value, and all of the Y_i are perfectly accurate measurements of X. The detector simply checks that all the Y’s are equal. But if we counterfactually set one or several Y’s (but not all of them), then that one will probably not be exactly equal to the others.

Thus: the detector can often tell when it is inside a counterfactual query.
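A minimal simulation of the detector (illustrative Python; the Gaussian noise model, the number of measurements, and the threshold are assumptions, not from the post):

```python
import numpy as np

rng = np.random.default_rng(0)
N, NOISE = 20, 0.01

def sample_ground_model(do_y1=None):
    # Hidden X, plus N conditionally-IID noisy measurements of X.
    # `do_y1` performs counterfactual surgery on the first measurement.
    x = rng.normal()
    y = x + NOISE * rng.normal(size=N)
    if do_y1 is not None:
        y[0] = do_y1
    return y

def detector(y, threshold=6.0):
    # Crude outlier check: fires (returns 1) if any measurement sits
    # implausibly far from the consensus, given the known measurement noise.
    return int(np.any(np.abs(y - np.median(y)) > threshold * NOISE))

# Normal operation: the detector (almost) never fires.
print(np.mean([detector(sample_ground_model()) for _ in range(1000)]))

# Counterfactual surgery do(Y_1 = 1.0): the detector almost always fires,
# except when X happens to land near the value we set.
print(np.mean([detector(sample_ground_model(do_y1=1.0)) for _ in range(1000)]))
```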

Note that this relies on correct knowledge of the original causal structure. Our “outlier detector” is essentially a very crude structure learning algorithm: it tries to deduce whether the structure of the system is the same as the original structure. This is the “embedded map” part; the detector node is an (incomplete) embedded map of the system structure.

Parallels to decision theory are left as an exercise to the reader.
