Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

I’m working on a theory of abstraction suitable as a foundation for embedded agency and specifically multi-level world models. I want to use real-world examples to build a fast feedback loop for theory development, so a natural first step is to build a starting list of examples which capture various relevant aspects of the problem.

These are mainly focused on causal abstraction, in which both the concrete and abstract model are causal DAGs with some natural correspondence between counterfactuals on the two. (There are some exceptions, though.) The list isn’t very long; I’ve chosen a handful of representative examples which cover qualitatively different aspects of the general problem.

I’ve grouped the examples by symmetry class:

  • Finite DAGs without any symmetries, or at least no symmetries which matter for our purposes
  • Plate symmetry (as in “plate notation”), in which there are a number of conditionally IID components
  • Time symmetry, in which the DAG (or some part of it) consists of one repeated subcomponent connected in a straight line (i.e. a Markov chain structure)

Note that many of the abstractions below abstract from one symmetry class to another - for example, MCMC abstracts a concrete time-symmetric model into an abstract plate-symmetric model.

I’m interested to hear more examples, especially examples which emphasize qualitative features which are absent from any of the examples here. Examples in which other symmetry classes play an important role are of particular interest, as well as examples with agenty behavior which we know how to formalize without too much mess.

Finite DAGs: Examples from Electrical Circuits

Electrical engineers rely heavily on nested layers of abstraction, of exactly the sort I’m interested in (i.e. multi-level models of the physical world). Additionally, causal models are a natural fit for digital circuits. Together, these properties make electrical circuits an ideal, conceptually simple starting point.

A few of the major abstraction layers, from lowest to highest:

  • Fields: the most concrete-level model used in EE
  • Lumped circuit abstraction: approximating the system as discrete “wires” with constant voltage connecting “circuit elements” with various voltage-current relationships and internal state.
  • Digital abstraction: bucket voltages into high and low.
  • Logic abstraction: replace various subcircuits with logic gates, and multiple “wires” with logical connections.
  • Arithmetic abstraction: replace a logic circuit with an arithmetic circuit
  • Floating point & modular arithmetic: throw out least-significant bits vs throw out most-significant bits
  • Software-level abstractions, e.g. IP -> TCP -> HTTP
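
To make the digital abstraction concrete, here is a minimal Python sketch of one gate viewed at two levels. The voltage formula, `V_DD`, `THRESHOLD`, and all function names are made-up assumptions for illustration, not a real device model. The check is that bucketing the input voltages and then applying the logic gate gives the same answer as running the voltage-level model and then bucketing its output.

```python
# Hypothetical illustration: one NAND gate at two levels of abstraction.
V_DD = 5.0        # supply voltage (assumed value)
THRESHOLD = 2.5   # boundary between "low" and "high" (assumed value)


def nand_voltage(v_a: float, v_b: float) -> float:
    """Concrete model: a crude, invented voltage-level NAND.

    The output is pulled low only when both inputs are above THRESHOLD.
    A small input-dependent offset keeps it from being exactly 0 or V_DD.
    """
    slop = 0.3 + 0.02 * (v_a + v_b)
    both_high = min(v_a, v_b) > THRESHOLD
    return slop if both_high else V_DD - slop


def digitize(v: float) -> bool:
    """Digital abstraction: bucket a voltage into high (True) or low (False)."""
    return v > THRESHOLD


def nand_logic(a: bool, b: bool) -> bool:
    """Abstract model: the logic-level NAND gate."""
    return not (a and b)


def abstraction_commutes(v_a: float, v_b: float) -> bool:
    """Abstract-then-compute should equal compute-then-abstract."""
    return digitize(nand_voltage(v_a, v_b)) == nand_logic(digitize(v_a), digitize(v_b))


if __name__ == "__main__":
    levels = [0.0, 0.4, 2.4, 2.6, 4.7, 5.0]
    print(all(abstraction_commutes(a, b) for a in levels for b in levels))
```

Forcing an input voltage by hand at the concrete level corresponds to forcing the matching bit at the abstract level, so the same check also covers simple counterfactuals on the inputs.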

Note that real circuits usually do contain some repeated sub-components, but the symmetries in these DAGs aren’t particularly relevant to our purposes, so we’ll mostly ignore them.

Parallel to all this, somewhere along the way we usually abstract out the low-level continuous time-dependence, and adopt an abstract model of instantaneous input-output circuits coupled to clocked storage units (i.e. flip-flops/registers). We’ll include that abstraction separately in the time symmetry section; the levels from lumped circuit through floating point/modular arithmetic can all be specialized to memoryless input-output circuits for simplicity.

Plate Symmetry: Statistical Toy Models

This is the simplest nontrivial symmetry class. The main new qualitative phenomena I see in this class are:

  • Nodes which attempt to estimate the value of other nodes, i.e. embedded maps/embedded reasoners. Technically we can have these in finite DAGs too, but they’re most natural to first consider in models with plate-symmetry, since that’s where traditional statistics operates.
  • Two types of counterfactuals on symmetric components: those which act on only one component, and those which act symmetrically on all.
  • The possibility that an embedded reasoner (i.e. statistical method) can leverage knowledge of the symmetry.

The use of sufficient statistics is a particularly simple example in this class, and adding the calculation of sufficient statistics as an explicit node in the DAG gives us the simplest embedded map. This is the easiest model I’ve used to ask questions like “when can we use the map in place of the territory?” - i.e. questions about abstractions embedded in the DAG itself.
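
As a minimal sketch of the sufficient-statistic example, assume the plate contains IID coin flips with a Beta prior on the bias. That model choice, and every function name below, is an assumption made for illustration. The "map" node stores only the pair (number of flips, number of heads), and any query that depends on the data only through that pair can be answered from the map instead of the territory.

```python
from fractions import Fraction


def sufficient_statistic(flips):
    """Embedded map node: summarize IID flips as (count, heads)."""
    return len(flips), sum(flips)


def posterior_from_data(flips, alpha=1, beta=1):
    """Beta posterior parameters, updating on each flip individually."""
    for x in flips:
        if x:
            alpha += 1
        else:
            beta += 1
    return alpha, beta


def posterior_from_statistic(stat, alpha=1, beta=1):
    """Same posterior, computed from the summary alone."""
    n, heads = stat
    return alpha + heads, beta + (n - heads)


def predictive_heads(posterior):
    """Probability that the next flip is heads under a Beta posterior."""
    a, b = posterior
    return Fraction(a, a + b)


if __name__ == "__main__":
    flips = [1, 0, 1, 1, 0, 1]
    stat = sufficient_statistic(flips)
    # Both routes give the same posterior, so the map can stand in for the data.
    print(posterior_from_data(flips) == posterior_from_statistic(stat))
    print(predictive_heads(posterior_from_statistic(stat)))
```

A counterfactual that flips a single coin changes the answer only through its effect on the heads count, while a counterfactual that merely reorders the flips leaves the map unchanged.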

Another example of interest in this class is an embedded reasoner which attempts to deduce model structure by leveraging symmetry. In particular, this introduces the possibility that a node in the DAG could detect (some) counterfactual modifications of the DAG - i.e. notice when it is in a counterfactual query.

Time Symmetry: Equilibrium -> Causality

This is the main symmetry class of interest at the level of physics for most systems, so there are a lot of examples. Most of them involve some kind of equilibrium abstraction: the concrete model is a DAG over time, while the abstract model captures long-run behavior with time removed.

The simplest example is circuit equilibrium, which we mentioned earlier. At the physical level, the behavior of electrical circuits is DAG-shaped only when viewed over time. Yet, in many applications, there are “inputs” and “outputs” and the equilibrium state of the electrical circuit implements a DAG of some sort. Where does the abstract causal structure come from? This problem is also very similar to causality arising in equilibrium in other areas, e.g. biochemical signalling circuits in cells, or markets/supply chains in which certain goods have very high/very low price elasticity.

The next simplest example is timescale separation, in which a part of the system equilibrates much faster than the rest. A couple of examples in this class:

  • Fast equilibrium approximations in chemical kinetics (leading to an abstract causal model in which production & removal rates are parents of equilibrium levels)
  • Alternate updating of fast equilibrium & slow dynamics, e.g. flip-flops/registers paired with fast memoryless input-output circuits in digital electronics.
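
The chemical-kinetics case can be sketched with a single species, assuming the simplest possible rate law dx/dt = production - removal * x. That rate law, the step size, and the function names are illustrative assumptions. The concrete model is the step-by-step simulation over time; the abstract model has no time at all and simply makes the two rates the parents of the equilibrium level.

```python
def simulate_level(production, removal, x0=0.0, dt=0.001, steps=20000):
    """Concrete model: forward-Euler simulation of dx/dt = production - removal * x."""
    x = x0
    for _ in range(steps):
        x += dt * (production - removal * x)
    return x


def equilibrium_level(production, removal):
    """Abstract model: the rates directly determine the equilibrium level."""
    return production / removal


if __name__ == "__main__":
    # The long-run concrete state matches the abstract prediction,
    # including under a counterfactual change to the production rate.
    for production in (4.0, 6.0):
        concrete = simulate_level(production, removal=2.0)
        abstract = equilibrium_level(production, removal=2.0)
        print(production, abs(concrete - abstract) < 1e-9)
```

Note that the initial level `x0` has no arrow into the equilibrium level in the abstract model: once the fast dynamics have settled, the starting point has been forgotten.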

MCMC is a particularly interesting example. The baby version of this example is the independence of widely-time-separated samples from a Markov chain; that’s a simple prototypical example of abstracting time-symmetry into plate-symmetry. But MCMC adds DAG structure within the plate, in a way which does not directly mirror the DAG structure of the concrete model (although it does mirror the undirected structure). It also involves probability calculations in each (concrete) node, which is a hint that an embedded map is present in the system.
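
The baby version can be checked numerically. The sketch below uses a two-state chain whose transition matrix `P` is an arbitrary choice made for illustration, with stationary distribution `PI` = (2/3, 1/3). It measures how much the distribution of the state k steps later still depends on the starting state; as that dependence vanishes, samples taken far apart behave like IID draws from `PI`, which is the plate-symmetric abstract model.

```python
# Hypothetical two-state chain; rows are "from" states, columns are "to" states.
P = [[0.9, 0.1],
     [0.2, 0.8]]
PI = [2 / 3, 1 / 3]  # stationary distribution of P


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def k_step(p, k):
    """k-step transition matrix, i.e. p raised to the k-th power."""
    n = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, p)
    return result


def dependence_on_start(p, pi, k):
    """Largest gap between P(X_k = j | X_0 = i) and the stationary pi[j]."""
    pk = k_step(p, k)
    n = len(p)
    return max(abs(pk[i][j] - pi[j]) for i in range(n) for j in range(n))


if __name__ == "__main__":
    for k in (1, 10, 50):
        print(k, dependence_on_start(P, PI, k))
```

This covers only the independence of separated samples; it does not attempt to show the extra DAG structure within the plate that full MCMC adds.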

Of course, looking at abstractions of time-symmetric systems, we can’t omit feedback control. Despite loopy behavior on the concrete level, at the abstract level we can view the controller target point as causing system limiting behavior - and this abstract view will correctly handle many counterfactuals. In this case, the structure of the abstract equilibrium model might not match the concrete-level structure at all. Based on the good regulator theorem, this is another case where embedded maps are likely to be involved.
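
A minimal sketch of the feedback-control case, assuming a one-dimensional plant with an integral controller. The dynamics, the parameter values, and the function name are all illustrative assumptions. Concretely, the plant state and the control signal drive each other in a loop over time; abstractly, the long-run plant state simply equals the target, whatever the disturbance.

```python
def limiting_state(target, disturbance=0.0, a=1.0, gain=1.0, dt=0.01, steps=5000):
    """Concrete model: forward-Euler simulation of a plant under integral control.

    Plant:      dx/dt = -a * x + u + disturbance
    Controller: du/dt = gain * (target - x)
    Returns the plant state x after the simulation has (approximately) settled.
    """
    x, u = 0.0, 0.0
    for _ in range(steps):
        dx = -a * x + u + disturbance
        du = gain * (target - x)
        x, u = x + dt * dx, u + dt * du
    return x


if __name__ == "__main__":
    # Abstract model: "limiting state = target", independent of the disturbance.
    for target, disturbance in [(3.0, 0.0), (3.0, 5.0), (-1.0, 2.0)]:
        x_final = limiting_state(target, disturbance)
        print(target, disturbance, abs(x_final - target) < 1e-6)
```

Counterfactuals on the target carry over to the abstract model directly, while counterfactuals on the disturbance are correctly predicted to have no long-run effect, even though the disturbance is a direct input to the plant at the concrete level.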

Finally, one particularly difficult example: the derivation of the Navier-Stokes equations from molecular dynamics. The main qualitative difference from the earlier examples (at least that I know of) is the importance of an ontology shift: a move from particles to fields of delta functions, from Hamiltonian particle dynamics to Vlasov/Boltzmann equations. Without that shift, our DAG structure shifts over time - because interactions are spatially organized, particles interact with different particles depending on where they are. (Note that deriving Navier-Stokes from particle dynamics is arguably an open problem, depending on what exactly we count as a “derivation”, so there may be other interesting aspects to this example as well. Or possibly not - calculation difficulties, rather than fundamental/conceptual difficulties, seem to be considered the main blockade to a derivation.)


There's some recent work in the statistics literature exploring similar ideas. I don't know if you're aware of this, or if it's really relevant to what you're doing (I haven't thought a lot about the comparisons yet), but here are some papers.

It is indeed relevant. I'll probably have a review of the Beckers & Halpern paper at some point (as well as their more recent extension); I'm working on essentially the same problem as them. Also, thanks for the link to the Chalupka-Perona-Eberhardt paper, I hadn't seen that one yet.

Yeah, but writing a sequence seems more fun than doing a literature review.

A tangent:

It sounds like there's some close ties to logical inductors here, both in terms of the flavor of the problem, and some difficulties I expect in translating theory into practice.

A logical inductor is kinda like an approximation. But it's more accurate to call it lots and lots of approximations - it tries to keep track of every single approximation within some large class, which is essential to the proof that it only does finitely worse than any approximation within that class.

A hierarchical model doesn't naturally fall out of such a mixture, it seems. If you pose a general problem, you might just get a general solution. You could try to encourage specialized solutions by somehow ensuring that the problem has several different scales of interest, and sharply limit storage space so that the approximation can't afford special cases that are too similar. But even then I think there's a high probability that the best solution (according to something that is as theoretically convenient as logical inductors) would be alien - something humans wouldn't pick out as the laws of physics in a million tries.

Somewhat related to the electrical circuits example, there might be something similar in software engineering, with levels being something like (depending on the programming paradigm):

  • CPU instructions
  • byte code or op code or assembly
  • AST
  • programming language instructions
  • statements
  • functions
  • modules and classes
  • patterns and DSLs
  • processes
  • applications/products

Yes, definitely. I've omitted examples from software and math because there's no "fuzziness" to them; that kind of abstraction is already better understood than the more probabilistically-flavored use-cases I'm aiming for. But the theory should still apply to those cases, as the limiting case where probabilities are 0 or 1, so they're useful as a sanity check.

I do want to note that probabilities 0 and 1 only correspond to no fuzziness if we assume a finite set. If we don't assume a finite set, then it's easy to cook up examples where probabilities are 0 or 1, but they aren't equivalent to either nothing or everything, and thus probabilities 0 or 1 can still introduce fuzziness.