This really seems like the sort of thing where we should be able to get a mathematical theorem, from first principles, rather than assuming.
Doing my usual exercise of "stop reading here and try to derive an independent model"...
Here's the way my anthropics-poisoned brain would frame this:
Assume that your window of observations is limited to some small part of the universe, or perhaps that it's coarse (you only observe some high-level information, not full low-level details). Suppose you either narrow this window further (stop observing some part of the system), or try to predict/control low-level dynamics from the high-level state.
In the former case, the moment you stop observing a part of the system, your observations become consistent with those from a set of timelines differing by what happened in that part of the system (under the constraint of the laws of physics). E. g., if you started with a perfect "snapshot" of the gas molecules' positions and velocities (we're assuming they're billiard balls), as time goes on, there's more and more information leaking from the unobserved environment as they bounce off walls composed of other molecules (whose exact positions and velocities you didn't observe, so you couldn't predict their effects on the gas particles).[1] You might be able to correctly time when to open/close the door to sort the molecules that started very close to it, but this ability would quickly degrade. Your uncertainty increases; entropy rises.
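A minimal sketch of that degradation (the 1D box, the kick size, and the door-timing tolerance below are all made-up illustration parameters): a single molecule bounces back and forth, and every wall collision delivers a small unobserved kick standing in for the wall molecules whose state was never snapshotted. Predicting its position from the initial snapshot alone works for a while, then fails:

```python
import random

DOOR_TOLERANCE = 0.05  # hypothetical position error beyond which door timing fails

def bounce(x, v, steps, dt=0.01, kick=0.0, rng=None):
    """Evolve one molecule in a 1D box [0, 1]. `kick` is the size of the velocity
    perturbation picked up at each wall collision -- a stand-in for the
    unobserved wall molecules."""
    for _ in range(steps):
        x += v * dt
        while x < 0.0 or x > 1.0:          # elastic reflection off the walls
            x = -x if x < 0.0 else 2.0 - x
            v = -v
            if kick and rng is not None:
                v += rng.gauss(0.0, kick)  # information leaking in from the environment
    return x

x0, v0 = 0.3, 1.7  # the initial "snapshot"

for steps in (100, 1000, 5000, 20000, 80000):
    # Same noise seed each time, so "actual" is one fixed trajectory viewed at longer horizons.
    actual = bounce(x0, v0, steps, kick=0.01, rng=random.Random(1))
    predicted = bounce(x0, v0, steps)      # prediction from the snapshot alone
    err = abs(actual - predicted)
    print(f"steps={steps:6d}  error={err:.3f}  door timing still works: {err < DOOR_TOLERANCE}")
```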
In the latter case, you already started out unsure regarding which timeline consistent with the given high-level information (e. g., pressure and temperature in the box) you're in. Your uncertainty over the low-level state is already maximized under the high-level constraints.
And Maxwell's demon basically maintains perfect observation of the state of the system you'd otherwise fail to (perfectly) observe.
The exact amount of uncertainty generated (aka the amount of data the demon's observational array should be gathering for the demon to work) depends on the physics of the system you're in (how quickly the set of possible timelines grows; how many low-level states are consistent with the high-level state). But the overall idea should work in all universes and on all levels of abstraction.
Moving from anthropics to embedded agency: the above assumes that you have the correct model of the laws of your abstract environment (its state-transition rules, potentially stochastic ones), and enough computational resources to simulate that environment (at the given level of abstraction).
If you're operating at the lowest level of abstraction of our universe, your uncertainty is "in the territory": this setup should reduce to the quantum-mechanical uncertainty principle, with the unobserved part of the system becoming Schrödinger's cat's box. The anthropics reasoning then reduces to the many-worlds reasoning.[2]
At higher levels, it's "pseudo-territorial": if we model a coarse agent as genuinely discarding the low-level information it can't integrate into its abstract models, the setup should be structurally equivalent to the lowest-level one.[3]
In a way, this is trivial. As long as the underlying state-transition rules are stochastic, or are modeled as stochastic – either because they're "genuinely" uncertain as in QM, or because you have imperfect information (up to the $n$-th significant digit), or because you're using an abstract model that bakes in imperfect information – as long as that's the case, any mechanism that tries to control which state you end up in would need to "oppose"/"de-stochasticify" the stochastic dynamics.
So: with $n$ molecules, the KL divergence between the two distributions is $n \log 2$, i.e. $n$ bits, as one might intuitively guess.
Right, if you're not making any new observations, your uncertainty would be maximized over the set of the allowed timelines consistent with your past observations and the laws of physics (or the laws of your abstract environment). Since molecules can't disappear/leave the box/etc., there are only so many such timelines.
Alternatively, you can model it as having snapshotted the state only up to some $n$-th significant digit of the molecules' positions/velocities, and as it evolves, the knowledge of the digits past the $n$-th one becomes ever more necessary for predicting the dynamics.
This can probably be exactly formalized via this approach,[4] but these margins etc. etc. (Note to John: this is related to the potential (dubious) QM-based line of reasoning I'd mentioned in the write-up I sent you.)
Except with the stochastic state-transition mechanism being Markovian instead of non-Markovian, because that information is not actually discarded (we can't use it, but we still observe it, so it still differentiates our exact observations between different timelines) – so there wouldn't be macro-scale quantum effects; see the linked paper.
Bullshit check: Scott Aaronson thinks this paper is valid (but not enlightening).
Now for the key idea: we’re going to compare the distribution of states achieved by the demon with policy $\pi$, to the distribution of states which would be achieved by the demon if it took the same distribution of actions completely independent of its observations - i.e. if it just blindly tried to sort the molecules without looking at them.
Interesting! I've previously looked at this method as a solid definition of "optimization" (and Utility functions and whatnot) but I never thought of applying it to Maxwell's Demon.
Why can't the demon just continuously look at a tiny area around the gate and decide just based on that? A tiny area seems intuitively sufficient both for recognizing that a molecule would go from left to right if the door were opened, and that no molecule would go from right to left. This would mean that it doesn't need to know a distribution over molecules at all.
Basically: Why can't the demon just solve a localised control task?
Let’s start with the classic Maxwell’s Demon setup.
We have a container of gas, i.e. a bunch of molecules bouncing around. Down the middle of the container is a wall with a tiny door in it, which can be opened or closed by a little demon who likes to mess with thermodynamics researchers. Maxwell[1] imagined that the little demon could, in principle, open the door whenever a molecule flew toward it from the left, and close the door whenever a molecule flew toward it from the right, so that eventually all the molecules would be gathered on the right side. That would compress the gas, and someone could then extract energy by allowing the gas to re-expand into its original state. Energy would be conserved by this whole process, since the gas would end up cooler in proportion to the energy extracted, but it would violate the Second Law of Thermodynamics - i.e. entropy would go down.
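To make the setup concrete, here's a deliberately crude toy version (the discrete one-molecule-per-door-event dynamics and the numbers are simplifications assumed purely for illustration): a demon that sees which side each approaching molecule comes from, and opens the door only for molecules coming from the left, ends up with everything on the right.

```python
import random

rng = random.Random(0)
N = 20  # number of molecules (illustrative)

# Coarse state: which side of the partition each molecule is currently on.
side = ["L" if rng.random() < 0.5 else "R" for _ in range(N)]

def demon_policy(observed_side):
    """Open the door iff the approaching molecule is coming from the left."""
    return observed_side == "L"

door_events = 0
while side.count("R") < N:
    door_events += 1
    i = rng.randrange(N)        # some molecule reaches the door
    obs = side[i]               # the demon observes which side it came from
    if demon_policy(obs):       # door open: the molecule crosses left -> right
        side[i] = "R"
    # door closed: a molecule arriving from the right just bounces back

print(f"all {N} molecules on the right after {door_events} door events")
```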
Landauer famously proposed to “fix” this apparent loophole in the Second Law by accounting for the information which the demon would need to store, in order to know when to open and close the door. Each bit of information has a minimal entropic “cost”, in Landauer’s formulation. This sure seems to be correct in practice, but it’s unsatisfying: as has been pointed out before[2], Landauer derived his bit-cost by assuming that the Second Law holds and then asking what bit cost was needed to make it work. This really seems like the sort of thing where we should be able to get a mathematical theorem, from first principles, rather than assuming.
Also, Landauer’s approach is a bit weird for embedded agency purposes. It feels like almost the right tool, but not quite; it’s not really framed in terms of canonical parts-of-an-agent, like e.g. observations and actions and policy. And it’s too dependent on physical entropy, which itself grounds out in the reversibility of low-level physics. Ideally, we’d like something more agnostic to the underlying physics, so that e.g. we can apply it directly even to high-level systems with irreversible dynamics.
So we’d like a theorem, and we’d like it to be more directly oriented toward embedded agency rather than stat mech. To that end, we present the Do-Divergence Theorem.
Rather than focus on “memory”, we’ll take a more agentic frame, and talk about the demon’s observations ($O$) and actions ($A$). The observations are the inputs to the demon’s decisions (presumably measurements of the initial state of the molecules); the actions are whether the door is open or closed at each time. Further using agentic language, the demon’s policy $\pi$ specifies how actions are chosen as a function of observation: $\pi[A|O]$ is a distribution from which the action $A$ is sampled. Downstream, the actions and observations together cause some outcome $X$, the final state of the molecules.
Now for the key idea: we’re going to compare the distribution of states achieved by the demon with policy $\pi$, to the distribution of states which would be achieved by the demon if it took the same distribution of actions completely independent of its observations - i.e. if it just blindly tried to sort the molecules without looking at them.
We express the “blind sorting” model as a do-operation on the causal diagram above: $do(A)$, below, indicates that the demon samples an action $A$ from the marginal distribution $P[A]$ independent of its observations $O$. So, under the model $do(A)$, we have

$$P[X, A, O \mid do(A)] \;=\; P[X \mid A, O]\, P[A]\, P[O]$$
… in contrast to the original model, under which

$$P[X, A, O] \;=\; P[X \mid A, O]\, \pi[A \mid O]\, P[O]$$
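To see concretely how the two distributions over $X$ differ, consider a (hypothetical) minimal case: one molecule, one chance to open the door. The observation $O$ is which side the molecule approaches from (left or right, probability 1/2 each), the action $A$ is open vs. closed, and the outcome $X$ is which side the molecule ends up on. Under the policy “open iff it approaches from the left”, every molecule ends up on the right, so $P[X = \text{right}] = 1$. Under $do(A)$ with the same action marginal (door open half the time, independent of $O$), a molecule from the left ends up on the right only if the door happens to be open, and a molecule from the right ends up on the right only if the door happens to be closed, so

$$P[X = \text{right} \mid do(A)] \;=\; \tfrac12 \cdot \tfrac12 \;+\; \tfrac12 \cdot \tfrac12 \;=\; \tfrac12.$$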
To compare the distribution achieved by the demon to the “blind sorting” distribution, we’ll use KL-divergence; more on what that looks like after the theorem.
Now for the theorem itself:

$$D_{KL}\big(P[X]\;\big\|\;P[X \mid do(A)]\big) \;\le\; I(A; O)$$

where $I(A; O)$ is the mutual information between actions and observations. Proof:
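One way to see it (a sketch from the definitions above; the original presentation may differ): since $P[X]$ and $P[X \mid do(A)]$ are marginals of the two joint distributions above, and KL-divergence can only shrink under marginalization,

$$D_{KL}\big(P[X]\,\big\|\,P[X \mid do(A)]\big) \;\le\; D_{KL}\big(P[X, A, O]\,\big\|\,P[X, A, O \mid do(A)]\big).$$

The two joint distributions share the factors $P[X \mid A, O]$ and $P[O]$ and differ only in $\pi[A \mid O]$ versus $P[A]$, so the right-hand side collapses to

$$\mathbb{E}_O\Big[D_{KL}\big(\pi[A \mid O]\,\big\|\,P[A]\big)\Big] \;=\; I(A; O).$$

Chaining the two gives the theorem.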
Now let’s unpack what the theorem means, when applied to Maxwell’s Demon.
When the demon takes actions independent of observations (i.e. independent of the state of the molecules), molecules are just as likely to move from left container to right container as from right to left. So, the distribution $P[X \mid do(A)]$ should end up roughly uniform across both sides, as is normal for a single connected container of gas.
On the other hand, if the demon perfectly sorts the molecules onto the right side, then under $P[X]$ the molecules end up roughly uniform on only one side of the container.
The KL-divergence between these distributions is then roughly

$$D_{KL}\big(P[X]\,\big\|\,P[X \mid do(A)]\big) \;\approx\; \log\!\big(2^n\big) \;=\; n \log 2$$
So: with $n$ molecules, the KL divergence between the two distributions is $n \log 2$, i.e. $n$ bits, as one might intuitively guess. In a case like this, the KL divergence is just the entropy change.
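As a quick numeric sanity check of that count, here is a sketch over the coarse left/right state space (which side each molecule ends up on); the value $n = 10$ is arbitrary:

```python
from math import log2
from itertools import product

n = 10  # number of molecules (illustrative)

# Coarse-grained state: which side each molecule is on.
states = list(product("LR", repeat=n))

# Blind sorting: each molecule independently ends up on either side.
p_blind = {s: 0.5 ** n for s in states}
# Demon sorting: all molecules end up on the right.
p_sorted = {s: 1.0 if s == ("R",) * n else 0.0 for s in states}

kl = sum(p * log2(p / p_blind[s]) for s, p in p_sorted.items() if p > 0)
print(f"D_KL(sorted || blind) = {kl} bits  (n = {n})")  # -> 10.0 bits
```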
The do-divergence theorem therefore says that the demon’s actions must have at least $n$ bits of mutual information with its observations of the molecule states, in order to sort the molecules. All the change in entropy of the system must be balanced by mutual information between the demon’s actions and observations.
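And a small end-to-end check of the bound on the hypothetical one-molecule example from earlier (one door event, policy “open iff the molecule comes from the left”); both sides come out to exactly 1 bit here:

```python
from math import log2

# One molecule, one door event.  O: side it approaches from; A: open/close the
# door; X: side it ends up on.  Policy: open iff the molecule comes from the left.
P_O = {"L": 0.5, "R": 0.5}
policy = {"L": "open", "R": "close"}

def outcome(o, a):
    """If the door is open the molecule crosses to the other side; otherwise it stays."""
    if a == "open":
        return "R" if o == "L" else "L"
    return o

# Distribution over the final side X under the policy.
P_X = {"L": 0.0, "R": 0.0}
for o, p_o in P_O.items():
    P_X[outcome(o, policy[o])] += p_o

# Marginal distribution over actions, and the distribution over X under do(A).
P_A = {"open": 0.0, "close": 0.0}
for o, p_o in P_O.items():
    P_A[policy[o]] += p_o

P_X_do = {"L": 0.0, "R": 0.0}
for o, p_o in P_O.items():
    for a, p_a in P_A.items():
        P_X_do[outcome(o, a)] += p_o * p_a

# KL divergence D_KL(P[X] || P[X|do(A)]).
kl = sum(p * log2(p / P_X_do[x]) for x, p in P_X.items() if p > 0)

# Mutual information I(A;O); A is a deterministic function of O here,
# so P[A|O] = 1 for the action the policy picks.
mi = sum(p_o * log2(1.0 / P_A[policy[o]]) for o, p_o in P_O.items())

print(f"D_KL(P[X] || P[X|do(A)]) = {kl} bits")  # 1.0
print(f"I(A;O)                   = {mi} bits")  # 1.0 -- the bound holds with equality
```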
yes, same guy as the electromagnetic laws
notably by Earman and Norton in a 1999 paper