Epistemic status: the first couple sections are intended to be a bog-standard primer on chaos theory. In general, this post mostly sticks close to broadly-accepted ideas; it's intended mainly as background for why one would expect the general ideas of abstraction-as-information-at-a-distance to be true. That said, I’m writing it all from memory, and I am intentionally sweeping some technical details under the rug. If you see a mistake, please leave a comment.
Consider a billiards table:
The particular billiards table we’ll use is one I dug out of the physicists’ supply closet, nestled in between a spherical cow and the edge of an infinite plane. The billiard balls are all frictionless, perfectly spherical, bounce perfectly elastically off of other balls and the edges of the table, etc.
Fun fact about billiard balls: if my aim has a tiny bit of error to it, and I hit a ball at ever-so-slightly the wrong angle, that error will grow exponentially as the balls collide. Picture it like this: we start with an evenly-spaced line of balls on the table.
I try to shoot straight along the line, but the angle is off by a tiny amount, call it $\delta\theta$.
The ball rolls forward, and hits the next ball in line. The distance by which it’s off is roughly the ball-spacing length $L$ multiplied by $\delta\theta$, i.e. $L \, \delta\theta$.
Since the first ball hits the second ball off-center, the second ball will also have some error in its angle. We do a little geometry, and find that the angular error in the second ball is roughly $\frac{L}{R} \delta\theta$, where $R$ is the radius of a ball.
Now the second ball rolls into the third. The math is exactly the same as before, except the initial error is now multiplied by a factor $\frac{L}{R}$. So when the second ball hits the third, the angular error in the third ball will be multiplied again, yielding error $\left(\frac{L}{R}\right)^2 \delta\theta$. Then the next ball will have angular error/uncertainty $\left(\frac{L}{R}\right)^3 \delta\theta$. And so forth.
Upshot of all this: in a billiard-ball system, small angular uncertainty grows exponentially with the number of collisions. (In fact, this simplified head-on collision scenario yields the slowest exponential growth; if the balls are hitting at random angles, then the uncertainty grows even faster.)
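To get a feel for how fast this happens, here is a minimal sketch of the rough estimate: each head-on collision multiplies the angular error by the ratio of ball spacing to ball radius. The function names and the specific numbers (25 cm spacing, 3 cm radius, one microradian of aiming error) are made up for illustration, and the formula is only the order-of-magnitude estimate, not an exact collision model.

```python
import math

def angular_error(initial_error, spacing, radius, n_collisions):
    """Rough angular error after n head-on collisions: (spacing/radius)^n * initial error."""
    return (spacing / radius) ** n_collisions * initial_error

def collisions_until_order_one(initial_error, spacing, radius):
    """Smallest n for which the estimated error reaches about 1 radian."""
    factor = spacing / radius
    if factor <= 1:
        raise ValueError("spacing must exceed radius for the error to grow")
    return math.ceil(math.log(1.0 / initial_error) / math.log(factor))

# Hypothetical numbers: 25 cm spacing, 3 cm radius, 1 microradian aiming error.
spacing, radius, dtheta = 0.25, 0.03, 1e-6
for n in range(8):
    print(n, angular_error(dtheta, spacing, radius, n))
print("collisions until the error is order one:",
      collisions_until_order_one(dtheta, spacing, radius))
```

With these made-up numbers, an error far too small to see becomes a completely wrong direction within a handful of collisions.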
This is a prototypical example of mathematical chaos: small errors grow exponentially as the system evolves over time. Given even a tiny amount of uncertainty in the initial conditions (or a tiny amount of noise from air molecules, or a tiny amount of noise from an uneven table surface, or …), the uncertainty grows, so we are unable to precisely forecast ball-positions far in the future. If we have any uncertainty, then as we forecast further and further into the future, our predictions will converge to a maximally-uncertain distribution on the state-space (or more precisely, a maxentropic distribution on the phase space). We become maximally uncertain about the system state.
… except for one prediction.
Remember that these are Physicists’ Billiards™, so they’re frictionless and perfectly elastic. No matter how many collisions occur, energy is always conserved. We may have some initial uncertainty about the energy, or there may be some noise from air molecules, etc., but the system’s own dynamics will not amplify that uncertainty the way it does with other uncertainty.
So, while most of our predictions become maxentropic (i.e. maximally uncertain) as time goes on, we can still make reasonably-precise predictions about the system’s energy far into the future.
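A minimal sketch of the contrast, assuming two equal-mass, frictionless disks colliding in two dimensions (the helper names and the unit radius are my own choices for illustration): small changes in where the balls make contact change the outgoing direction, but the total kinetic energy comes out the same every time.

```python
import math

def elastic_collision(p1, v1, p2, v2):
    """Post-collision velocities of two equal-mass, frictionless disks in contact.

    p1, p2 are centre positions and v1, v2 velocities, all (x, y) tuples.
    Equal masses exchange their velocity components along the line of centres.
    """
    nx, ny = p2[0] - p1[0], p2[1] - p1[1]
    norm = math.hypot(nx, ny)
    nx, ny = nx / norm, ny / norm
    # Relative velocity component along the line of centres.
    rel = (v1[0] - v2[0]) * nx + (v1[1] - v2[1]) * ny
    new_v1 = (v1[0] - rel * nx, v1[1] - rel * ny)
    new_v2 = (v2[0] + rel * nx, v2[1] + rel * ny)
    return new_v1, new_v2

def kinetic_energy(*velocities, mass=1.0):
    """Total kinetic energy of balls with the given velocities."""
    return sum(0.5 * mass * (vx * vx + vy * vy) for vx, vy in velocities)

R = 1.0  # hypothetical ball radius
for offset in (0.00, 0.01, 0.02):
    # Moving ball touches a stationary ball at the origin, off-centre by `offset`.
    p1 = (-math.sqrt(4 * R * R - offset * offset), offset)
    v1, v2 = (1.0, 0.0), (0.0, 0.0)
    w1, w2 = elastic_collision(p1, v1, (0.0, 0.0), v2)
    angle = math.atan2(w2[1], w2[0])
    print(offset, angle, kinetic_energy(v1, v2), kinetic_energy(w1, w2))
```

The printed angle of the struck ball shifts with the offset, while the two energy columns agree up to floating-point rounding.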
An Information-Theoretic Point Of View
At first glance, this poses a small puzzle. We have very precise information about the initial conditions - our initial error is very small. The system’s dynamics are deterministic and reversible, so information can’t be lost. Why, then, do our predictions become maximally uncertain?
The key is that the angular error is a real number, and specifying a real number exactly takes an infinite number of bits - its binary expansion generally never terminates. Even though it’s a small real number, it still has an infinite number of bits. And as the billiards system evolves, bits further and further back in the binary expansion become relevant to the large-scale system behavior. But the bits far back in the binary expansion are exactly the bits about which we have approximately-zero information, so we become maximally uncertain about the system state.
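The simplest toy model I know of for "bits further back become relevant" is the doubling map, x → 2x mod 1, which just shifts the binary expansion one place to the left on every step. This is a stand-in, not the billiards dynamics; the bit strings below are arbitrary. Two initial conditions agree on their first 16 bits (the part we "know") and differ only after that.

```python
from fractions import Fraction

def doubling_map(x):
    """One step of x -> 2x mod 1, which shifts the binary expansion left by one bit."""
    return (2 * x) % 1

def leading_bits(x, n_bits):
    """First n_bits of the binary expansion of x in [0, 1), as a string."""
    bits = []
    for _ in range(n_bits):
        x *= 2
        bit = int(x >= 1)
        bits.append(str(bit))
        x -= bit
    return "".join(bits)

# Two hypothetical initial conditions that agree on their first 16 bits only.
known = "1011001110001111"
a = Fraction(int(known + "0110", 2), 2 ** 20)
b = Fraction(int(known + "1001", 2), 2 ** 20)

for step in range(0, 20, 4):
    # Print the current leading 4 bits of each trajectory, then advance 4 steps.
    print(step, leading_bits(a, 4), leading_bits(b, 4))
    for _ in range(4):
        a, b = doubling_map(a), doubling_map(b)
```

Exact fractions are used so that floating-point rounding does not muddy the picture. The leading bits of the two trajectories match until the known bits have all been shifted away, at which point the large-scale state is set entirely by bits we had no information about.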
Conversely, our initial information about the large-scale system behavior still tells us a lot about the future state, but most of what it tells us is about bits far back in the binary expansion of the future state variables (i.e. positions and velocities). Another way to put it: initially we have very precise information about the leading-order bits, but near-zero information about the lower-order bits further back. As the system evolves, these mix together. We end up with a lot of information about the leading-order and lower-order bits combined, but very little information about either one individually. (Classic example of how we can have lots of information about two variables combined but little information about either individually: I flip two coins in secret, then tell you that the two outcomes were the same. All the information is about the relationship between the two variables, not about the individual values.) So, even though we have a lot of information about the microscopic system state, our predictions about large-scale behavior (i.e. the leading-order bits) are near-maximally uncertain.
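The two-coin example can be checked directly. This is a small sketch with helper names of my own: it measures information as the drop in Shannon entropy from prior to posterior, once for the pair of coins and once for the first coin alone.

```python
import math
from itertools import product

def entropy(dist):
    """Shannon entropy in bits of a dict mapping outcomes to probabilities."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def marginal(joint, index):
    """Marginal distribution of one coin from a joint distribution over pairs."""
    out = {}
    for outcome, p in joint.items():
        out[outcome[index]] = out.get(outcome[index], 0.0) + p
    return out

# Prior: two independent fair coins.
prior = {pair: 0.25 for pair in product("HT", repeat=2)}
# Posterior after being told "the two outcomes were the same".
same = {pair: p for pair, p in prior.items() if pair[0] == pair[1]}
total = sum(same.values())
posterior = {pair: p / total for pair, p in same.items()}

info_joint = entropy(prior) - entropy(posterior)
info_first = entropy(marginal(prior, 0)) - entropy(marginal(posterior, 0))
print("bits learned about the pair:", info_joint)
print("bits learned about the first coin alone:", info_first)
```

The statement carries one full bit about the pair and zero bits about either coin on its own.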
… except, again, for the energy. Our information about the energy does not get mixed up with the lower-order bits, so we can continue to precisely forecast the system’s energy far into the future. We end up maximally uncertain about large-scale system state except for our precise estimate of the energy. (Maximum entropy quantifies this: the distribution most often used in statistical mechanics is maxentropic subject to a constraint on our knowledge of the energy.)
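For a concrete picture of "maxentropic subject to a constraint on the energy", here is a sketch under simplifying assumptions: a handful of discrete energy levels, and a constraint on the mean energy only. Under those assumptions the maximum-entropy distribution has the Boltzmann form, with probabilities proportional to exp(-β·E), and β can be found by bisection. The energy levels, the target, and the function names are invented for illustration.

```python
import math

def boltzmann(energies, beta):
    """Distribution with probabilities proportional to exp(-beta * energy)."""
    weights = [math.exp(-beta * e) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]

def mean_energy(energies, beta):
    return sum(p * e for p, e in zip(boltzmann(energies, beta), energies))

def maxent_given_mean_energy(energies, target, lo=-50.0, hi=50.0, iters=200):
    """Bisect on beta; mean energy decreases monotonically as beta increases."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_energy(energies, mid) > target:
            lo = mid  # energy too high, so beta must be larger
        else:
            hi = mid
    return boltzmann(energies, 0.5 * (lo + hi))

def entropy_nats(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

levels = [0.0, 1.0, 2.0, 3.0]          # hypothetical energy levels
maxent = maxent_given_mean_energy(levels, target=1.0)
other = [0.5, 0.25, 0.0, 0.25]         # another distribution with mean energy 1.0
print(maxent, entropy_nats(maxent))
print(other, entropy_nats(other))
```

Both distributions have the same mean energy, but the Boltzmann-form one has higher entropy: it commits to nothing beyond the energy constraint.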
The Abstraction Connection
The basic claim about how-abstraction-works: we have some high-dimensional system, within some larger environment. The system’s variables contain a lot of information. But, most of that information is not relevant “far away” from the system - most of it is “wiped out” by noise in the environment, so only a low-dimensional summary is relevant far away. That low-dimensional summary is the “abstraction” of the high-dimensional system.
Why would we expect this notion of “abstraction” to be relevant to the physical world? Why should all the relevant information from a high-dimensional subsystem (e.g. the molecules comprising a tree or a car) fit into a low-dimensional summary?
The billiards system illustrates one of the main answers.
We have a high-dimensional system - i.e. a large number of “billiard balls” bouncing around on a “pool table”. (Really, this is typically used as an analogy for gas molecules bouncing around in a container.) We ask what information about the system’s state is relevant “far away” - in this case, far in the future. And it turns out that, if we have even just a little bit of uncertainty, the vast majority of the information is “wiped out”. Only the system’s energy is relevant to predicting the system state far in the future. The energy is our low-dimensional summary.
To make this example look like something we’d recognize, we need to add a little more information to the summary: as the balls bounce around, the number of balls also does not change, and the volume of the container - i.e. the size of the pool table - does not change. (Everything we said earlier is still true, we were just treating volume and number of balls as fixed background parameters.) In situations where balls might be added/removed, or where the container might grow/shrink, the system dynamics still does not amplify uncertainty in those quantities; information about them is still relevant to predicting the far-future state. So, three quantities - energy, volume, and number of balls - provide a summary of all the information relevant to forecasting the system state in the far future.
You may recognize these as the “state variables” of an ideal gas. (We could instead swap in equivalent variables which contain the same information, e.g. pressure, temperature and volume). The ideal gas model is an abstraction, and the state variables of the ideal gas are exactly the low-dimensional summary which contains all the information from the high-dimensional system state which is relevant far into the future. Chaos wipes out all the other information.
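To make "equivalent variables which contain the same information" concrete, here is a sketch of the conversion in both directions. It assumes a three-dimensional monatomic ideal gas, for which E = (3/2)·N·k_B·T and P·V = N·k_B·T; the function names and the example numbers are my own.

```python
BOLTZMANN = 1.380649e-23  # Boltzmann constant in J/K (exact in the SI)

def temperature_and_pressure(energy, volume, n_particles):
    """(E, V, N) -> (T, P) for a 3D monatomic ideal gas (assumed model)."""
    temperature = 2.0 * energy / (3.0 * n_particles * BOLTZMANN)
    pressure = n_particles * BOLTZMANN * temperature / volume
    return temperature, pressure

def energy_volume_count(pressure, temperature, volume):
    """(P, T, V) -> (E, V, N), inverting the relations above."""
    n_particles = pressure * volume / (BOLTZMANN * temperature)
    energy = 1.5 * n_particles * BOLTZMANN * temperature
    return energy, volume, n_particles

# Hypothetical example: roughly a mole of gas in about 25 litres.
E, V, N = 3740.0, 0.0248, 6.022e23
T, P = temperature_and_pressure(E, V, N)
print("T [K]:", T, "P [Pa]:", P)
print("round trip:", energy_volume_count(P, T, V))
```

The round trip recovers the original energy, volume, and particle count, so either triple serves equally well as the summary.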
Key idea from dynamical systems theory: this is how most dynamical systems behave by default. Information about some quantities is conserved, and everything else is wiped out by chaos. So, for most systems, we should expect that all the relevant information from the high-dimensional system state can fit into a low(er)-dimensional summary.
The key question is how much lower dimensional. If we have a mole of variables, and the summary “only” needs a mole minus 10 million, that’s not very helpful. Yet in practice, the low-dimensional summaries seem to be much smaller than that - not necessarily as small as the three state variables of an ideal gas, but still a lot less than a mole of variables. Ultimately, this is a question which needs to be answered empirically.
I don’t want to leave people with too narrow a picture, so let’s talk a bit about other paths to the same underlying concept.
First, chaos isn’t the only path to a similar picture. For instance, a major topic in computational complexity theory is systems which take in $n$ random bits, and output $m > n$ pseudorandom bits, such that there’s no polynomial-time (as a function of $n$) method which can distinguish the pseudorandom bits from truly random variables (assuming $P \neq NP$). Conceptually, this has similar consequences to chaos: if we have even just a few unknown bits, then they can “wipe out the information” in other bits - not in an information-theoretic sense, but in the sense that the wiped-out information can no longer be recovered by any polynomial-time computation. As with continuous dynamical systems, most random discrete systems will conserve information about some quantities, and everything else will be wiped out by the pseudorandom noise.
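A rough illustration of the flavor, with heavy caveats: the sketch below stretches a short seed into a longer output by hashing the seed together with a counter, using SHA-256 from the Python standard library. SHA-256 in this mode is a heuristic stand-in, not a proven pseudorandom generator, and the check here is purely statistical (how many output bits change), which is much weaker than computational indistinguishability. The function names and the all-zero seed are hypothetical.

```python
import hashlib

def stretch(seed: bytes, n_blocks: int) -> bytes:
    """Stretch a short seed into n_blocks * 32 bytes by hashing seed || counter."""
    return b"".join(
        hashlib.sha256(seed + counter.to_bytes(4, "big")).digest()
        for counter in range(n_blocks)
    )

def fraction_of_bits_flipped(a: bytes, b: bytes) -> float:
    """Fraction of bit positions at which two equal-length byte strings differ."""
    differing = sum(bin(x ^ y).count("1") for x, y in zip(a, b))
    return differing / (8 * len(a))

seed = bytes(16)                    # hypothetical 128-bit seed, all zeros
flipped = bytes([1]) + bytes(15)    # the same seed with a single bit changed
out_a, out_b = stretch(seed, 8), stretch(flipped, 8)
print(len(out_a) * 8, "output bits;",
      fraction_of_bits_flipped(out_a, out_b), "of them differ")
```

One unknown bit in the seed is enough to scramble roughly half of the output bits, which is the discrete analogue of a small uncertainty swamping the large-scale state.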
Second, we don’t just need to think about dynamics over time, or about “far away” as “far in the future”. For instance, we could imagine two separate systems whose interactions are mediated by a chaotic system - for instance, two people talking, with the sound waves carried by the (chaotic) system of air molecules between them. Chaos will wipe out most of the information about one system (e.g. motions of individual molecules in one person) before that information reaches the other system. All that gets through will be a low(er)-dimensional summary (e.g. larger-scale sonic vibrations). More generally, we can think about information propagating through many different subsystems, in a whole graph of interactions, and then ask what information is conserved as we move outward in the graph.
Third, if we have non-small amounts of uncertainty about the environment, then that uncertainty can wipe out information about the system even without chaos added into the mix. Simple example: the system is a flipped coin. The environment is another flipped coin. Standing “far away”, I cannot “see” either coin, but someone tells me whether they are equal (i.e. HH or TT) or unequal (HT or TH). If I had better information about the environment, then this would be sufficient to figure out the system’s state. But my uncertainty about the environment also wipes out the information about the system; I’m left maximally uncertain about whether the system’s state is H or T. This is very similar to the “mixing” of information we talked about earlier; as before, a given system is likely to have some quantities which do get “mixed in” with things we don’t know, and some quantities which do not get mixed. The latter serve as the low-dimensional summary.
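Here is a small sketch of that coin example as a Bayes update, with the amount of uncertainty about the environment as a dial. The function name and the parameter names are my own; the report is assumed to be truthful.

```python
def posterior_system_heads(p_env_heads, told_equal, p_sys_heads=0.5):
    """P(system = H | report), where the report says whether the two coins match.

    p_env_heads is my prior probability that the environment coin is H.
    """
    # If the coins match, system = H exactly when environment = H;
    # if they differ, system = H exactly when environment = T.
    like_h = p_env_heads if told_equal else 1.0 - p_env_heads
    like_t = 1.0 - p_env_heads if told_equal else p_env_heads
    numerator = p_sys_heads * like_h
    denominator = numerator + (1.0 - p_sys_heads) * like_t
    return numerator / denominator

for p_env in (1.0, 0.9, 0.7, 0.5):
    print(p_env, posterior_system_heads(p_env, told_equal=True))
```

When the environment is known (probability 1.0), the report pins down the system’s state. As my uncertainty about the environment grows toward 50/50, the same report tells me less and less, and at 0.5 it tells me nothing about the system at all.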
Recap of Main Points
In a chaotic system, small uncertainties/errors are amplified over time. If there’s even just a tiny amount of uncertainty - whether from uncertain initial conditions or noise from the environment - then the large-scale behavior of the system becomes unpredictable far in the future.
… but not completely unpredictable. Typically, some information is conserved - e.g. the energy in a frictionless physical system. Even if noise from the environment causes some uncertainty in this conserved information, it isn’t “amplified” over time, so we can still make decent predictions about these quantities far into the future.
In terms of abstraction: the conserved information is a low(er)-dimensional summary, which contains everything we know about the large-scale system state that is still relevant to large-scale measurements of the system far in the future.