Suppose I’m a fish farmer, running an experiment with a new type of feed for my fish. In one particular tank, I have 100 fish, and I measure the weight of each fish. I model the weight of each fish as an independent draw from the same distribution, and I want to estimate the mean of that distribution.

Key point: even if I measure every single one of the 100 fish, I will still have some uncertainty in the distribution-mean. Sample-mean is not distribution-mean. My ability to control the fish's environment is limited; if I try to re-run the experiment with another tank of fish, I don't think I'll actually be drawing from the same distribution (more precisely, I don't think the "same distribution" model will correctly represent my information anymore). Once I measure the weights of each of the 100 fish, that's it - there are no more fish I can measure to refine my estimate of the distribution-mean by looking at the physical world, even in principle. Maybe I could gain some more information with detailed simulations and measurements of tank-parameters, but that would be a different model, with a possibly-different distribution-mean.
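To make this concrete, here's a minimal Bayesian sketch (all numbers invented: an i.i.d. Normal model with known spread and a conjugate Normal prior on the mean). Even after weighing every one of the 100 fish, the posterior over the distribution-mean still has nonzero width:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: weights (in grams) of all 100 fish in the tank.
weights = rng.normal(loc=500.0, scale=50.0, size=100)

# Model: weights are i.i.d. Normal(mu, sigma^2), sigma known for simplicity.
# Prior on the distribution-mean mu: Normal(mu0, tau0^2).
sigma = 50.0
mu0, tau0 = 450.0, 100.0

n = len(weights)
xbar = weights.mean()

# Standard conjugate update for the posterior over mu.
post_var = 1.0 / (1.0 / tau0**2 + n / sigma**2)
post_mean = post_var * (mu0 / tau0**2 + n * xbar / sigma**2)
post_std = post_var**0.5

# Even with every fish measured, post_std is roughly sigma/sqrt(n) = 5 grams,
# not zero: sample-mean is not distribution-mean.
print(post_mean, post_std)
```

The residual ~5 gram uncertainty never reaches zero no matter how precisely each individual fish is weighed; it only shrinks if we could add more fish, which, by assumption, we can't.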

The distribution-mean is not fully determined by variables in the physical world.

And this isn’t some weird corner-case! This is one of the simplest, most prototypical use-cases of probability/statistics. It’s taught in intro classes in high school.

Another example: temperature. We often describe temperature, intuitively, as representing the average kinetic energy of molecules, bouncing around microscopically. But if we look at the math, temperature works exactly like the distribution-mean in the fish example: it doesn’t represent the actual average energy of the particles, it represents the mean energy of some model-distribution from which the actual particle-energies are randomly drawn. Even if we measured the exact energy of every particle in a box, we’d still have nonzero (though extremely small) uncertainty in the temperature.
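A toy illustration of this (not real statistical mechanics - the distribution and units here are simplified stand-ins): if we model particle energies as i.i.d. draws from an exponential distribution whose mean plays the role of temperature, then even an exact measurement of every particle's energy leaves a 1/sqrt(N) relative uncertainty in the temperature parameter:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy model: each particle's energy is an independent draw from an
# Exponential distribution with mean k*T. In units where k = 1, the
# distribution-mean IS the temperature.
true_T = 2.0
N = 10_000
energies = rng.exponential(scale=true_T, size=N)

# Even measuring every particle's exact energy, the maximum-likelihood
# temperature is just the sample average energy...
T_hat = energies.mean()

# ...and the relative uncertainty in T (from the Fisher information of
# the exponential model) scales as 1/sqrt(N): tiny for large N, never zero.
rel_uncertainty = 1.0 / np.sqrt(N)

print(T_hat, rel_uncertainty)
```

For a macroscopic box, N is on the order of 10^23, so this uncertainty is utterly negligible in practice - but it is not zero, which is the point.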

One thing to note here: we’re talking about purely classical uncertainty. This has nothing to do with quantum mechanics or the uncertainty principle. Quantum mechanics adds another source of irresolvable uncertainty, but even in the quantum case we will still have irresolvable classical uncertainty.


Clustering provides another lens on the same phenomenon. Consider this clustering problem:

Let’s say that the lower-left cluster contains trees, the upper-right apples, and the lower-right pencils. I want to estimate the distribution-mean of some parameter for the tree-cluster.

If there’s only a limited number of data points, then this has the same inherent uncertainty as before: sample mean is not distribution mean. But even if there’s an infinite number of data points, there’s still some unresolvable uncertainty: there are points which are boundary-cases between the “tree” cluster and the “apple” cluster, and the distribution-mean depends on how we classify those. There is no physical measurement we can make which will perfectly tell us which things are “trees” or “apples”; this distinction exists only in our model, not in the territory. In turn, the tree-distribution-parameters do not perfectly correspond to any physical things in the territory.
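A one-dimensional sketch of the same point (cluster locations and spreads made up): with two overlapping Gaussian clusters, the estimated "tree" cluster mean depends on how boundary points are handled - soft versus hard assignment - and no physical measurement settles that modeling choice:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy 1-D version: a "tree" cluster near 0 and an "apple" cluster near 3.
trees = rng.normal(0.0, 1.0, size=500)
apples = rng.normal(3.0, 1.0, size=500)
data = np.concatenate([trees, apples])

def gauss(x, mu, sigma=1.0):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Posterior responsibility of the "tree" cluster for each point,
# assuming equal mixture weights and unit variances.
p_tree = gauss(data, 0.0) / (gauss(data, 0.0) + gauss(data, 3.0))

# The estimated tree-cluster mean depends on how boundary points are weighted:
soft_mean = np.sum(p_tree * data) / np.sum(p_tree)   # soft assignment
hard_mean = data[p_tree > 0.5].mean()                # hard assignment

# A point exactly halfway between the clusters is genuinely ambiguous:
boundary = gauss(1.5, 0.0) / (gauss(1.5, 0.0) + gauss(1.5, 3.0))
print(soft_mean, hard_mean, boundary)
```

The point at x = 1.5 gets exactly 50% responsibility from each cluster; calling it a "tree" or an "apple" is a choice in the model, not a fact in the territory, and the tree-cluster mean shifts depending on that choice.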

My own work on abstraction implicitly claims that this applies to high-level concepts more generally (indeed, Bayesian clustering is equivalent to abstraction-discovery, under this formulation). It still seems likely that a wide variety of cognitive algorithms would discover and use similar clusters - the clusters themselves are “natural” in some sense. But that does not mean that the physical world fully determines the parameters. It’s just like temperature: it’s a natural abstraction, which I’d expect a wide variety of cognitive processes to use in order to model the world, but that doesn’t mean that the numerical value of a temperature is fully determined by the physical world-state. 

Takeaway: the variables in our world-models are not fully determined by features of the physical world, and this is typical: it’s true of most of the variables we use day-to-day.

Our Models Still Work Just Fine

Despite all this, our models still work just fine. Temperature is still coupled to things in the physical world, we can still measure it quite precisely, it still has lots of predictive power, and it’s useful day-to-day. Don't be alarmed; it all adds up to normality.

The variables in our models do not need to correspond to anything in the physical world in order to be predictive and useful. They do need to be coupled to things in the physical world: we need to be able to gain some information about the variables by looking at the physical world, and vice versa. But there are variables in our world-models which are not fully determined by physical world-state. Even if we knew exactly the state of the entire universe, there would still be uncertainty in some of the variables in our world-models, and that’s fine.


16 comments

Sample-mean is not distribution-mean.

This is my key-takeaway from this post, thank you for writing it.

This reminds me of the propensity of social scientists to drop inference when studying the entire population, claiming that confidence intervals do not make any sense when we have every single existing data point. But confidence intervals do make sense even then, as the entire observed population isn't equal to the theoretical population. The observed population does not give us exact knowledge about any properties of the data generating mechanism, except in edge cases. 

(Not that confidence intervals are very useful when looking at linear regressions with millions of data points anyway, but make sure to have your justification right.)

The fish-weight example was intuitive for me, but the temperature one wasn't. Slightly reformulating the thoughts in my head:

  1. of course temperature measurement is local
  2. that's what temperature is, I don't care about the many possible distributions, only about the current local sample. That's what's affecting things around me, not some hypothetical distribution that isn't instantiated right now.

Maybe you wanted to make a different point here, and I didn't get it?

One thing I didn't explicitly mention in the post is that the average energy of the sample is a sufficient statistic for the temperature - it summarizes all the information from the sample relevant to the temperature. So in that sense, it is all we care about, and your intuition isn't wrong.

However, just like sample mean is not distribution mean, sample average energy is not temperature. If we actually look at the math, the two are different. Sample average energy summarizes all the relevant information about temperature, but is not itself the temperature.
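A quick sketch of the sufficiency point, using a Normal model with known variance as a stand-in (the numbers are made up): two different samples with the same mean yield the identical posterior over the distribution-mean, because the posterior depends on the data only through the sufficient statistic (n, sample mean) - yet that posterior still has nonzero width.

```python
import numpy as np

# Two different hypothetical samples with the same sample mean.
a = np.array([1.0, 2.0, 3.0])
b = np.array([0.5, 2.0, 3.5])
assert a.mean() == b.mean()

# Under an i.i.d. Normal model with known sigma and a conjugate Normal prior,
# the posterior over the distribution-mean depends on the sample only through
# (n, sample mean), so both samples give the identical posterior.
def posterior(sample, sigma=1.0, mu0=0.0, tau0=10.0):
    n, xbar = len(sample), sample.mean()
    var = 1.0 / (1.0 / tau0**2 + n / sigma**2)
    mean = var * (mu0 / tau0**2 + n * xbar / sigma**2)
    return mean, var

print(posterior(a), posterior(b))
```

The sample mean carries all the information about the distribution-mean, but it is not itself the distribution-mean - the posterior variance stays positive.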

Of course, if we had perfect information about all the low-level particles, we might not have any need to use temperature to model the system. (In the same way, if we had perfect knowledge of all fish weights, we might not need to explicitly use a distribution to model them, depending on our use-case.)

Therefore, determinism is impossible? You’ve demonstrated quite a neat way of showing that reality is of unbounded complexity, whereas the human nervous system is of course finite, and as such everything we ‘know’, and everything that can be ‘known’, is necessarily, in some portion, illusory.

Determinism usually refers to a world state being determined by a previous one, not the ability to make perfect maps of the world.

Right, for determinism to work in practice, some method of determining that ‘previous world state’ must be viable. But if there are no viable methods, and if somehow that can be proven, then we can be confident that determinism is impossible, or at the very least, that determinism is a faulty idea.

What you are talking about is prediction.

Determinism also needs to be distinguished from predictability. A universe that unfolds deterministically is a universe that can be predicted by an omniscient being which can both capture a snapshot of all the causally relevant events, and have a perfect knowledge of the laws of physics.

The existence of such a predictor, known as Laplace's demon, is not a prerequisite for the actual existence of determinism; it is just a way of explaining the concept. It is not contradictory to assert that the universe is deterministic but unpredictable.

It seems that your comment got cut off at the end there.

Well, if we were to know that assertion is unprovable, or undecidable, then we can treat it as any other unprovable assertion. 

I don't see any implications for determinism here, or even for complexity.

It's just a statement that these abstract models (and many others) that we commonly use are not directly extensible into the finer-grained model.

One thing to note is that in both cases, there are alternative abstract models with variables that are fully specified by the finer-grained reality model. They're just less convenient to use.

When you said ‘not directly extensible’ I understood that as meaning ‘logistically impossible to perfectly map onto a model communicable to humans’. With the fish fluctuating in weight, in reality, between and during every observation, and between every batch, even if perfect weight information was somehow obtained, it would only hold for that specific Planck second. And then averaging, etc., will always have some error inherently. So every step on the way there is a ‘loose coupling’, so that the final product, a mental model of what we just read, is partially illusory.

Perhaps I am misunderstanding?

Though to me it seems clear, there will always be extra bits of information, of specification, that cannot be captured in any model. Regardless of our further progression in modelling. Whether that’s from an abstract model to a finer-grained model, or from a finer-grained model to a whole universe atomic simulation, or from a whole universe atomic simulation to actual reality.

You are misunderstanding the post. There are no "extra bits of information" hiding anywhere in reality; where the "extra bits of information" are lurking is within the implicit assumptions you made when you constructed your model the way you did.

As long as your model is making use of abstractions--that is, using "summary data" to create and work with a lower-dimensional representation of reality than would be obtained by meticulously tracking every variable of relevance--you are implicitly making a choice about what information you are summarizing and how you are summarizing it.

This choice is forced to some extent, in the sense that there are certain ways of summarizing the data that barely simplify computation at all compared to using the "full" model. But even conditioning on a usefully simplifying (natural) abstraction having been selected, there will still be degrees of freedom remaining, and those degrees of freedom are determined by you (the person doing the summarizing). This is where the "extra information" comes from; it's not because of inherent uncertainty in the physical measurements, but because of an unforced choice that was made between multiple abstract models summarizing the same physical measurements.

Of course, in reality you are also dealing with measurement uncertainty. But that's not what the post is about; the thing described in the post happens even if you somehow manage to get your hands on a set of uncertainty-free measurements, because the moment you pick a particular way to carve up those measurements, you induce a (partially) arbitrary abstraction layer on top of the measurements. As the post itself says:

If there’s only a limited number of data points, then this has the same inherent uncertainty as before: sample mean is not distribution mean. But even if there’s an infinite number of data points, there’s still some unresolvable uncertainty: there are points which are boundary-cases between the “tree” cluster and the “apple” cluster, and the distribution-mean depends on how we classify those. There is no physical measurement we can make which will perfectly tell us which things are “trees” or “apples”; this distinction exists only in our model, not in the territory. In turn, the tree-distribution-parameters do not perfectly correspond to any physical things in the territory.

This implies nothing about determinism, physics, or the nature of reality ("illusory" or otherwise).

Ah, I understand what you’re getting at now dxu, thanks for taking the time to clarify. Yes, there likely are not extra bits of information hiding away somewhere, unless there really are hidden parameters in space-time (as one of the possible resolutions to Bell’s theorem).

When I said ‘there will always be’ I meant it as ‘any conceivable observer will always encounter an environment with extra bits of information outside of their observational capacity’, and thus beyond any  model or mapping. I can see how it could have been misinterpreted.


In regards to my comment on determinism, that was just some idle speculation which TAG helpfully clarified.


Perhaps it’s our difference in perspective, but the very paragraph you quoted in your comment seems to indicate that our perceptive faculties will always contain uncertainties, resulting in classification errors, and therefore a correspondence mismatch.

I’m then extrapolating to the consequence that we will then always be subject to ad-hoc adjustments to adapt, as the ambiguity, uncertainties, etc., will have to be translated into concrete actions which are needed in order for us to continue to exist. This then results in an erroneous mental model, or what I term as ‘partially illusory knowledge’. 

It’s a bit of an artistic flair but I make the further jump to consider that since all real objects are in fact constantly fluctuating at the Planck scales, in many different ways, every possible observation must lead to, at best, ’partially illusory knowledge’. Since even if there’s an infinitesimally small variance that still counts as a deviation from ‘completely true knowledge’. Maybe I’m just indulging in word games here.

The way I use "extensibility" here is between two different models of reality, and just means that one can be obtained from the other merely by adding details to it without removing any parts of it. In this case I'm considering two models, both with abstractions such as the idea that "fish" exist as distinct parts of the universe, have definite "weights" that can be "measured", and so on.

One model is more abstract: there is a "population weight distribution" from which fish weights at some particular time are randomly drawn. This distribution has some free parameters, affected by the history of the tank.

One model is more fine-grained: there are a bunch of individual fish, each with their own weights, presumably determined by their own individual life circumstances. The concept of "population weight distribution" does not exist in the finer-grained model at all. There is no "abstract" population apart from the actual population of 100 fish in the tank.

So yes, in that sense the "population mean" variable does not directly represent anything in the physical world (or at least our finer-grained model of it). This does not make it useless. Its presence in the more abstract model allows us to make predictions about other tanks that we have not yet observed, and the finer-grained model does not.
