Link to the Question

I haven't gotten an answer on this yet, and I've set up a bounty; I figured I'd link it here too in case any stats/physics people care to take a crack at it.


I've just skimmed Shalizi's paper, so I might be wrong, but it seems to me his argument can be summarized as follows:

If we suppose that entropy is a measure of subjective uncertainty, then it could only increase if the subject lost information about the state of the system as it evolves. If the dynamical laws governing the microscopic evolution of the system are information-preserving, then this loss of information can only come from the way in which the subject updates his/her beliefs about the system's state. But if the subject updates by simply conditionalizing on the system's new macroscopic state, then this cannot happen. Bayesian conditionalization can only add information (at least in expectation); it cannot subtract information. So, generically, updating one's beliefs about the system by conditionalization will lead to a decrease in uncertainty about the system, and therefore a decrease in the system's entropy.
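To make the two ingredients of this argument concrete, here's a minimal sketch (my own toy illustration with made-up numbers, not anything from the paper): on a finite state space an information-preserving law is just a permutation, which leaves Shannon entropy unchanged, and conditionalizing on a measurement cannot increase entropy on average.

```python
import numpy as np

rng = np.random.default_rng(0)

def shannon_entropy(p):
    """Shannon entropy in bits, ignoring zero-probability states."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

n = 8
p = rng.dirichlet(np.ones(n))          # prior over microstates

# Ingredient 1: invertible (permutation) dynamics preserve entropy.
perm = rng.permutation(n)              # a toy information-preserving law
p_evolved = p[perm]
print(shannon_entropy(p), shannon_entropy(p_evolved))  # equal

# Ingredient 2: conditionalizing on a measurement cannot increase
# entropy on average. Here the "macrostate" is which half of the
# state space the system occupies.
macro = np.array([0, 0, 0, 0, 1, 1, 1, 1])
expected_posterior_H = 0.0
for m in (0, 1):
    mask = macro == m
    prob_m = p[mask].sum()
    if prob_m > 0:
        posterior = np.where(mask, p, 0.0) / prob_m
        expected_posterior_H += prob_m * shannon_entropy(posterior)
print(expected_posterior_H <= shannon_entropy(p) + 1e-12)  # True
```

The "in expectation" matters: an individual observation can raise the posterior's entropy, but the average over measurement outcomes can never exceed the prior's entropy.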

I don't think points (1) and (3) in Eliezer's comment are an adequate response to this argument. Point (1) says that when the observer measures the system in order to conditionalize, the entropy of the observer's memory registers increases, which I guess is supposed to compensate for the decrease in system entropy induced by measurement. But this is a non-response. When we do statistical mechanics, we are not usually interested in the entropy of the system plus the observer; we are just interested in the entropy of the system, and it is this entropy that is observed to increase. Also, the response seems to beg the question. On what grounds does Eliezer claim that measurement increases the entropy of the observer's memory? Couldn't Shalizi's argument just be re-applied at this level?

Eliezer's point 3 (as far as I can make sense of it) is that in a quantum universe, from a within-a-branch perspective, the system's evolution will not be unitary (and therefore not information-preserving) because the system will have decohered. This is the same point jimrandomh makes here. This is fair enough, but I don't think the Bayesian should be happy attributing entropy increase solely to quantum world-splitting. Statistical mechanics originated with the assumption that the underlying laws are classical, and in the majority of applications this assumption is retained for computational convenience. If the Bayesian position amounts to a rejection of the majority of the work done in statistical mechanics, that seems a pretty big bullet to bite.

Eliezer's point 2 is ultimately where I think the action's at. We don't update statistical distributions simply by conditionalization. Every statistical mechanics text points out that there is a coarse-graining step. When we update our distribution, we coarse-grain over the fine details of the distribution, "smoothing" it out. It is this step that accounts for entropy increase. Now Shalizi's response is that if you are a Bayesian then adding this non-Bayesian step is epistemically incoherent. One way to respond to this is as Eliezer does: Yup, none of us are perfect Bayesians. We are not even close to logically omniscient, so we are doomed to incoherence.
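To illustrate the coarse-graining step (a toy example of my own, not drawn from any particular text): averaging the distribution within macroscopic cells is a doubly stochastic operation, so it can only raise the Shannon entropy.

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy in bits, ignoring zero-probability states."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Hypothetical fine-grained distribution over 8 microstates.
p = np.array([0.40, 0.10, 0.05, 0.05, 0.30, 0.02, 0.03, 0.05])
cells = [slice(0, 4), slice(4, 8)]     # two macroscopic cells

smoothed = p.copy()
for c in cells:
    smoothed[c] = p[c].mean()          # "smooth out" the fine details

print(shannon_entropy(p))         # ~2.30 bits, fine-grained
print(shannon_entropy(smoothed))  # ~2.97 bits: coarse-graining raised it
```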

I think there's another response, which is that the best way to think about the probability distributions in statistical mechanics is not as accurate representations of our degrees of belief. The distributions are constructed to remove distinctions between microscopic states that are irrelevant to our macroscopic interactions with the system. Suppose I pour a blob of milk into a cup of coffee on the right side of the cup and then stir. Eventually the milk will be completely mixed with the coffee. If I had poured the blob on the left side of the cup, the milk would also eventually have ended up in a mixed state. Now, technically, my state of knowledge about the microstate of the mixed cup is different in these two cases. In the first case I know that the microstate must be one that evolves from the milk being poured on the right. In the second case I know it must be one that evolves from the milk being poured on the left. If the dynamics of the cup are information-preserving, then these are disjoint subsets of phase space. If I were updating as a Bayesian, the distributions would be totally different from one another.

But the thing is, the original position of the blob of milk makes no difference to my practical ability to interact with the milk and coffee system now that the milk is mixed. I might remember this original position, but I cannot now use that information to extract work from the system. My causal capacities are not sufficiently fine-grained to allow me to do that. So the information is irrelevant to how I now treat the system, from a thermodynamic point of view. To conserve computational resources, I might as well pick a distribution that ignores this information. That distribution will not be the distribution that best represents my knowledge of the system, but it will be the distribution that most effectively allows me to plan interactions with the system.
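A toy version of this point (hypothetical numbers, my own illustration): under information-preserving dynamics the left-pour and right-pour histories occupy disjoint regions of state space, and deliberately forgetting which one obtained replaces a conditional distribution with a mixture of strictly higher entropy.

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy in bits, ignoring zero-probability states."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Disjoint supports: microstates 0-3 evolve from a left pour,
# microstates 4-7 from a right pour (information-preserving dynamics).
from_left  = np.array([0.25, 0.25, 0.25, 0.25, 0, 0, 0, 0])
from_right = np.array([0, 0, 0, 0, 0.25, 0.25, 0.25, 0.25])

# The strict Bayesian who remembers the pour uses one of these (2 bits).
print(shannon_entropy(from_left))

# The practical distribution ignores the pour: a 50/50 mixture (3 bits).
forgetful = 0.5 * from_left + 0.5 * from_right
print(shannon_entropy(forgetful))
```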

So I guess ultimately I agree with Shalizi. Thinking of thermodynamic entropy as the same thing as subjective uncertainty is wrong. This doesn't mean it doesn't have a lot to do with subjective uncertainty, though, since our uncertainty about systems is a very important constraint on our ability to interact with them.

[anonymous] · 12y

Can you articulate how your response interfaces with this answer on Cross-Validated?

Two things to say here:

(1) The view articulated in that answer, that the Second Law only applies to systems that are genuinely closed, would render the Law empirically useless. There are no systems of this sort, except for the entire universe. But we appeal to the Second Law all the time to account for the time-directedness of systems that aren't completely closed (such as ice melting in a glass of water, or gas spreading through a room). We're really working with an approximate sense of closure, one that allows us to describe reasonably insulated systems as closed (with what counts as "reasonably" depending on context), even though technically they are exchanging some amount of energy with their environments. If we go by the standards in that post, then yes, no system we observe would be governed by the Second Law. But by the same token, the "system plus observer" supersystem wouldn't be governed by the Second Law either, since this supersystem isn't closed. So then I don't see the point of defending the Second Law by including the observer in the system.

(2) The "begging the question" charge I raised in my post is not merely hypothetical. Shalizi is genuinely skeptical of Landauer's principle, the claim that information erasure must have an entropic cost. So invoking Landauer's principle won't fly against him. I think the right response to the sort of problems he raises with the principle (best captured in the John Norton paper linked in his post) is a view of the sort I recommend above. I'd probably need to say a lot more to make this obvious, but I won't unless you're specifically interested.

ETA: Also worth noting: All competent defenses of Landauer's principle that I have read assume that the observer is governed by the Second Law. The usual argument involves pointing out that erasure involves a reduction of the information-theoretic entropy of the data stored by the observer. Since the Second Law holds, this reduction of entropy must be compensated by an increase of entropy in the non-information-bearing degrees of freedom, which usually amounts to the observer releasing heat into the environment. But if we go by the reasoning in the answer to which you link, we have no warrant for assuming the observer is governed by the Second Law unless the observer counts as a genuinely closed system. Of course, no actual observer would qualify. So the poster's own reasoning undermines his appeal to Landauer's principle.
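For what it's worth, the quantitative content of Landauer's principle is tiny but definite: erasing one bit must release at least k_B T ln 2 of heat. A back-of-envelope check at room temperature (standard constants, nothing specific to the papers under discussion):

```python
import math

k_B = 1.380649e-23   # Boltzmann constant, J/K
T = 300.0            # room temperature assumed, K

q_min = k_B * T * math.log(2)
print(f"{q_min:.3e} J per bit erased")  # ~2.871e-21 J
```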

[anonymous] · 12y

There appears to be a semantic problem with this (I am not a physicist, so please bear with me).

If "the arrow of time" is re-defined to just mean "superficial appearance of decreases in entropy to some observer", then I agree with Shalizi and I also believe the result of his paper is not a 'paradox' and doesn't cast any doubt on validity of Bayesian methods. In local situations, a system might be sufficiently "closed" such that to the observer it looks like the system is spontaneously becoming more complex... that is, the degree of ignorance in the observer's mind might decrease quickly.

But, consistent with the physical laws, somewhere within the observer-system metasystem, that entropy is being accounted for. In order to zoom out and re-apply Shalizi's idea to the metasystem, you have to start talking about some new meta-observer whose states of ignorance are about the first observer-system metasystem.

So to me, it seems like if your approach accurately describes Shalizi's argument, then all he is doing is redefining "arrow of time" such that he gets the result he wants... but no one has to care about that version of "arrow of time", nor believe that it corresponds to the same "arrow of time" discussed in almost all discourse on thermodynamics. And even less should anyone think this is a genuine reason to be skeptical of fully Bayesian updating.

There is no re-definition of "arrow of time" going on here. Shalizi is using the phrase in its standard thermodynamic sense, describing the fact that a number of macroscopic processes are thermodynamically irreversible.

Consider a specific example: two boxes of gas initially at different temperatures are brought into contact through a diathermal barrier. I check the temperature of these gases with a thermometer periodically. I observe that over time the temperature difference vanishes. The gases spontaneously equilibrate.

What would you say about what's going on here? The standard story is that the thermal equilibration takes place due to the Second Law of Thermodynamics. Heat transfer from the hotter gas to the colder one leads to entropy increase. From a (Boltzmannian) statistical mechanical perspective, the region of phase space corresponding to both gases having the same temperature is larger than the region of phase space corresponding to them having different temperatures. So a distribution that is uniform over the former region (and vanishes elsewhere) will have a higher entropy than a distribution that is uniform over the latter region. Note that none of this requires any appeal to the entropy associated with the observer. The entropy increase in this case has nothing to do with the observer's memory. It has to do with heat flowing from one box of gas to the other.
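A toy Boltzmann-style count makes the phase-space-volume claim vivid (my own illustration, with Einstein solids standing in for the two boxes of gas and arbitrary small sizes):

```python
from math import comb, log

N = 50     # oscillators per box (hypothetical size)
q = 100    # total energy quanta shared between the boxes

def multiplicity(n_osc, quanta):
    """Microstate count for an Einstein solid (stars and bars)."""
    return comb(quanta + n_osc - 1, quanta)

# The macrostate with the most microstates is the equal-energy split.
best = max(range(q + 1),
           key=lambda qA: multiplicity(N, qA) * multiplicity(N, q - qA))
print(best)  # 50: the equal-temperature split dominates

W_equal  = multiplicity(N, 50) * multiplicity(N, 50)
W_skewed = multiplicity(N, 90) * multiplicity(N, 10)
print(log(W_equal) - log(W_skewed))  # Boltzmann entropy gap, in units of k_B
```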

Now it seems like the guy you link to objects that we can't say that the Second Law applies to this two-gas system because the system is not completely isolated. But this ignores two things. First, the Second Law has productively been used a huge number of times in the past to describe the behavior of systems exactly like this. Second, by this standard the Second Law does not apply to any system. There is no actual system that is completely isolated, except the universe as a whole.

The thing is, the two-gas system is "isolated enough". There is no significant mechanical work being performed on or by it (discounting the negligible amount of work required to raise the mercury in the thermometer I use), as there is in the case of a refrigerator. Observing a system's state does involve some exchange of energy with it, but it need not involve doing work on the system.

Now, Shalizi's point is that if we are strict Bayesians about the state of the system, then the entropy of the distribution we associate with it will not increase (and will generically decrease), so we would say that the entropy of the system is decreasing. But this is wrong! The entropy of the system is increasing. Not the entropy of the system+observer combo, the entropy of the system itself. If your approach to statistical mechanics tells you it is not, then you are the one flying in the face of orthodox thermodynamics, not Shalizi.

[anonymous] · 12y

Now, Shalizi's point is that if we are strict Bayesians about the state of the system, then the entropy of the distribution we associate with it will not increase (and will generically decrease), so we would say that the entropy of the system is decreasing. But this is wrong! The entropy of the system is increasing. Not the entropy of the system+observer combo, the entropy of the system itself. If your approach to statistical mechanics tells you it is not, then you are the one flying in the face of orthodox thermodynamics, not Shalizi.

This is the part I take issue with. Everything else is fine. The entropy of the distribution that we associate with the system will decrease, at the expense of pumping our ignorance as waste heat into our mind's surrounding environment. The entropy of my beliefs about the system is not the same thing as the entropy of the system, and nothing in orthodox physics licenses treating my state of ignorance about the system as if it were the state of the system itself. When I learn things by pumping entropy into my surroundings with (at the very least) my brain's waste heat, that is not at all like observing a backward arrow of time, because everything else around me is running down, reaching thermal equilibrium, even if I am carving out some local reduction of ignorance about the state of some other, fixed system.

On what basis are you making the claim that my brain pumping heat into its environment is equivalent to my brain pumping entropy into my environment? Can you justify this claim using a purely Bayesian approach to entropy? I can't see how that would work.

If this is your standard for entropy increase, then surely it shouldn't matter what you are observing. If you are observing a refrigerator, your brain is pumping just as much heat into the environment as when you are observing spontaneous equilibration. Yet in the case of the refrigerator we usually say, in contrast to the two-gas system I described, that it is entropy-decreasing. But if the basis for calling the two-gas system entropy-increasing is the heat output by the observer's brain, why doesn't the refrigerator qualify as entropy-increasing as well?

Or are you disagreeing that we would (or should) call a two-gas system exhibiting spontaneous thermal equilibration an entropy-increasing system? I think I'm not getting your view exactly right.

[anonymous] · 12y

This is the semantic problem that you dismissed. When I talk about the refrigerator, it's clear that I mean to draw an imaginary boundary around the refrigerator only and pretend for a second that that is all there is anywhere. Then the entropy is decreasing. If I talk about the process by which I acquired that knowledge, then I have to expand my imaginary boundary to include the source of the photons that bounced off the refrigerator, for instance, and the waste heat my brain produced to acquire this knowledge. That process, the acquiring of the knowledge, was entropy increasing even if what it revealed to me was a less entropic distribution over states of the refrigerator.

The refrigerator is the two gas system with a pump attached. Learning anything about either system is an entropy increasing proposition (if the boundary is drawn around me plus the system). As it happens, if you want to draw the boundary to exclude me, then the two-gas-system-without-pump also happens to be entropy increasing, while drawing a boundary around the refrigerator is entropy decreasing.

This seriously is just Maxwell's demon.

As it happens, if you want to draw the boundary to exclude me, then the two-gas-system-without-pump also happens to be entropy increasing...

This is what I'm disputing: I don't see how you get that result if you treat entropy as subjective uncertainty while also assuming that the only way to update subjective uncertainty is Bayesian conditionalization. Perhaps you can explain how the two-gas system turns out to be entropy increasing on that viewpoint if you draw the boundary to exclude the observer. How does the entropy of the probability distribution describing the system increase?

[anonymous] · 12y

"The entropy of the probability distribution describing the system" only has meaning if there is an observer to actually hold that probability distribution. Since probability is in the mind, there is no fixed external thing that just "is" the probability distribution of the system.

There are two distinct things; one is "the system" and the other is "the probability distribution over states of the system." If you make an idealization and do math just on "the system" then the distributions in those idealizations are entropy increasing (if you exclude any observer or external stuff to the system). That does not correspond to reality (because the system's not truly closed), but is often a useful approximation for describing "the arrow of time."

If you want to talk about "the probability distribution over states of the system" then you must also be including some observer with a mind of some sort, or else the notion of there being a probability distribution (as opposed to just whatever the deterministic eventuality of whatever does in fact occur) doesn't make semantic sense.

So, to speak about the "probability distribution of the system" there has to be a Maxwell's demon sitting there holding that distribution in its mind (i.e., some observer), and whatever entity it is that is dissipating waste heat while doing physical processes to update its beliefs must be increasing entropy.

Now I'm thoroughly confused about your position. Here are some claims to which you appear to have committed yourself:

(1) You can only talk about a probability distribution over the microstates of a system if you treat that system as a sub-system of some larger system that includes an observer.

(2) Entropy is just a measure of subjective uncertainty, which means it is (presumably) a property of a probability distribution.

(3) You can talk about the entropy of a system without including the observer but this is just an idealization and it does not involve a probability distribution over the microstates of the system.

To me, this third claim is just flat-out in contradiction with the first and second claims. How can you talk about the entropy of something from a stat. mech. point of view without it being a property of a probability distribution? Is there really some completely different concept of entropy that comes into play when you exclude the observer from your analysis?

I will also note that the approach I talked about in my original comment does not deny that probability is in the mind. Probability can be "in the mind" without just being subjective uncertainty. Furthermore, accepting that probability is in the mind does not mean that one cannot attribute probability distributions to systems without explicitly representing the system as a subsystem of a supersystem containing an observer.

[anonymous] · 12y

I appreciate your patience with me and the help in getting me to confront my confusions about the topic. Your answer is still unsatisfying to me, and this could totally be my own ignorance at work. However, I cannot understand how your answer is sustainable given the comments at both the Stack Exchange post and the John Baez link.

I think you've misunderstood me when you articulated the three positions listed above, but you've definitely hit upon my confusion, so I need to think about it more carefully and do a better job saying what I want to say. I will think on it and write again when I get a chance this weekend.

Again, I do appreciate the patience in helping me understand it.

The linked paper explicitly assumes that

The evolution operator T is invertible.

But if you use QM in the conventional way, then this assumption doesn't hold. Suppose you have a state X1 that can evolve into either X2 or X3 with equal probability. You would say that state X1 evolves into the weighted mixture [1/2 X2 + 1/2 X3]. Shalizi proves that this mixture has no more entropy than X1 did.

But we, as observers or as part of that system, only get to look at one of the branches, either X2 or X3. Picking which of those two branches we get to look at adds one bit of new entropy, and this selection is not invertible. This is where the increase in entropy with time comes from. What Shalizi has done, is to use math in which all entropy originates in quantum branching, then forget that quantum branching happens.

Evolving into [1/2 X2 + 1/2 X3] is not a quantum operation that can occur in a closed system (it requires, at the very least, tracing over an auxiliary qubit).
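A minimal numerical sketch of this (my own example, plain numpy): a CNOT entangling the system with an auxiliary qubit is globally unitary and preserves total von Neumann entropy, but tracing out the auxiliary qubit leaves the system with one full bit of entropy, which is exactly the "branch selection" bit described above.

```python
import numpy as np

def von_neumann_entropy(rho):
    """von Neumann entropy in bits."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return -np.sum(evals * np.log2(evals))

# System qubit in |+> (a pure state, zero entropy), ancilla in |0>.
plus = np.array([1, 1]) / np.sqrt(2)
zero = np.array([1, 0])
psi = np.kron(plus, zero)

# CNOT: the system controls, the ancilla records which branch we are in.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])
psi = CNOT @ psi
rho_total = np.outer(psi, psi.conj())

# Trace out the ancilla to get the system's reduced density matrix.
rho_sys = np.trace(rho_total.reshape(2, 2, 2, 2), axis1=1, axis2=3)

print(von_neumann_entropy(rho_total))  # 0.0: global evolution stayed unitary
print(von_neumann_entropy(rho_sys))    # 1.0: one bit from branch selection
```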

You got downvoted on Stack Exchange, undeservedly in my view - you may wish to point out that your argument is the flip side of the other response: if you fix the closed-system requirement, then you find the source of entropy.