The Second Law of Thermodynamics, and Engines of Cognition

Followup to: Superexponential Conceptspace, and Simple Words

The first law of thermodynamics, better known as Conservation of Energy, says that you can't create energy from nothing: it prohibits perpetual motion machines of the first type, which run and run indefinitely without consuming fuel or any other resource.  According to our modern view of physics, energy is conserved in each individual interaction of particles.  By mathematical induction, we see that no matter how large an assemblage of particles may be, it cannot produce energy from nothing - not without violating what we presently believe to be the laws of physics.

This is why the US Patent Office will summarily reject your amazingly clever proposal for an assemblage of wheels and gears that cause one spring to wind up another as the first runs down, and so continue to do work forever, according to your calculations.  There's a fully general proof that at least one wheel must violate (our standard model of) the laws of physics for this to happen.  So unless you can explain how one wheel violates the laws of physics, the assembly of wheels can't do it either.

A similar argument applies to a "reactionless drive", a propulsion system that violates Conservation of Momentum.  In standard physics, momentum is conserved for all individual particles and their interactions; by mathematical induction, momentum is conserved for physical systems whatever their size.  If you can visualize two particles knocking into each other and always coming out with the same total momentum that they started with, then you can see how scaling it up from particles to a gigantic complicated collection of gears won't change anything.  Even if there's a trillion quadrillion atoms involved, 0 + 0 + ... + 0 = 0.
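To make the induction step concrete, here is a minimal Python sketch. It assumes a toy one-dimensional world in which arbitrary pairs of particles undergo perfectly elastic collisions; the function names, particle count, and numerical values are made up for illustration and are not taken from any real physics library. If each pairwise interaction conserves momentum, the total should not drift however many interactions are stacked together.

```python
import random

def elastic_collision(m1, v1, m2, v2):
    """Velocities after a 1D perfectly elastic collision of two particles."""
    u1 = ((m1 - m2) * v1 + 2 * m2 * v2) / (m1 + m2)
    u2 = ((m2 - m1) * v2 + 2 * m1 * v1) / (m1 + m2)
    return u1, u2

def total_momentum(masses, velocities):
    """Sum of m * v over every particle in the assemblage."""
    return sum(m * v for m, v in zip(masses, velocities))

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 50                  # hypothetical particle count
masses = [rng.uniform(1.0, 5.0) for _ in range(n)]
velocities = [rng.uniform(-10.0, 10.0) for _ in range(n)]

p_before = total_momentum(masses, velocities)

# Collide arbitrary pairs many times (geometry is ignored in this toy model).
for _ in range(10_000):
    i, j = rng.sample(range(n), 2)
    velocities[i], velocities[j] = elastic_collision(
        masses[i], velocities[i], masses[j], velocities[j]
    )

p_after = total_momentum(masses, velocities)
drift = abs(p_after - p_before)  # only floating-point rounding should remain
print(f"momentum drift after 10,000 collisions: {drift:.3e}")
```

The sketch is not a proof, only a picture of one: every step in the loop leaves the pair's momentum unchanged, so the sum over the whole assemblage has nowhere to change.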

But Conservation of Energy, as such, cannot prohibit converting heat into work.  You can, in fact, build a sealed box that converts ice cubes and stored electricity into warm water.  It isn't even difficult.  Energy cannot be created or destroyed:  The net change in energy, from transforming (ice cubes + electricity) to (warm water), must be 0.  So it couldn't violate Conservation of Energy, as such, if you did it the other way around...

Perpetual motion machines of the second type, which convert warm water into electrical current and ice cubes, are prohibited by the Second Law of Thermodynamics.

The Second Law is a bit harder to understand, as it is essentially Bayesian in nature.

Yes, really.

The essential physical law underlying the Second Law of Thermodynamics is a theorem which can be proven within the standard model of physics:  In the development over time of any closed system, phase space volume is conserved.

Let's say you're holding a ball high above the ground.  We can describe this state of affairs as a point in a multidimensional space, at least one of whose dimensions is "height of ball above the ground".  Then, when you drop the ball, it moves, and so does the dimensionless point in phase space that describes the entire system that includes you and the ball.  "Phase space", in physics-speak, means that there are dimensions for the momentum of the particles, not just their position - i.e., a system of 2 particles would have 12 dimensions, 3 dimensions for each particle's position, and 3 dimensions for each particle's momentum.

If you had a multidimensional space, each of whose dimensions described the position of a gear in a huge assemblage of gears, then as you turned the gears a single point would swoop and dart around in a rather high-dimensional phase space.  Which is to say, just as you can view a great big complex machine as a single point in a very-high-dimensional space, so too, you can view the laws of physics describing the behavior of this machine over time, as describing the trajectory of its point through the phase space.

The Second Law of Thermodynamics is a consequence of a theorem which can be proven in the standard model of physics:  If you take a volume of phase space, and develop it forward in time using standard physics, the total volume of the phase space is conserved.

For example:

Let there be two systems, X and Y: where X has 8 possible states, Y has 4 possible states, and the joint system (X,Y) has 32 possible states.

The development of the joint system over time can be described as a rule that maps initial points onto future points.  For example, the system could start out in X7Y2, then develop (under some set of physical laws) into the state X3Y3 a minute later.  Which is to say: if X started in 7, and Y started in 2, and we watched it for 1 minute, we would see X go to 3 and Y go to 3.  Such are the laws of physics.

Next, let's carve out a subspace S of the joint system state.  S will be the subspace bounded by X being in state 1 and Y being in states 1-4.  So the total volume of S is 4 states.

And let's suppose that, under the laws of physics governing (X,Y) the states initially in S behave as follows:

X1Y1 -> X2Y1
X1Y2 -> X4Y1
X1Y3 -> X6Y1
X1Y4 -> X8Y1

That, in a nutshell, is how a refrigerator works.

The X subsystem began in a narrow region of state space - the single state 1, in fact - and Y began distributed over a wider region of space, states 1-4.  By interacting with each other, Y went into a narrow region, and X ended up in a wide region; but the total phase space volume was conserved.  4 initial states mapped to 4 end states.

Clearly, so long as total phase space volume is conserved by physics over time, you can't squeeze Y harder than X expands, or vice versa - for every subsystem you squeeze into a narrower region of state space, some other subsystem has to expand into a wider region of state space.
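The bookkeeping for the example above can be checked in a few lines of Python. This is only a sketch of the toy model: the four transitions are the ones listed for S, while the helper name `spread` is my own invention for "how many distinct values a subsystem takes".

```python
# The four transitions given for S, written as (X, Y) -> (X, Y).
dynamics = {
    (1, 1): (2, 1),
    (1, 2): (4, 1),
    (1, 3): (6, 1),
    (1, 4): (8, 1),
}

S = set(dynamics)                 # the initial region of joint state space
image = {dynamics[s] for s in S}  # where that region ends up

def spread(states, axis):
    """Number of distinct values one subsystem takes across a set of joint states."""
    return len({s[axis] for s in states})

volume_before, volume_after = len(S), len(image)
x_before, y_before = spread(S, 0), spread(S, 1)
x_after, y_after = spread(image, 0), spread(image, 1)

print("joint volume:", volume_before, "->", volume_after)
print("X spread:", x_before, "->", x_after)
print("Y spread:", y_before, "->", y_after)
```

Y is squeezed from four states down to one, X widens from one state to four, and the joint volume stays at four states.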

Now let's say that we're uncertain about the joint system (X,Y), and our uncertainty is described by an equiprobable distribution over S.  That is, we're pretty sure X is in state 1, but Y is equally likely to be in any of states 1-4.  If we shut our eyes for a minute and then open them again, we will expect to see Y in state 1, but X might be in any of states 2-8.  Actually, X can only be in some of states 2-8, but it would be too costly to think out exactly which states these might be, so we'll just say 2-8.

If you consider the Shannon entropy of our uncertainty about X and Y as individual systems, X began with 0 bits of entropy because it had a single definite state, and Y began with 2 bits of entropy because it was equally likely to be in any of 4 possible states.  (There's no mutual information between X and Y.)  A bit of physics occurred, and lo, the entropy of Y went to 0, but the entropy of X went to log2(7) = 2.8 bits.  So entropy was transferred from one system to another, and decreased within the Y subsystem; but due to the cost of bookkeeping, we didn't bother to track some information, and hence (from our perspective) the overall entropy increased.
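Those numbers can be reproduced with a short Python sketch. It assumes the same four transitions given for S and an equiprobable starting distribution; the helper names `entropy` and `marginal`, and the label "lazy" for the observer who just says "2-8", are mine.

```python
from math import log2

def entropy(dist):
    """Shannon entropy, in bits, of a dict mapping outcome -> probability."""
    return sum(-p * log2(p) for p in dist.values() if p > 0)

def marginal(joint, axis):
    """Marginal distribution of one subsystem (axis 0 is X, axis 1 is Y)."""
    out = {}
    for state, p in joint.items():
        out[state[axis]] = out.get(state[axis], 0.0) + p
    return out

dynamics = {(1, 1): (2, 1), (1, 2): (4, 1), (1, 3): (6, 1), (1, 4): (8, 1)}

before = {s: 0.25 for s in dynamics}                 # equiprobable over S
after = {dynamics[s]: p for s, p in before.items()}  # one minute later

h_x_before = entropy(marginal(before, 0))  # X is certainly in state 1
h_y_before = entropy(marginal(before, 1))  # Y is uniform over 4 states
h_y_after = entropy(marginal(after, 1))    # Y is certainly in state 1
h_x_tracked = entropy(marginal(after, 0))  # if we track exactly where X went

# The lazy observer just says "X is somewhere in 2-8".
h_x_lazy = entropy({x: 1 / 7 for x in range(2, 9)})

print(h_x_before, h_y_before, h_y_after, h_x_tracked, round(h_x_lazy, 3))
```

An observer who tracks the four reachable X states ends with 2 bits, exactly what Y started with. The extra 0.8 bits or so in the log2(7) figure is the price of the skipped bookkeeping, not something physics added.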

If there was a physical process that mapped past states onto future states like this:

X2,Y1 -> X2,Y1
X2,Y2 -> X2,Y1
X2,Y3 -> X2,Y1
X2,Y4 -> X2,Y1

Then you could have a physical process that would actually decrease entropy, because no matter where you started out, you would end up at the same place.  The laws of physics, developing over time, would compress the phase space.

But there is a theorem, Liouville's Theorem, which can be proven true of our laws of physics, which says that this never happens: phase space is conserved.
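In the discrete toy model, "phase space is conserved" amounts to saying that the rule mapping past states to future states never sends two different starting states to the same end state. Here is a minimal Python check of that property. It is only a discrete analogue, since Liouville's Theorem itself concerns continuous phase space volume, and the function name `conserves_volume` is hypothetical.

```python
def conserves_volume(dynamics):
    """True if no two distinct initial states share a final state (the map is injective)."""
    finals = list(dynamics.values())
    return len(set(finals)) == len(finals)

# The refrigerator-style rule: 4 initial states go to 4 distinct end states.
refrigerator = {
    (1, 1): (2, 1),
    (1, 2): (4, 1),
    (1, 3): (6, 1),
    (1, 4): (8, 1),
}

# The forbidden rule: every initial state collapses onto X2,Y1.
compressor = {
    (2, 1): (2, 1),
    (2, 2): (2, 1),
    (2, 3): (2, 1),
    (2, 4): (2, 1),
}

print("refrigerator conserves volume:", conserves_volume(refrigerator))
print("compressor conserves volume:", conserves_volume(compressor))
```

The first rule passes the check and the second fails it, which is the sense in which the second rule would actually decrease entropy.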

The Second Law of Thermodynamics is a corollary of Liouville's Theorem: no matter how clever your configuration of wheels and gears, you'll never be able to decrease entropy in one subsystem without increasing it somewhere else.  When the phase space of one subsystem narrows, the phase space of another subsystem must widen, and the joint space keeps the same volume.

Except that what was initially a compact phase space, may develop squiggles and wiggles and convolutions; so that to draw a simple boundary around the whole mess, you must draw a much larger boundary than before - this is what gives the appearance of entropy increasing.  (And in quantum systems, where different universes go different ways, entropy actually does increase in any local universe.  But omit this complication for now.)

The Second Law of Thermodynamics is actually probabilistic in nature - if you ask about the probability of hot water spontaneously entering the "cold water and electricity" state, the probability does exist, it's just very small.  This doesn't mean Liouville's Theorem is violated with small probability; a theorem's a theorem, after all.  It means that if you're in a great big phase space volume at the start, but you don't know where, you may assess a tiny little probability of ending up in some particular phase space volume.  So far as you know, with infinitesimal probability, this particular glass of hot water may be the kind that spontaneously transforms itself to electrical current and ice cubes.  (Neglecting, as usual, quantum effects.)

So the Second Law really is inherently Bayesian.  When it comes to any real thermodynamic system, it's a strictly lawful statement of your beliefs about the system, but only a probabilistic statement about the system itself.

"Hold on," you say.  "That's not what I learned in physics class," you say.  "In the lectures I heard, thermodynamics is about, you know, temperatures.  Uncertainty is a subjective state of mind!  The temperature of a glass of water is an objective property of the water!  What does heat have to do with probability?"

Oh ye of little trust.

In one direction, the connection between heat and probability is relatively straightforward:  If the only fact you know about a glass of water is its temperature, then you are much more uncertain about a hot glass of water than a cold glass of water.

Heat is the zipping around of lots of tiny molecules; the hotter they are, the faster they can go.  Not all the molecules in hot water are travelling at the same speed - the "temperature" isn't a uniform speed of all the molecules, it's an average speed of the molecules, which in turn corresponds to a predictable statistical distribution of speeds - anyway, the point is that, the hotter the water, the faster the water molecules could be going, and hence, the more uncertain you are about the velocity (not just speed) of any individual molecule.  When you multiply together your uncertainties about all the individual molecules, you will be exponentially more uncertain about the whole glass of water.

We take the logarithm of this exponential volume of uncertainty, and call that the entropy.  So it all works out, you see.
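A toy calculation shows why the logarithm is the natural thing to take. The Python sketch below assumes, purely for illustration, that each molecule independently has some number of equally likely states and that hotter water means more such states per molecule; the specific numbers (4 and 16 states, 100 molecules) are invented.

```python
from math import log2, isclose

def joint_state_count(states_per_molecule, n_molecules):
    """Number of joint states when per-molecule uncertainties multiply."""
    return states_per_molecule ** n_molecules

def entropy_bits(n_states):
    """Entropy, in bits, of a uniform distribution over n_states."""
    return log2(n_states)

n = 100  # hypothetical number of molecules
cold = entropy_bits(joint_state_count(4, n))   # each molecule has 4 possible states
hot = entropy_bits(joint_state_count(16, n))   # each molecule has 16 possible states

print("cold:", cold, "bits; per molecule:", cold / n)
print("hot: ", hot, "bits; per molecule:", hot / n)
```

The joint volume grows exponentially with the number of molecules, but its logarithm grows linearly: the entropy of the glass is just the sum of the per-molecule entropies, as long as the molecules are treated as independent.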

The connection in the other direction is less obvious.  Suppose there was a glass of water, about which, initially, you knew only that its temperature was 72 degrees.  Then, suddenly, Saint Laplace reveals to you the exact locations and velocities of all the atoms in the water.  You now know perfectly the state of the water, so, by the information-theoretic definition of entropy, its entropy is zero.  Does that make its thermodynamic entropy zero?  Is the water colder, because we know more about it?

Ignoring quantumness for the moment, the answer is:  Yes!  Yes it is!

Maxwell once asked:  Why can't we take a uniformly hot gas, and partition it into two volumes A and B, and let only fast-moving molecules pass from B to A, while only slow-moving molecules are allowed to pass from A to B?  If you could build a gate like this, soon you would have hot gas on the A side, and cold gas on the B side.  That would be a cheap way to refrigerate food, right?

The agent who inspects each gas molecule, and decides whether to let it through, is known as "Maxwell's Demon".  And the reason you can't build an efficient refrigerator this way, is that Maxwell's Demon generates entropy in the process of inspecting the gas molecules and deciding which ones to let through.

But suppose you already knew where all the gas molecules were?

Then you actually could run Maxwell's Demon and extract useful work.

So (again ignoring quantum effects for the moment), if you know the states of all the molecules in a glass of hot water, it is cold in a genuinely thermodynamic sense: you can take electricity out of it and leave behind an ice cube.

This doesn't violate Liouville's Theorem, because if Y is the water, and you are Maxwell's Demon (denoted M), the physical process behaves as:

M1,Y1 -> M1,Y1
M2,Y2 -> M2,Y1
M3,Y3 -> M3,Y1
M4,Y4 -> M4,Y1

Because Maxwell's demon knows the exact state of Y, this is mutual information between M and Y.  The mutual information decreases the joint entropy of (M,Y):  H(M,Y) = H(M) + H(Y) - I(M;Y).  M has 2 bits of entropy, Y has two bits of entropy, and their mutual information is 2 bits, so (M,Y) has a total of 2 + 2 - 2 = 2 bits of entropy.  The physical process just transforms the "coldness" (negentropy) of the mutual information to make the actual water cold - afterward, M has 2 bits of entropy, Y has 0 bits of entropy, and the mutual information is 0.  Nothing wrong with that!
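The same accounting can be run mechanically. This Python sketch assumes the four-state demon and water from the table above, with each matched pair equally likely; the helper names (`entropy`, `marginal`, `mutual_information`, `push_forward`, `demon_step`) are my own.

```python
from math import log2

def entropy(dist):
    """Shannon entropy, in bits, of a dict mapping outcome -> probability."""
    return sum(-p * log2(p) for p in dist.values() if p > 0)

def marginal(joint, axis):
    """Marginal distribution of one subsystem (axis 0 is M, axis 1 is Y)."""
    out = {}
    for state, p in joint.items():
        out[state[axis]] = out.get(state[axis], 0.0) + p
    return out

def mutual_information(joint):
    """I(M;Y) = H(M) + H(Y) - H(M,Y)."""
    return entropy(marginal(joint, 0)) + entropy(marginal(joint, 1)) - entropy(joint)

def push_forward(joint, step):
    """Apply a deterministic state transition to every state in a distribution."""
    out = {}
    for state, p in joint.items():
        new_state = step(state)
        out[new_state] = out.get(new_state, 0.0) + p
    return out

def demon_step(state):
    """The demon keeps its own state and drives the water to Y1."""
    m, _y = state
    return (m, 1)

before = {(k, k): 0.25 for k in range(1, 5)}  # M's state matches Y's state
after = push_forward(before, demon_step)

for label, joint in (("before", before), ("after", after)):
    print(label,
          "H(M) =", entropy(marginal(joint, 0)),
          "H(Y) =", entropy(marginal(joint, 1)),
          "I(M;Y) =", mutual_information(joint),
          "H(M,Y) =", entropy(joint))
```

The joint entropy is 2 bits both before and after, so the four joint states still map to four joint states. What changes is where the 2 bits live: the mutual information is spent, and Y's entropy drops to zero.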

And don't tell me that knowledge is "subjective".  Knowledge has to be represented in a brain, and that makes it as physical as anything else.  For M to physically represent an accurate picture of the state of Y, M's physical state must correlate with the state of Y.  You can take thermodynamic advantage of that - it's called a Szilard engine.

Or as E.T. Jaynes put it, "The old adage 'knowledge is power' is a very cogent truth, both in human relations and in thermodynamics."

And conversely, one subsystem cannot increase in mutual information with another subsystem, without (a) interacting with it and (b) doing thermodynamic work.

Otherwise you could build a Maxwell's Demon and violate the Second Law of Thermodynamics - which in turn would violate Liouville's Theorem - which is prohibited in the standard model of physics.

Which is to say:  To form accurate beliefs about something, you really do have to observe it.  It's a very physical, very real process: any rational mind does "work" in the thermodynamic sense, not just the sense of mental effort.

(It is sometimes said that it is erasing bits in order to prepare for the next observation that takes the thermodynamic work - but that distinction is just a matter of words and perspective; the math is unambiguous.)

(Discovering logical "truths" is a complication which I will not, for now, consider - at least in part because I am still thinking through the exact formalism myself.  In thermodynamics, knowledge of logical truths does not count as negentropy; as would be expected, since a reversible computer can compute logical truths at arbitrarily low cost.  All this that I have said is true of the logically omniscient: any lesser mind will necessarily be less efficient.)

"Forming accurate beliefs requires a corresponding amount of evidence" is a very cogent truth both in human relations and in thermodynamics: if blind faith actually worked as a method of investigation, you could turn warm water into electricity and ice cubes.  Just build a Maxwell's Demon that has blind faith in molecule velocities.

Engines of cognition are not so different from heat engines, though they manipulate entropy in a more subtle form than burning gasoline.  For example, to the extent that an engine of cognition is not perfectly efficient, it must radiate waste heat, just like a car engine or refrigerator.

"Cold rationality" is true in a sense that Hollywood scriptwriters never dreamed (and false in the sense that they did dream).

So unless you can tell me which specific step in your argument violates the laws of physics by giving you true knowledge of the unseen, don't expect me to believe that a big, elaborate clever argument can do it either.

65 comments

Isn't entropy a property of the system, not the observer?

Nope. It's a property of the observer, but one that behaves in such a lawful and inescapable way that it seems to you like a property of the system.

Your ignorance of next week's winning lottery numbers is a property of you, not just a property of the lottery balls, but good luck on ignoring your ignorance.

Someone elsewhere said: Almost all the time, I stick with this idea: Temperature of a gas is the mean kinetic energy of its molecules.

Aren't there vibrational degrees of freedom that also contribute to kinetic energy, and isn't that why different materials have different specific heats? I.e., what matters is kinetic energy per degree of freedom, not kinetic energy per molecule? So you actually do have to think about a molecule (not just measure its kinetic energy per se) to determine what its temperature is (which direction heat will flow in, compared to another material), even if you know the total amount of heat - putting the same amount of heat into a kilo of water or a kilo of iron will yield different "temperatures".

But the more important point: Suppose you've got an iron flywheel that's spinning very rapidly. That's definitely kinetic energy, so the average kinetic energy per molecule is high. Is it heat? That particular kinetic energy, of a spinning flywheel, doesn't look to you like heat, because you know how to extract most of it as useful work, and leave behind something colder (that is, with less mean kinetic energy per degree of freedom).

If you know the positions and speeds of all the elements in a system, their motion stops looking like heat, and starts looking like a spinning flywheel - usable kinetic energy that can be extracted right out.

Added definition of phase space.

I guess that if you stick your finger into the water it will still get burned, am I wrong?

Only if you're silly enough to stick in your finger at the wrong moment. Stick in your finger at exactly the right moment, and your finger will get colder while the water gets hotter - because you've timed it so that all the molecules next to your finger happen to be moving very slowly.

Of course you will usually have to wait so long for the right moment that all the protons evaporate before you see a chance to stick your finger in. And of course the trick relies on your knowing the exact behavior of the water, so that it has no entropy regardless of its temperature.

I think we may have to Taboo the word "cold" in this posting. As I understand it, an object is colder than another if it has a lower temperature than another (in other words, the average kinetic energy of its molecules is lower than those of another object). Therefore, knowing the exact position and velocity of all the (classical) molecules in an ideal gas doesn't make the gas "colder" until you actually DO use Maxwell's Demon on it. Saying that an object that you know enough about to use Maxwell's Demon on is "colder" than another conflicts with my understanding of the word. It's not actually colder, it's only "potentially colder" (by analogy to potential energy), if that makes any sense.

Yes, we're arguing about words, but that's because we're getting confused. :(

"Isn't speed the same as velocity?"

Nope, speed is a scalar, while velocity is a vector.

Unless I'm missing something, Shalizi usually makes more sense than this.

1) Measurements use work (or at least erasure in preparation for the next measurement uses work). They do not simply magically reduce our uncertainty without thermodynamic cost. Even if you measure and never erase, the measuring system must be in a prepared state, cannot be used again, and still produces entropy if you are not operating at absolute zero / infinite precision, which you can't do (third law of thermodynamics).

2) Because we are not logically omniscient, we lose information we already have as the result of not being willing to expend the computational cost of following every atom. Liouville's Theorem preserves a volume of probability but it can get awfully squiggly, so if you preserve a simple boundary around your uncertainty, it gets larger.

3) Quantum universes branch both ways and create new uncertainty in their branched agents.


In particular, when you say that knowledge of particles makes something colder, makes it possible to extract work, you've gone back to the ideal observer.

I think emphatically not! To extract work, you've got to be inside the system, extracting it.

If you take the perspective of a logically omniscient perfect observer outside the system, the notion of "entropy" is pretty much meaningless, as is "probability" - you never have to use statistical thermodynamics to model anything, you just use the deterministic precise wave equation.

While the notion of "entropy" seems to make a lot more sense when considered as observer-dependent, what continues to confuse me about this is what happens when you have time-reversed observers. If phase space volume is simply conserved, then the same principles apply to time-reversed observers, i.e., they also see entropy increasing. But this would imply that any time-reversed observer would have to draw boundaries very differently from us, and it's not at all clear how simply negating the 't' coordinate causes you to draw your boundaries in such a way that you know more about two gases when mixed rather than when unmixed. I feel I must be making some fundamental mistake here but I can't tell what.

That question does strike at the very basic assumptions that try to make sense of these phenomena. I read some more about this issue in Drescher's Good and Real, chapter 3.

The key issue with your question is, what exactly is a time-reversed observer? If you mean to flip the universal "time counter" and ask what the observers observe, well, there are some problems off the bat (no universal space of simultaneity, time not being an independent variable but a kind of measure of the other variables). But let's assume those away.

With time reversed, if you look at each time-slice, the observer perceives the exact same history that they would if time had been going the other way. This is because their makeup contains the same evidence telling them that they had the same past experience. In other words, their memories are the same. So they wouldn't have to draw boundaries any differently from us.

More generally, you shouldn't look at time as going positive or negative along some timeline; you should think of it as going futureward (toward higher observer-perceived entropy) or pastward (toward lower). As an analogy, think of the shift from modeling the earth as flat, to modeling it as a sphere with gravity: you realize that your "up" and "down" are not two unique, constant vectors, but rather, refer to whether you are going towards or away from the center of the earth. Just the same, futureward and pastward are determined, not by an increasing or decreasing time variable, but whether entropy increases or decreases from that point, so there can be negative time directions that go futureward.

To simplify a bit, it's not that "Hey, time increases, entropy increases -- gosh, they always seem to happen together!" Rather, it's that it's not possible for us to perceive (have a set of memories consistent with) entropy decreasing.

(I had been writing up a summary of Drescher's chapter 3 as an article but never got to finishing it. This comment draws from both Drescher, and from Barbour's "timeless universe" ideas.)

I don't mean reversing time on the whole universe; that is not really meaningful, for the reasons you specify. What I mean is, since the laws of physics are (nearly) time-symmetric, it seems that it should be possible to have, in our own universe alongside us, some sort of creature that has a brain that really does remember what we would consider the future, and attempts to anticipate what we would consider the past. How would such a thing arise? Well, presumably by a time-reversed evolution, with mutation and natural selection occurring on a "replicator" that propagates itself backwards in time; that is, after all, what it would have to optimize for (well, given the right environment).

Yes, if you take us out of the picture, you can just negate the t-coordinate and say I'm not proposing anything weird. But the (near) time-reversibility of the laws of physics means that it should be possible for us to both occur in the same universe.

Admittedly, if we saw such a thing, we would probably never recognize its "genes" as replicators of any sort - if we saw the pattern at all, they would appear as some sort of anti-replicators. And could we even recognize such creatures as containing decision engines at all?

...OK, having written that out I now can't help but suspect the problem is in posing the existence of these things in the first place. After all, the anti-replicators that formed their genes, would have to have some causal origin from our point of view, and that seems highly improbable. Anti-replicators become less common, not more common, meaning, but there shouldn't be any at the start of the universe - or in other words, they should all be extinct by then. What does a time-reversed extinction event look like? Probably not just one thing. But of course, if, for example, we were to hypothetically nuke the whole planet and destroy all life, they'd see the sudden appearance of a whole bunch of anti-replicators, which then slowly annihilate each other over 4 billion years, and have a good causal explanation for the whole thing! If they recognized the pattern at all, that is.

This forces me to wonder if, yes, it really is correct that a time-reversed observer necessarily would have to have such a different point of view that it would be natural to draw the boundaries in a way such that they still saw entropy as increasing.

I don't actually understand this very well, so I don't think I'm close to an actual answer, but I think here's my best attempt: While such things are theoretically possible, if they existed, we could never recognize them (or vice versa), as that would violate the second law of thermodynamics from our (their) point of view? I think? Though that still is not answering the original question.

The fact that you know all the trajectories/positions couldn't matter less to the glass of water, the only thing that matters is (using your jargon) the phase space volume it occupies.

What, precisely, do you think it means for a statistically viewed system to "occupy a volume of phase space"? When you talk about the "number of microstates", what exactly do you think you are counting?

Ridiculously good post, ridiculously good comments, in my opinion.

"it prohibits perpetual motion machines of the first type, which run and run indefinitely without consuming fuel or any other resource"

That's only right if you're able to extract work from it and it still runs undiminished.

Otherwise it's only a perpetual motion machine of the second type.

So this is how warming and cooling spells work in the Rational Potterverse?

Douglas Knight: "In particular, when you say that knowledge of particles makes something colder, makes it possible to extract work, you've gone back to the ideal observer."

Eliezer: "I think emphatically not! To extract work, you've got to be inside the system, extracting it."

I think what Douglas may be implying is that unless you are perfectly insulated from what you are getting knowledge of (e.g. an ideal observer), the act of getting knowledge of something will heat it up, as you are doing work and increasing the entropy in the surroundings.

It raises some interesting and quirky ideas in me. I'm picturing a future with an intelligence explosion, or at least expansion, where a statistical machine is the dominant producer of entropy on the planet, and it decides not to perform too much statistics so that it doesn't go over the hypsithermal limit of its environment and make its environment less rather than more predictable.

Shalizi usually makes more sense than this

a sign to give it more consideration.

Your response seems to be that Shalizi assumes an ideal observer, while you assume an observer-in-the-system. That's fine, as far as it goes, but often you assume an ideal observer, and statistical mechanics is able to function with some kind of ideal observer. If you can build a model with an ideal observer, you should!

In particular, when you say that knowledge of particles makes something colder, makes it possible to extract work, you've gone back to the ideal observer.

More tangentially: I guess the point of statistical mechanics is that there may (ergodicity) be only a few possible robust measurements, like temperature, and a real observer can draw the same conclusions from such measurements as an ideal observer. I'm annoyed that no one ever spelled that out to me and Shalizi sounds like he's annoyed by Bayesians who don't spell out their models. At the very least, a straw man gives you a chance to say "here's how my model differs."

If you don't have an observer in the system, you instead have an observer outside the system, and in order to actually be observing must be interacting with the system -- in which case the system is no longer closed, and therefore, simplistic statistical mechanics is no longer sufficient, and you have to bring in all the open-system math.

I enjoyed this article. As several commenters have suggested, it seems not just counterintuitive but actually non-physical to say that the warm water has become colder just because I know more about it. The subjective nature of entropy that this seems to imply is absurd. As has been pointed out, a stationary thermometer in the water will show the same reading after my visit from Saint Laplace as it did before.

I think the problem is resolved if we consider our system boundary to include not just the water, but the observer as well. The entropy of the water has not changed because I have more information, but the total entropy of the water-observer system can be considered to have decreased because of the mutual information that has been magically added (by Saint Laplace).

To continue the analogy, it is as though my brain now contains negentropy that has been arranged to exactly cancel the entropy of the water, making our combined net entropy smaller. The entropy of the water itself has not changed and the effect is not subjective. It arises only if we consider me (the observer) with my magical cargo of negentropy as part of the system. Should I choose to put my knowledge to use as an avatar of Maxwell's Demon, then I can actually lower the entropy of the water (by taking it into myself). If, however, I walk away and do nothing to the water based on my knowledge then the entropy of the water itself remains just as it was. (I, however, have been lastingly changed by my encounter with Laplace.)

You lost me there.

1) If Alice and Bob observe the system in your first example, and Alice decides to keep track precisely of X's possible states while Bob just says "2-8", the entropy of X+Y is 2 bits for Alice and 2.8 for Bob. Isn't entropy a property of the system, not the observer? (This is the problem with "subjectivity": of course knowledge is physical, it's just that it depends on the observer and the observed system instead of just the system.)

2) If Alice knows all the molecules' positions and velocities, a thermometer will still display the same number; if she calculates the average speed of the molecules, she will find this same number; if she sticks her finger in the water at a random moment, she should expect to feel the same thing Bob, who just knows the water's temperature, does. How is the water colder? Admittedly, Alice could make it colder (and extract electricity), but she doesn't have to.

Will Pearson, you can indeed summarily reject the possibility, and that's why I kept saying, "ignoring quantum". For quantumness, you would need a total description of the quantum state of the water, and this you can never obtain by any physical means. Though this is true even under classical mechanics: The third law of thermodynamics, on the impossibility of obtaining absolute zero, implies the impossibility of obtaining infinite-precision knowledge.

Silas, 'twas explicit: I said "sealed box".

Another solid essay.

To form accurate beliefs about something, you really do have to observe it.

How do we model the fact that I know the Universe was in a specific low-entropy state (spacetime was flat) shortly after the Big Bang? It's a small region in the phase space, but I don't have enough bits of observations to directly pick that region out of all the points in phase space.

Could the second law of thermodynamics also be understood as "the function between successive states as described by the laws of physics is bijective"?

Liouville's theorem alone does not suffice to obtain the Second Law. You might want to look up the objections to Boltzmann's derivation of H-theorem made by Zermelo (wait long enough and the system will return to a state arbitrarily close to the original state, due to Poincare's recurrence theorem) and Loschmidt (reverse the speeds of all particles and the entropy will decrease to its original value). Boltzmann killed himself in a bout of depression because he could not find a satisfactory answer to these objections. More than a century later, we still don't have satisfactory answers.

Article linked from Reddit, which I haven't read: Demonic device converts information to energy (Scientific American).

So after doing the Maxwell's Demon thing, you say that mutual information decreases, the entropy of Y decreases, so we are left with the same amount of total entropy:

M1,Y1 -> M1,Y1

M2,Y2 -> M2,Y1

M3,Y3 -> M3,Y1

M4,Y4 -> M4,Y1

However, I don't see why the mutual information would be lost; wouldn't the Demon know where he "put" the molecule, thus making the transition look more like:

M1,Y1 -> M1,Y1

M2,Y2 -> M1,Y1

M3,Y3 -> M1,Y1

M4,Y4 -> M1,Y1

This would of course shrink the phase space, violate the second law, etc. I just do not see how M would stay the same when Y changed (i.e. lose the mutual information).
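
The difference between the two tables above can be checked mechanically. This is only a sketch: states are encoded as hypothetical (M, Y) index pairs, and "shrinks the phase space" is taken to mean that two distinct start states share an end state.

```python
def is_injective(mapping: dict) -> bool:
    """True if no two distinct start states share an end state."""
    return len(set(mapping.values())) == len(mapping)

# States are (demon memory M, molecule position Y) pairs, labelled as in the comment.
starts = [(m, m) for m in range(1, 5)]  # (M1,Y1), (M2,Y2), (M3,Y3), (M4,Y4)

# First table: the molecule is moved to Y1 but the memory keeps its old value.
memory_kept = {(m, y): (m, 1) for (m, y) in starts}

# Second table: the memory is also reset to M1.
memory_reset = {(m, y): (1, 1) for (m, y) in starts}

print(is_injective(memory_kept))   # True
print(is_injective(memory_reset))  # False
```

Only the first table keeps four distinct end states; the second collapses four states into one, which is the violation described above.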

That was a simplified account of what is going on. To include the full system, you would have to include the means by which the Demon recorded the knowledge. However it's recorded, it overwrites the information that was otherwise contained in that recording mechanism (i.e., mutual information with some environment), and this deletion of mutual information is an increase in entropy.

But in such an accounting, you would have three systems, which complicates the scenario. In the example given, the Demon is implicitly taken to include the Demon's recording devices (even if that's his brain). The fact that it has destroyed some relationship between some system (the recording device) and another is represented as higher Demon entropy that retains independence from the Y system. (There are extra states the Demon can have that have nothing to do with Y.)

Did that make any sense?

I guess it would seem to me that what gets "overwritten" is the (now invalid) knowledge of where Y is, and what it is overwritten with is the new, valid position of it. I'll have to chew on it for a while.

By the way, sort of unrelated, but I've always wondered why gravity acting on things is not considered a loss of entropy. For example I can drop a bowling ball from multiple distances, but it will always end up 0 feet from the ground:

B4 -> B0

B3 -> B0

B2 -> B0


The only thing I can think of is that, when the ball hits the ground the collision creates enough heat (i.e. entropy) to balance everything out. Is that correct?

Yes, that's basically correct: the ball ends up at the same place, but differs in another state -- velocity -- which gives a different result for how much momentum it imparts to the earth, or heat energy it generates through friction, or elastic energy in compressing its foundation.
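
A rough sketch of that point, assuming a drop from rest with no air resistance and standard gravity expressed in feet per second squared (the heights 4, 3 and 2 are the ones from the question; the function name is made up for illustration):

```python
import math

G_FT_PER_S2 = 32.174  # standard gravity in ft/s^2

def impact_speed(height_ft: float) -> float:
    """Speed just before impact for a drop from rest, ignoring air resistance."""
    return math.sqrt(2 * G_FT_PER_S2 * height_ft)

# Position collapses to 0 ft in every case, but the (position, speed) pairs stay distinct.
final_states = {h: (0, round(impact_speed(h), 2)) for h in (4, 3, 2)}
for h, state in final_states.items():
    print(f"B{h} -> position {state[0]} ft, speed {state[1]} ft/s")
```

Three start states still map to three different end states once velocity is included, so nothing has been merged.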

Btw, note that there is a connection between the energy of a system and the information it stores. Higher energy states are less likely and therefore store more information. (See Academician's recent post on informativeness in information theory.) Because energy of a state is relative to another, this suggests a research program that breaks down the laws of physics into rules about changes in informational content. I'm still in the process of finding out how much work has been done on this and what's left to do.

If you have the time I would be interested in seeing a mathematical description of a system that increases its mutual information with the environment, with the total entropy of the system+environment increasing.

Joseph Knecht:

The problem with your argument is that justification is cheap, while accuracy is expensive. The canonical examples of "unjustified" beliefs involve mis-calibration, but calibration is easy to correct just by making one's beliefs vaguer and less precise. Taken to the extreme, a maximum-entropy probability distribution is perfectly calibrated, but it adds zero bits of mutual information with the environment.
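
As a toy illustration of that last sentence (the outcome list and both forecasters are hypothetical), compare a forecaster who always says 50% with one who is always exactly right. Both are calibrated, but only the second shares any information with the environment.

```python
import math
from collections import Counter

def mutual_information_bits(pairs):
    """Mutual information (bits) between the two coordinates of equally weighted pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    return sum(
        (c / n) * math.log2((c / n) / ((left[a] / n) * (right[b] / n)))
        for (a, b), c in joint.items()
    )

outcomes = [0, 1, 0, 1, 1, 0, 1, 0]  # hypothetical: half of the events happen

# Vague forecaster: always says 50%, and 50% of events happen, so it is calibrated.
vague = [(0.5, o) for o in outcomes]
# Sharp forecaster: says 0% or 100% and is always right.
sharp = [(float(o), o) for o in outcomes]

print(mutual_information_bits(vague))  # 0.0
print(mutual_information_bits(sharp))  # 1.0
```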

two systems in thermal contact trade energy to maximize the net entropy of the ensemble.

Actually the assumption is that two systems in thermal contact come to some equilibrium state.

Let this equilibrium state maximize something, call it S, and use calculus.

Energy is conserved.

Therefore the energy change in one system equals minus the energy change in the other, and the change in S wrt the energy change in each system has to be equal in both systems at the maximum of total S.

Call that change wrt energy the (inverse) temperature. Two systems in thermal contact come to the same temperature, is then what the assumption of some equilibrium of something comes to, after you rename the derivatives.

Only the assumption of an equilibrium has been introduced to get this.
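
Written out, the argument above looks roughly like this. One extra assumption is made here that the comment leaves implicit: the maximized quantity is additive over the two systems, so the total S is S_1(E_1) + S_2(E_2). The subscripts are introduced only for this sketch.

```latex
\begin{aligned}
S_{\mathrm{tot}} &= S_1(E_1) + S_2(E_2), \qquad E_1 + E_2 = E \ \text{(fixed)} \\
dE_2 &= -\,dE_1 \\
\frac{dS_{\mathrm{tot}}}{dE_1} &= \frac{\partial S_1}{\partial E_1} - \frac{\partial S_2}{\partial E_2} = 0
  \quad \text{(at the maximum)} \\
\frac{\partial S_1}{\partial E_1} &= \frac{\partial S_2}{\partial E_2} \equiv \frac{1}{T}
\end{aligned}
```

The last line is the renaming step: the common value of the derivative is what gets called the inverse temperature.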

That's where

I don't mind people ignoring elements of science when they are not important. e.g. ignoring general relativity when calculating ball trajectories.

But molecules and atoms are very much in the quantum realm. So it seemed to me to be like saying, ignoring special relativity, when things are travelling faster than the speed of light then this analogy holds, from this we can conclude blah. To me it seems unlikely to hold any insights.

I don't see why I should accept any conclusion drawn from the premises if I do not hold with the premises. But this brings up an interesting point, when is it valid to ignore data? Is it ever?

To me your first point seems elementary as I don't have good evidence for psychic powers, and can be derived from the uncertainty principle or probably preferably from the no cloning theorem. Your second would be better derived from quantum physics as well by showing a minimum energy required for a bit flip, if it can.

I suggest a lot of caution in thinking about how entropy appears in thermodynamics and information theory. All of statistical mechanics is based on the concept of energy, which has no analogue in information theory. Some people would suggest that for this reason the two quantities should not be called by the same term.

the "temperature" isn't a uniform speed of all the molecules, it's an average speed of the molecules, which in turn corresponds to a predictable statistical distribution of speeds

I assume you know this, but some readers may not: temperature is not actually equivalent to energy/speed, but rather to the derivative of entropy with respect to energy:

1/T = dS/dE

This is why we observe temperature equilibration: two systems in thermal contact trade energy to maximize the net entropy of the ensemble. Thus in equilibrium a small shift in energy from one system to the other must not change the ensemble entropy ==> the temperatures of the systems must be equal.

In almost all real systems, temperature and energy are monotonically related, so you won't go too far astray by thinking of temperature as energy. However, in theory one can imagine systems that are forced into a smaller number of states as their energies increase (dS/dE < 0) and so in fact have negative temperature:
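
One way to see dS/dE change sign is a toy model that is not taken from the comment itself: N independent two-state units, n of them excited, with energy proportional to n and entropy equal to the log of the number of arrangements (Boltzmann's constant set to 1). The sketch below estimates 1/T = dS/dE by a finite difference.

```python
import math

def entropy(n_excited: int, n_total: int) -> float:
    """S/k = ln(number of ways to choose which units are excited)."""
    return math.log(math.comb(n_total, n_excited))

def inverse_temperature(n_excited: int, n_total: int, level_spacing: float = 1.0) -> float:
    """Central finite-difference estimate of 1/T = dS/dE, with E = n_excited * level_spacing."""
    ds = entropy(n_excited + 1, n_total) - entropy(n_excited - 1, n_total)
    return ds / (2 * level_spacing)

N = 100  # hypothetical number of two-state units
for n in (25, 50, 75):
    print(n, inverse_temperature(n, N))
```

In this model the estimate is positive below half filling, zero at half filling, and negative above it, since adding energy past that point leaves fewer arrangements available.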

I guess that if you stick your finger into the water it will still get burned, am I wrong?

Only if you're silly enough to stick in your finger at the wrong moment.

Ok, I think we have a problem of definition here. You said the water got colder in a thermodynamic definition. But you agree that if I take a thermometer and insert it into the water and leave it there for a while it will still indicate 'hot water'. Right?

What I don't understand is your thermodynamic definition of colder. And I'm no physicist. Btw, I understand that with the information about the velocities you have power to make the water colder, but that doesn't mean that it actually will get colder (at least not right now).

Eliezer, I was going to point out that you never defined "phase space", but Roland beat me to it. It's a small hole in an otherwise excellent post.


Eliezer is using the physics jargon meanings of speed and velocity: speed is a magnitude, a raw number; velocity is magnitude and direction together. A car might be travelling at a speed of 65 mph; if you include a direction, e.g., 65 mph east, then you've got its velocity.

the more uncertain you are about the velocity (not just speed)

Isn't speed the same as velocity?

What is phase space? Is it the same as state space? You didn't define it.

Does that make its thermodynamic entropy zero? Is the water colder, because we know more about it?

Ignoring quantumness for the moment, the answer is: Yes! Yes it is!

I guess that if you stick your finger into the water it will still get burned, am I wrong?

And conversely, one subsystem cannot increase in mutual information with another subsystem, without (a) interacting with it and (b) doing thermodynamic work.

It is not entirely clear to me how you arrived at this conclusion.

Your finger will not get burned; it will suffer the cumulative damage resulting from an unusually high quantity of unrelated high-speed molecule attacks.

You could imagine a particle gun that shoots water molecules with the exact same speed distribution as hot water (carefully aligned so they don't collide mid-beam), but all with the same direction - straight towards you.

The result of sticking your hand in such a beam would be roughly the same as putting it in hot water, ignoring the asymmetric momentum transfer. However, it is easy to see that you can extract useful energy from the beam.

Another pertinent example might be this: a metal shaft could spin so fast that its atoms' velocity distribution could be the same as that of the (hotter!) gaseous form of the same metal. Yet the spinning of the shaft does not evaporate the metal.

Why? Because, to a typical observer of the shaft, its degrees of freedom are significantly more constrained. So, since the observer knows more about the shaft (that its atoms are in a solid lattice that moves in relative unison), that makes the shaft colder -- and it allows you to extract more mechanical work from the shaft than if it were a hot gas with the same average particle velocities!

That is not in general possible. The speed at radius r is v = w r. Taking an arbitrary axisymmetric mass-distribution rho(r), we have a distribution of mass at speed v = w r that is U(r) = 2 pi h r^2 rho(r) and U(v) = 2 pi h v^2/w^2 rho(v/w) A monatomic gas at temperature T has a kinetic energy distribution of 2 Sqrt(E / Pi (kT)^3) exp(-E/kT) (dE), and a speed distribution of sqrt(2(m/kT)^3 / pi) v^2 exp(-m v^2 / 2 kT) (dv). By carefully tailoring rho(r) to an exponential, you can match this distribution (up to some finite cut-off, of course), at one specific match of angular speed w and temperature T.

This, of course, only matches speeds, not velocities, which will be a three-dimensional distribution. For this spinning shaft, of course, v_z = 0.
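
As a sanity check on the speed distribution quoted above, the following sketch integrates it numerically in units where m/kT = 1. The integration helper, the cut-off at v = 20 and the step count are arbitrary choices made for this illustration.

```python
import math

def maxwell_speed_pdf(v: float, m_over_kT: float = 1.0) -> float:
    """sqrt(2 (m/kT)^3 / pi) * v^2 * exp(-m v^2 / 2kT), as written in the comment."""
    return math.sqrt(2 * m_over_kT**3 / math.pi) * v**2 * math.exp(-m_over_kT * v**2 / 2)

def integrate(f, a: float, b: float, steps: int = 100_000) -> float:
    """Plain trapezoid rule on [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

# Assumption: the tail beyond v = 20 (in units of sqrt(kT/m)) is negligible.
total_probability = integrate(maxwell_speed_pdf, 0.0, 20.0)
print(total_probability)
```

The result should come out very close to 1, which confirms the prefactor is the normalization constant for a speed (not velocity) distribution.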

For a gas in a mass-sealed container, average vx, vy, vz are also zero, just as in the shaft, so this trivially matches.

You're correct that I should have said speeds, and I should have mentioned that the shaft requires a special shape/density distribution, but the point stands: The molecular property distribution doesn't by itself tell you how cold something is, or how much energy can be extracted -- it's also relevant how its degrees of freedom are constrained, which shows how your knowledge of (mutual information with) the system matters.

For a gas in a mass-sealed container, average vx, vy, vz are also zero, just as in the shaft, so this trivially matches.

Yes, but the average tells you little that's meaningful: it's equivalent to the overall velocity of the gas as whole.

I agree with your overall point. Temperature is not directly a property of the system, but of how we can characterize it, including what constraints there are on it. I just think that this wasn't a great example for that point of view, precisely because you claimed agreement of distributions that doesn't exist.

A better way to explain this can use the shaft. A shaft has the very strong constraint that position and velocity are perfectly correlated. This constraint lets us extract virtually all the kinetic energy out. A gas that had the same distribution of velocities, but lacked this constraint, would be very hard to extract useful energy from. No simple arrangement could do much better than treating it as a thermalized gas with the same average kinetic energy (and it would quickly evolve so that the velocity distribution would match this).

I just think that this wasn't a great example for that point of view, precisely because you claimed agreement of distributions that doesn't exist.

Sure it does -- it just requires a special shaft shaping/material. Hence the "a metal shaft could ..."

Otherwise, we're in agreement.

I would call that 'burned'. If I call 'standing outside getting hit by lots of UV light' sunburned then it seems fair to call getting hit by lots of high speed water molecules burned too.

So in the following transformation:

X1Y1 -> X2Y1
X1Y2 -> X4Y1
X1Y3 -> X6Y1
X1Y4 -> X8Y1

You say that while true entropy has not increased (it stays at 2 bits), apparent entropy has, due to the observer not keeping track of X and just lumping its possible states into X2-X8. If this is the case, why doesn't observed entropy decrease as well, since phase space is preserved with the following?

X2Y1 -> X1Y1
X4Y1 -> X1Y2
X6Y1 -> X1Y3
X8Y1 -> X1Y4

Why doesn't observed entropy decrease as well, since phase space is preserved with the following?

X2Y1 -> X1Y1
X4Y1 -> X1Y2
X6Y1 -> X1Y3
X8Y1 -> X1Y4

(I guess DaveInNYC won't read this but I guess someone else might.)

If you lump together X's starting state into X2-X8 then you can't be sure that it isn't actually X3, X5 or X7. So you have to look at where those possibilities go as well. Then the entropy can't go down (since by Liouville's Theorem they have to go somewhere different from X2, X4, X6 and X8).
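
A small sketch of that argument, using a made-up 8 x 4 state space and an arbitrary completion of the four given transitions into a bijection (any other completion would serve equally well):

```python
from itertools import product

states = list(product(range(1, 9), range(1, 5)))  # (X, Y) with X in 1..8, Y in 1..4

# The four transitions from the question, written as (X, Y) -> (X, Y).
forced = {(2, 1): (1, 1), (4, 1): (1, 2), (6, 1): (1, 3), (8, 1): (1, 4)}

# Hypothetical completion: pair the remaining start states with the remaining end
# states in sorted order, so the whole map is a bijection (phase space is preserved).
free_starts = [s for s in states if s not in forced]
free_ends = [s for s in states if s not in set(forced.values())]
dynamics = {**forced, **dict(zip(free_starts, free_ends))}
assert sorted(dynamics.values()) == states  # bijection check

# Bob's lumped description: X anywhere in 2..8, Y = 1 (seven candidate states).
lumped = [(x, 1) for x in range(2, 9)]
images = {dynamics[s] for s in lumped}
print(len(lumped), len(images))  # 7 7
```

Seven candidate start states still have seven distinct images, and only four of those can sit on X1, so the lumped observer's uncertainty does not shrink to 2 bits.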

Indeed, this is one hell of a post. I am from a computer science background and had to read the post five or six times and most of the comments at least twice; it's worth it.

If someone is still following the post, I would like to know: can the randomness of the particles be measured, or is it calculated according to probability? I vaguely remember from my college reading that entropy is random energy, so for a perfect transfer X -> Y, how is the final state determined (given the randomness)? Aren't accurate beliefs functions of randomness?

Part of the point of this post is that particles aren't ever random -- random is not a property of the particles, but of our description of the particles.


I don't see how your response addresses my concern that saying accurate belief requires observation implies unacceptable consequences for the man on the street, such as that his correct belief that the Giants would win on Sunday is nevertheless not an accurate belief.

But the more important point: Suppose you've got an iron flywheel that's spinning very rapidly. That's definitely kinetic energy, so the average kinetic energy per molecule is high. Is it heat? That particular kinetic energy, of a spinning flywheel, doesn't look to you like heat, because you know how to extract most of it as useful work, and leave behind something colder (that is, with less mean kinetic energy per degree of freedom).

Systems in thermal contact (by radiation if nothing else) come to the same temperature. That makes it pretty objective if one of the systems is a thermometer, whether it's heat or not.

Eliezer Yudkowsky (I'll remove the underscores from now on): I enjoyed reading this, but I don't quite understand all the references to the impossibility of turning warm water into electricity and ice cubes: You don't need extra information or violation of the laws of physics to do this. You could run a heat engine off the temperature difference between the warm water and the environment, and have that work drive a refrigerator. It's just that you couldn't exploit this phenomenon to make a perpetual motion machine.

I probably missed an implicit (or explicit!) qualification of all that, and if so, readers new to thermodynamics can just take this as a clarification.

Good post, lays it down very nicely. Quick question:

Why is it that you can’t turn warm water into ice cubes and electricity, but reversible computing can use an arbitrarily small amount of energy? My guess is that computing (logic?) must be fundamentally different from work in this sense. Logic is, in a sense, ‘already there,’ whereas work requires energy.

"But suppose you already knew where all the gas molecules were?"

I assume by this you mean I have an exact knowledge of position and momentum. Why should I suppose a scenario that is contrary to what I know of the uncertainty principle? Can't I reject it for the same reasons you reject the possibility of waking up with a purple tentacle?

Without discounting the predictive power of the second law, my confidence in our understanding of its physical basis has been seriously reduced. This after viewing a recent series of talks held at MIT, Meeting the Entropy Challenge.

Of particular interest were the discussions about the lack of an established definition of entropy.

To form accurate beliefs about something, you really do have to observe it.

Does this not confuse accurate belief with knowledge? Leaving aside doubts about whether justified accurate belief is sufficient for knowledge (e.g., the Gettier problem), there is certainly more to knowledge than just accurate belief, and while I accept your statement for knowledge, it does not seem true for mere accurate belief.

I suppose the issue hinges on -- and perhaps this is your point -- whether accurate means probability of being correct or whether it turns out to have been correct. On the second account -- which is the common meaning of accurate -- the lotto player who believes she will win and actually does win has an accurate belief before she wins, though she is of course not justified in having that belief. In terms of the first sense of accurate, she is not accurate at all, but I think you'll have a much harder time trying to convince people of that than you would if you used knowledge instead of accurate belief. The man on the street will not accept that if he believes the Giants will win on Sunday and they actually win that his belief was nevertheless not accurate, while he'll easily acknowledge that he didn't really know they would win.

"Is the water colder, because we know more about it? ...Yes! Yes it is!"

You're kidding, right? Knowing something about a system doesn't change the system (neglecting quantum, of course). The statistical way to define entropy (as you mentioned) is the log of the number of microstates. The fact that you know all the trajectories/positions couldn't matter less to the glass of water, the only thing that matters is (using your jargon) the phase space volume it occupies.

Reshape the space for a second. Call it 6-D, with each particle a point, instead of 6N-D. Now the entropy would correspond to the volume actually occupied in 6-D space, rather than the possible volume among which your single point can choose.

With the single point, you get sucked into the fallacy that because you know where the point is at one time, that's the only possible location it can have, and you're tricked into believing the entropy is much smaller than it is.

Statistical physics assumes exact particle trajectories are random and unknowable, although this was never believed to be fundamental. It was just a convenient way to ignore things nobody cared about, and take averages. Restricting yourself to that one point in phase space, you violate that assumption.

Maxwell's demon is ruled out by information theory. That's not quite the same thing as saying that it's Bayesian.