Explaining information theoretic vs thermodynamic entropy?

by wedrifid · 1 min read · 4th Nov 2010 · 13 comments


What is the best way to go about explaining the difference between these two different types of entropy? I can see the difference myself and give all sorts of intuitive reasons for how the concepts work and how they kind of relate. At the same time I can see why my (undergraduate) physicist friends would be skeptical when I tell them that no, I haven't got it backwards and a string of all '1's has nearly zero entropy while a perfectly random string is 'maximum entropy'. After all, if your entire physical system degenerates into a mush with no order that you know nothing about then you say it is full of entropy.
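One concrete way to make the all-'1's vs. random string contrast vivid (not from the thread itself, just a sketch of the Kolmogorov-complexity intuition) is to use a general-purpose compressor as a rough proxy for description length: a highly ordered string has a short description, an incompressible one does not.

```python
import random
import zlib

# A string of all '1's has a very short description ("print '1' 100,000 times"),
# so a general-purpose compressor squeezes it to almost nothing.
ones = b"1" * 100_000

# A (pseudo)random byte string has no description much shorter than itself,
# so compression barely helps at all.
random.seed(0)
rand = bytes(random.getrandbits(8) for _ in range(100_000))

print(len(zlib.compress(ones)))  # tiny compared to 100,000
print(len(zlib.compress(rand)))  # roughly 100,000: essentially incompressible
```

This is only a proxy (zlib is not a universal Turing machine), but it usually gets the "ordered = low algorithmic entropy, random = high" point across faster than a formal definition.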

 

How would I make them understand the concepts before nerdy undergraduate arrogance turns off their brains? Preferably giving them the kind of intuitive grasp that would last rather than just persuading them via authoritative speech, charm and appeal to authority. I prefer people to comprehend me than to be able to repeat my passwords. (Except where having people accept my authority and dominance will get me laid in which case I may have to make concessions to practicality.)


haven't got it backwards and a string of all '1's has nearly zero entropy while a perfectly random string is 'maximum entropy'.

Ugh. I realize you probably know what you are talking about, but I expect a category error like this is probably not going to help you explain it...

Edit: Actually, I suppose that sort of thing is not really a problem if they're used to the convention where "a random X" means "a probability distribution over Xs", but if you're having to introduce information entropy I expect that's probably not the case. The real problem is that the string of all 1s is a distractor - it will make people think the fact that it's all 1s is relevant, rather than just the fact that it's a fixed string.

Edit once again: Oh, did you mean Kolmogorov complexity? Then never mind. "Entropy" without qualification usually means Shannon entropy.

When I hear the phrase "information theory," I think of Shannon, not Kolmogorov. Shannon entropy is a bridge between the two concepts of your post, but rather closer to thermodynamic.

Sniffnoy's comment has it right. Information-theoretic (Shannon) entropy is very similar to thermodynamic entropy and they don't contradict each other as you seem to think. They don't talk about individual bit strings (aka microstates, aka messages), but rather probability distributions over them. See this wikipedia page for details. If you have in mind a different notion of entropy based on algorithms and Kolmogorov complexity, you'll have to justify its usefulness to your physicist friends yourself, and I'm afraid you won't find much success. I don't have much use for K-complexity myself, because you can't make actual calculations with it the way you can with Shannon entropy.
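The point that Shannon entropy attaches to a probability distribution rather than to any single string can be shown in a few lines (an illustrative sketch, not part of the original comment):

```python
from math import log2

def shannon_entropy(probs):
    """Shannon entropy H(p) = -sum_i p_i * log2(p_i), in bits."""
    return -sum(p * log2(p) for p in probs if p > 0)

# A source that always emits '1' (a degenerate distribution): zero entropy.
print(shannon_entropy([1.0]))          # 0.0
# A fair coin: one bit per symbol, the maximum for two outcomes.
print(shannon_entropy([0.5, 0.5]))     # 1.0
# A biased coin carries less than one bit per symbol.
print(shannon_entropy([0.9, 0.1]))     # about 0.469
```

Note that the entropy of the *source* that always emits '1' is zero, which matches the "string of all '1's" intuition, but only once you reinterpret the string as a distribution over messages.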

Sniffnoy's comment has it right. Information-theoretic (Shannon) entropy is very similar to thermodynamic entropy and they don't contradict each other as you seem to think.

No, I don't think that they contradict each other, and that isn't suggested by my words. I merely refrain from holding my associates in contempt for this particular folly, since it is less intuitive than other concepts of equivalent levels of complexity.

Wait, what exactly is their folly? I'm confused.

Have you checked if your friends actually know statistical physics? Maybe they only know the thermodynamic concept of entropy, which could seem quite different from the information theoretic entropy.

This book explains at the beginning why entropy is not a measure for disorder, which seems to be a common misconception among physicists.

Have you checked if your friends actually know statistical physics?

They don't. If they knew they would not have any problem.

How about something like this:

You are allowed to choose and measure an atom or bit from the system. The location you choose for each measurement must be unique within the set of measurements.

After you pick the location but before you make the measurement you must predict the result. How well can you predict these results?

I suppose I'm not clear on the 'difference' between them- as far as I can tell, they're basically the same. Maxwell's Demon seems to be the standard tool for discussing the two and how they're linked (you can come up with a system that's arbitrarily good at playing the role of the demon, but the demon's memory has to change from the initial state to the final state, and that results in the 2nd law holding).

The natural analog of a string of bits that are all 1 is a group of particles which all have exactly the same velocity- it shouldn't be hard to see why the entropy is low in both cases.

More important than good explanations, of course, is purging your impression of them as arrogant. Or are you that good at manipulating your subtext?

[edit] Example:

After all, if your entire physical system degenerates into a mush with no order that you know nothing about then you say it is full of entropy.

Maximum entropy means something very specific, actually, and so it's hardly true to say you know nothing about it. When someone's gone through the beautiful process that leads to the Boltzmann distribution, and especially if they really get what's going on, they are likely to take statements like that as evidence of ignorance.

More important than good explanations, of course, is purging your impression of them as arrogant.

Why? They are arrogant. They take pride in their arrogance. It is a trait that is beneficial at times and detrimental at others. This isn't a value judgement.

Or are you that good at manipulating your subtext?

I calibrate more through acceptance than self-deception - although both can work.

Maxwell's Demon seems to be the standard tool for discussing the two and how they're linked (you can come up with a system that's arbitrarily good at playing the role of the demon, but the demon's memory has to change from the initial state to the final state, and that results in the 2nd law holding).

Thanks, that sounds like an interesting way to divert the conversation, one way or the other. It's probably not something that I will encounter in conversation again but as a historical-edit-counterfactual that seems like a good solution.

Why? ... I calibrate more through acceptance than self-deception - although both can work.

Whatever works for you. Most people I know have put no effort into calibration, and I suggest self-deception or self-modification to them as it works better than unconscious acceptance, though perhaps not better than conscious acceptance.

They may be interested in original works; Landauer was the one who put forward the information entropy explanation of Maxwell's Demon, but all I can find in a quick search is his 1961 paper on heat generated by irreversible logic operations, which is the underpinning of the argument that equal entropy is generated in the demon's memory.

Actually, I had always heard that it was Szilard, back in 1929, that came up with the original idea. So says wikipedia.

I first heard of Szilard's thought experiment back in high school from Pierce's classic popularization of Shannon's theory Symbols, Signals, and Noise. This book, which I strongly recommend, is now available free online. The best non-mathematical exposition of Shannon ever. (Well, there is some math, but it is pretty simple).

Szilard's idea is pretty cool. A heat engine with a working fluid consisting of a very thin gas. How thin? A single molecule.
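The standard quantitative punchline of the Szilard engine (not spelled out in the comment above, but textbook material): knowing which half of the box the molecule occupies lets you extract at most k_B T ln 2 of work, which is also the Landauer cost of erasing one bit.

```python
from math import log

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact value since the 2019 SI)

def szilard_work_per_bit(temperature_kelvin):
    """Maximum work extractable from one bit of information: k_B * T * ln 2."""
    return K_B * temperature_kelvin * log(2)

# At room temperature (300 K) this comes to about 2.87e-21 joules per bit.
print(szilard_work_per_bit(300.0))
```

The same k_B T ln 2 appearing as both extractable work and erasure cost is why the demon cannot beat the second law on net.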

So, from my reading Szilard's answer and Landauer's answer are slightly different. But the descriptions on the page you linked vs. the Maxwell's Demon page are slightly different, and so that may be the source of my impression. It seems that Szilard claimed that acquiring the information is where the entropy gets balanced, whereas Landauer claims that restoring the Demon to its original memory is where the entropy gets balanced. Regardless of whether or not both are correct / deserve to be called the 'information entropy explanation', Landauer's is the one that inspired my original explanation.