An analogy as the midwife of thermodynamics

[-]TLW4y40

It sounds like there’s a path through the Second Law of Thermodynamics and Noether’s theorem, but I haven’t followed it yet.

There is indeed a path. Note a few potential "loopholes":

There are (unphysical) Newtonian physics systems where it is possible to approach negative-infinite potential energy in finite time. So yes, strictly speaking energy is conserved, but that doesn't actually say that much.
1. (For instance: https://en.wikipedia.org/wiki/Painlev%C3%A9_conjecture#/media/File:Xia's_5-body_configuration.png )
  1. (Roughly speaking: the top 2 bodies and the center body undergo a 3-body encounter that drops the top 2 bodies into a smaller orbit, using the resulting potential energy to accelerate the top 2 bodies upward and accelerates the middle body towards the bottom 2 bodies faster than it arrived. Repeat, mirrored, with the bottom 2 bodies. Repeat, mirrored, with the top 2 bodies. Repeat, mirrored, with the lower 2 bodies. Etc. Each loop pulls gravitational potential energy from the 2 sets of 2 bodies and dumps it into kinetic energy, and the center body gets faster faster than the 2 sets of 2 bodies pull apart. Net result is an infinite number of 3-body encounters and infinite velocity in finite time...)
It relies on (continuous) time-translation symmetry.
1. This doesn't hold in general for general relativity.
2. This does hold for Newtonian mechanics.
3. Time translation symmetry is a hypothesis, although a fairly well-tested one.
4. (If you want to rabbit-hole here, look at time crystals.)

(That being said, it's been too long since I've looked seriously into Physics.)

[-]adamShimi4y30

Thanks for the comment!

Could you give more details on the path itself?

Also, do you consider your loopholes like technicalities, or more serious problems?

[-]TLW4y10

Could you give more details on the path itself?

...honestly, probably not well. It's been too long. At a high level: Noether's theorem implies that if you have a Lagrangian that's invariant under a perturbation of coordinates, that corresponds to a conserved quantity of the system. In particular: invariance under time perturbations (a.k.a. continuous time-translation symmetry) corresponds to a conserved quantity that turns out to be conservation of energy.

Also, do you consider your loopholes like technicalities, or more serious problems?

For 1: it's like someone showing how to break your 1024-bit hash in 2^500 operations. It isn't a problem in and of itself, but it's suggestive of deeper problems. (It requires both infinite precision and point particles to achieve, neither of which appear to be actually possible in our universe.)

For 2: I'd consider the issues with general relativity (and however quantum gravity shakes out) to be potentially an issue - though given that it's not an issue for classical mechanics any loopholes would likely be in regimes where the Newtonian approximation breaks down.

That all being said, take this with a grain of salt. I'm not confident I remembered everything correctly.

[-]Rohin Shah4y-40

Yet that doesn't solve our initial problem (how did Sadi discover his result?); it only refines the questions. Sure, he might have leveraged Lazare's analogy, but why did that work? Why is Lazare work so productive when applied to thermodynamics, when it's a dead end in its original field (mechanics)? And how come that the analogy leads to the right insight when it actually breaks?

Default hypothesis: Lots of people were trying lots of different ways of making progress based on lots of different bad analogies; one happened to work out despite being bad. "Even a broken clock is right twice a day."

[-]adamShimi4y110

Sorry for not answering earlier, there's a lot of things I wanted to say in response to this comment, and I took some time to organize my thoughts.

First, there's an object-level point on which we might agree: I don't particularly believe that Sadi choose his analogy, it just made sense to him. There might be an aspect of this intuition that got hidden bits of evidence unconsciously, which is the sort of thing that I would want to find out how to do consciously and explicitly as much as possible.

Now on the meta level, I believe that your default hypothesis is wrong because it is assuming an incredible amount of structure and is in contradiction with the history of science (and invention in general). You use the analogy of the clock, which has nothing to do with how the universe works; it doesn't go through every possible pattern one after the other, such that any guess will be right some time.

And more generally, impressive results in science and maths and a lot of places come from people even finding something in a high dimensional world. If you have an incredibly large amount of possibility, no amount of unbiased random sampling will yield anything, certainly not the bounty of results we get from science. Even more in a time like the Carnot's, where there wasn't that many scientists at all.

It's basically Einstein's Arrogance and Science in High-Dimensional World: to do anything at all, you need to reveal hidden bits of evidence somehow to reduce the search space.

In this example, I expect that most of the bits of evidence were revealed by Lazare, because even if he had a bad ontology for mechanics (which still works decently well at the macro level, mind you), still created a powerful framework for thinking about dissipative systems. He basically extracted bits of evidence from the whole class of dissipative systems, enough to say "the most efficient will look like that" and reducing the search space tremendously for Sadi later on.

If you want some evidence that this example was not just a random sampling that worked but actually a strongly biased move, there's the fact that Sadi's work got used (after being neglected) 25 years later for the formalization of modern thermodynamics, and despite its age, that's what the founders of modern thermodynamics used. Also most of his result, despite staying in obscurity for at least 10 years, haven't been rediscovered AFAIK (or I expect things like Carnot's theorem to have a name with the multiple inventors in it)

[-]Rohin Shah4y-10

Obviously hypotheses do not just come out of an "unbiased random sampling" process, there are some intuitions that drive them that incorporate tons of evidence that the scientist already has.

I thought you were saying something along the lines of: "some people seem particularly good at this, instead of producing hypotheses that have a 1/1000 chance of being correct, they instead produce hypotheses with a 1/2 chance of being correct. Let's look at these people in particular and figure out how to replicate their reasoning".

I'm saying in response to that (which may not be what you meant): "In the specific case of Carnot's theorem, my default hypothesis is that ~1000 people tried hypotheses with probability ~1/1000 and one happened to be correct; you can study any of those 1000 people / ideas instead of studying Carnot in particular. (Studying the wrong ones is probably better, the wrong parts could tell you what people can't do when creating hypotheses in advance.)"

I believe that your default hypothesis is wrong because it is assuming an incredible amount of structure and is in contradiction with the history of science (and invention in general).

I wasn't trying to give a grand theory of science and invention. I'm trying to explain the specific question I quoted, about why a seemingly "bad" analogy still worked out well in this case.

I also don't know what you think the hypothesis is in contradiction with.

If you have an incredibly large amount of possibility, no amount of unbiased random sampling will yield anything, certainly not the bounty of results we get from science.

I totally agree it was biased in the sense that "dissipative theory" is a lot simpler than "on Sundays, my experiments do whatever Abraham Lincoln would have predicted would happen; on other days it's whatever George Washington would have predicted", and so people investigated the theories like the former much more than theories like the latter.

If you want some evidence that this example was not just a random sampling that worked but actually a strongly biased move, there's the fact that Sadi's work got used (after being neglected) 25 years later for the formalization of modern thermodynamics, and despite its age, that's what the founders of modern thermodynamics used. Also most of his result, despite staying in obscurity for at least 10 years, haven't been rediscovered AFAIK (or I expect things like Carnot's theorem to have a name with the multiple inventors in it)

I expect to see this result in a random sampling world; why don't you? It seems like you just have to wait for the same random sample to be drawn again; not drawing that sample in 25 years seems totally normal.

[-]adamShimi4y20

Thanks for the detailed answer!

I thought you were saying something along the lines of: "some people seem particularly good at this, instead of producing hypotheses that have a 1/1000 chance of being correct, they instead produce hypotheses with a 1/2 chance of being correct. Let's look at these people in particular and figure out how to replicate their reasoning".
I'm saying in response to that (which may not be what you meant): "In the specific case of Carnot's theorem, my default hypothesis is that ~1000 people tried hypotheses with probability ~1/1000 and one happened to be correct; you can study any of those 1000 people / ideas instead of studying Carnot in particular. (Studying the wrong ones is probably better, the wrong parts could tell you what people can't do when creating hypotheses in advance.)"

I feel like you're getting my point, but I'll still add the subtlety that I'm saying "anyone who isn't biased somehow has a chance of 10^-60, and so always fails". I'm still confused by why you think that you're proposal is more realistic? Could you give me your intuition here for the uniform sampling case? Or is it just that by default you go for this model?

I wasn't trying to give a grand theory of science and invention. I'm trying to explain the specific question I quoted, about why a seemingly "bad" analogy still worked out well in this case.
I also don't know what you think the hypothesis is in contradiction with.

Contradiction with the fact that many discoveries and inventions seem to emerge in cases where the possibility space was far too large for a uniform sampling to have a chance.

I totally agree it was biased in the sense that "dissipative theory" is a lot simpler than "on Sundays, my experiments do whatever Abraham Lincoln would have predicted would happen; on other days it's whatever George Washington would have predicted", and so people investigated the theories like the former much more than theories like the latter.

I agree with that, but I meant more that dissipative theory was biased towards the truth compared to theories that would have been considered at the same level.

I expect to see this result in a random sampling world; why don't you? It seems like you just have to wait for the same random sample to be drawn again; not drawing that sample in 25 years seems totally normal.

When I look at my confusion here, it's because the point I was making is that in 25 years people have rediscovered and recreated the same stuff about steam engine a lot (haven't checked deeply but would be willing to bet), whereas they hadn't found Sadi's result again. Which to me is clear evidence that the sampling, if random, was not uniform at all. Does that answer your question, or am I missing you point completely?

[-]Rohin Shah4y00

Could you give me your intuition here for the uniform sampling case?

A bad analogy led to a good theory. This seems more probable under theories that involve luck than theories that involve skill. Hence, 1000 people with 1/1000 probability theories, rather than 2 people with 1/2 probability theories. Again, this is for this specific case, not for science as a whole.

I don't think the literal uniform theory is actually correct; there's still going to be differences in people's ability, so that it's more like 10,000 people with ~0 probability theories, 1000 people with 1/2000 probability theories, and 100 people with 1/200 probability theories. But the fundamental point is that I don't expect to gain much by studying the people who got it right than by studying the people who got it wrong in a plausible way (and if anything I expect you to learn more from the latter category).

Contradiction with the fact that many discoveries and inventions seem to emerge in cases where the possibility space was far too large for a uniform sampling to have a chance.

Do you agree there's no contradiction now that I've specified that it's sampling from a biased distribution of ideas that have ~1/1000 probability?

I meant more that dissipative theory was biased towards the truth compared to theories that would have been considered at the same level.

Yeah I think it's unclear why that should be true. (Assuming that by "at the same level" you mean theories that were posed by other scientists of comparable stature seeking to explain similar phenomena.)

When I look at my confusion here, it's because the point I was making is that in 25 years people have rediscovered and recreated the same stuff about steam engine a lot (haven't checked deeply but would be willing to bet), whereas they hadn't found Sadi's result again. Which to me is clear evidence that the sampling, if random, was not uniform at all.

How is it clear evidence? Imagine a "uniform random sampling" story in which we produce 10 theories of probability 1/1000 per year. Then in expectation it takes 100 years to produce the right theory, and it is entirely unsurprising that in 25 years people don't rediscover the right theory. So how are you using the observation "not rediscovered in 25 years" to update against "uniform random sampling"?

[-]dxu4y120

My take: if you are somehow going from the "real" prior probability (i.e. the figure for a true random draw from the uniform distribution on the hypothesis space, which Adam estimated in his comment as 10^-60, although I expect it could be even lower depending on exactly what hypothesis space we're talking about) all the way to 10^-3 (the 1/1000 figure you give), you are already jumping a large number of orders of magnitude, and it seems to me unjustified to assert you can only jump this many orders of magnitude, but no further. Indeed, if you can jump from 10^-60 to 10^-3, why can you not in principle jump slightly farther, and arrive at probability estimates that are non-negligible even from an everyday perspective, such as 10^-2 or even 10^-1?

And it seems to me that you must be implicitly asserting something like this, if you give the probability of a random proposed theory being successful as 1 in 1000 rather than 1 in 10^60. Where did that 1/1000 number come from? It certainly doesn't look to me like it came out of any principled estimate for how much justified Bayesian update can be wrung out of the evidence historically available, where that estimate just happened to arrive at ~570 decibels but no more; in fact it seems like that 1000 number basically was chosen to roughly match the number of hypotheses you think were plausibly put forth before the correct one showed up. If so, then this is... pretty obviously not proper procedure, in my view.

For myself, I basically find Eliezer's argument in Einstein's Speed as convincing as I did when I first read it, and for basically all the same reasons: finding the right theory and promoting it to the range where it first deserves attention but before it becomes an obvious candidate for most of the probability mass requires hitting a narrow target in update-space, and humans are not in general known for their precision. With far greater likelihood, if somebody identified the correct-in-retrospect theory, the evidence available to them at the time was sufficient from a Bayesian perspective to massively overdetermine that theory's correctness, and it was only their non-superintelligence that caused them to update so little and so late. Hitting a narrow range is implausible; overshooting that range, on the other hand, significantly less so.

At this point you may protest that the 1/1000 probability you give is not meant as an estimate for the actual probability a Bayes-optimal predictor would assign after updating on the evidence; instead it's whatever probability is justified for a human to assign, knowing that they are likely missing much of the picture, and that this probability is bounded from above at 10^-3 or thereabouts, at least for the kind of hard scientific problems the OP is discussing.

To be blunt: I find this completely unpersuasive. Even ignoring the obvious question from before (why 10^-3?), I can see no a priori reason why someone could not find themselves in an epistemic state where (from the inside at least) the evidence they have implies a much higher probability of correctness. From this epistemic state they might then find themselves producing statements like

I believe myself to be writing a book on economic theory which will largely revolutionize—not I suppose, at once but in the course of the next ten years—the way the world thinks about its economic problems. I can’t expect you, or anyone else, to believe this at the present stage. But for myself I don’t merely hope what I say—in my own mind, I’m quite sure.
—John Maynard Keynes

statements which, if you insist on maintaining that 10^-3 upper bound (and why so, at this point?), certainly become much harder to explain without resorting to some featureless "overconfidence" thingy; and that has been discussed in detail.

[-]Rohin Shah4y*40

Again, I'm not claiming that this is true in general. I think it is plausible to reach, idk, 90%, maybe higher, that a specific idea will revolutionize the world, even before getting any feedback from anyone else or running experiments in the world. (So I feel totally fine with the statement from Keynes that you quoted.)

I would feel very differently about this specific case if there was an actual statement from Sadi of the form "I believe that this particular theorem is going to revolutionize thermodynamics" (and he didn't make similar statements about other things that were not revolutionary).

it seems like that 1000 number basically was chosen to roughly match the number of hypotheses you think were plausibly put forth before the correct one showed up. If so, then this is... pretty obviously not proper procedure, in my view.

I totally agree that's what I did, but it seems like a perfectly fine procedure. Idk where the disconnect is, but maybe you're thinking of "1000" as coming from a weirdly opinionated prior, rather than from my posterior.

From my perspective, I start out having basically no idea what the "justifiable prior" on that hypothesis is. (If you want, you could imagine that my prior on the "justifiable prior" was uniform over log-10 odds of -60 to 10; my prior is more opinionated than that but the extra opinions don't matter much.) Then, I observe that the hypothesis we got seems to be kinda ad hoc with no great story even in hindsight for why it worked while other hypotheses didn't. My guess is then that it was about as probable (in foresight) as the other hypotheses around at the time, and combined with the number of hypotheses (~1000) and the observation that one of them worked, you get the probability of 1/1000.

(I guess a priori you could have imagined that hypotheses should either have probability approximately 10^-60 or approximately 1, since you already have all the bits you need to deduce the answer, but it seems like in practice even the most competent people frequently try hypotheses that end up being wrong / unimportant, so that can't be correct.)

As a different example, consider machine learning. Suppose you tell me that <influential researcher> has a new idea for RL sample efficiency they haven't tested, and you want me to tell you the probability it would lead to a 5x improvement in sample efficiency on Atari. It seems like the obvious approach to estimate this probability is to draw the graph of how much sample efficiency improved from previous ideas from that researcher (and other similar researchers, to increase sample size), and use that to estimate P(effect size > 5x | published), and then apply an ad hoc correction for publication bias. I claim that my reasoning above is basically analogous to this reasoning.

LESSWRONG
LW

LESSWRONG
LW

30

An analogy as the midwife of thermodynamics

30

30

Introduction: Starting point

What did Sadi discover?

Lazare’s theory of machines and the analogy to heat engines

Surprising benefits of this analogy

Other epistemic curiosities

Epistemic Analysis of Thermodynamically Reversible Processes

Impossibility of perpetual motion

Caloric fluid, an interesting mistake?

Sadi’s abstraction carried over to the next paradigms

Conclusion