12 interesting things I learned studying the discovery of nature's laws

I will give a potted history of Pearl's discovery as I understand it.

In the late 70s/early 80s, people wanted to deal with uncertainty in logic-based AI. The obvious thing to use is probability, but doing a Bayesian update to compute a posterior over a full joint distribution is exponentially expensive in the number of variables.

Pearl wanted to come up with a good data structure for doing computations over probability distributions in less-than-exponential time.

He introduced the idea of Bayesian networks in his paper "Reverend Bayes on Inference Engines", where he represents factorized probability distributions using DAGs. Here, the direction of the arrows is arbitrary, and there are many DAGs corresponding to one probability distribution.
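
To make the data-structure point concrete, here is a minimal sketch of a factorized distribution, assuming a hypothetical three-variable chain A -> B -> C over binary variables. The probabilities and function names are invented for illustration and are not taken from Pearl's paper. The sketch still computes the posterior by brute-force enumeration, which only works at toy sizes; the saving it shows is in how many numbers you have to store.

```python
from itertools import product

# Hypothetical chain A -> B -> C over binary variables.
# All probabilities below are made up for illustration.
p_a = {0: 0.7, 1: 0.3}
p_b_given_a = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
p_c_given_b = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.1, 1: 0.9}}

def joint(a, b, c):
    """P(a, b, c) = P(a) * P(b | a) * P(c | b), the factorization this DAG encodes."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]

def posterior_a_given_c(c):
    """P(A | C = c), summing out the unobserved variable B by enumeration."""
    unnorm = {a: sum(joint(a, b, c) for b in (0, 1)) for a in (0, 1)}
    z = sum(unnorm.values())
    return {a: v / z for a, v in unnorm.items()}

def full_table_size(n):
    """Free parameters in an unfactorized joint table over n binary variables."""
    return 2 ** n - 1

def chain_size(n):
    """Free parameters in a chain-shaped network over n binary variables."""
    return 1 + 2 * (n - 1)

# Sanity check: the factorized joint sums to 1 over all assignments.
total = sum(joint(a, b, c) for a, b, c in product((0, 1), repeat=3))
posterior = posterior_a_given_c(1)
```

For 20 binary variables, `full_table_size(20)` is about a million entries, while `chain_size(20)` is 39.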

He was not thinking about causality at all; it was just a problem in data structures. The idea was that this would be used for the same sort of thing as an "expert system" or other logic-based AI systems, but taking into account uncertainty expressed probabilistically.

Later, people including Pearl noticed that you can and often should interpret the arrows as causal; this amounts to choosing one DAG from many. The fact that there are many possible DAGs is related to the fact that, absent additional assumptions about the world, there are seemingly always multiple incompatible causal stories that explain the observations. But if you pick one, you can start using it to see whether your causal question can be answered from observational data alone.

Finally, he realized that the assumptions encoded in a DAG aren't sufficient for fully general counterfactuals: in full generality you have to specify exactly what functional relationship goes along each edge of the graph.

As someone originally concerned with AI, not with problems in the natural sciences, Pearl is probably unusual. Pearl himself looks back on Sewall Wright as his progenitor for coming up with path diagrams -- he was working in genetics. If you are interested in this, you should also look at Don Rubin's experience -- his causal framework is isomorphic to Pearl's. He was a 100 percent classic statistician, motivated by looking at medical studies.

[-]Jacy Reese Anthis4y80

I think another important part of Pearl's journey was that during his transition from Bayesian networks to causal inference, he was very frustrated with the correlational turn in early 1900s statistics. Because causality is so philosophically fraught and often intractable, statisticians shifted to regressions and other acausal models. Pearl sees that as throwing out the baby (important causal questions and answers) with the bathwater (messy empirics and a lack of mathematical language for causality, which is why he coined the do operator).

Pearl discusses this at length in The Book of Why, particularly the Chapter 2 sections on "Galton and the Abandoned Quest" and "Pearson: The Wrath of the Zealot." My guess is that Pearl's frustration with statisticians' focus on correlation was immediate upon getting to know the field, but I don't think he's publicly said how his frustration began.

[-]Alexander Gietelink Oldenziel4y50

Is Rubin's work actually the same as Pearl's??

Please tell more?

That's not the impression I got from reading Pearl's Causality. If so, it seems like a major omission of scholarship.

[-]Anonymous4y120

Rubin's framework basically says: suppose all our observations are in a big data table. Now consider the counterfactual observations that didn't happen (e.g. people in the control group getting the treatment) -- these are called "potential outcomes" -- and treat those like missing cells in the data table. Then causal inference is just a matter of filling in the potential outcomes using missing-data imputation techniques, although to be valid these require some assumptions about conditional independence.
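
A minimal sketch of the "missing cells" picture, assuming simulated data with a constant treatment effect of +2.0 and fully randomized assignment. The randomization is what supplies the independence assumption here. The variable names and the crude group-mean imputation are illustrative choices, not Rubin's own procedure.

```python
import random
from statistics import mean

random.seed(0)

# Hypothetical units: each has two potential outcomes, y0 (untreated) and y1 (treated).
# The true effect is fixed at +2.0 for every unit, purely for illustration.
units = []
for _ in range(2000):
    y0 = random.gauss(10.0, 1.0)
    units.append({"y0": y0, "y1": y0 + 2.0})

# Randomized assignment: treatment is independent of the potential outcomes.
# Only one potential outcome per unit is observed; the other cell is missing (None).
table = []
for u in units:
    treated = random.random() < 0.5
    table.append({
        "treated": treated,
        "y0": None if treated else u["y0"],
        "y1": u["y1"] if treated else None,
    })

def impute_with_group_means(rows):
    """Fill each missing cell with the mean of the observed cells in that column."""
    obs_y1 = mean(r["y1"] for r in rows if r["treated"])
    obs_y0 = mean(r["y0"] for r in rows if not r["treated"])
    return [
        {
            "y0": r["y0"] if r["y0"] is not None else obs_y0,
            "y1": r["y1"] if r["y1"] is not None else obs_y1,
        }
        for r in rows
    ]

filled = impute_with_group_means(table)
# Average treatment effect estimated from the completed table.
ate_estimate = mean(r["y1"] - r["y0"] for r in filled)
```

With this particular imputation the estimate reduces to the plain difference in group means. Without randomization, the imputation step would need to condition on covariates, which is where the conditional independence assumptions come in.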

Pearl's framework and Rubin's are isomorphic in the sense that any set of causal assumptions in Pearl's framework (a structural causal model, which has a DAG structure), can be translated into a set of causal assumptions in Rubin's framework (a bunch of conditional independence assumptions about potential outcomes), and vice versa. This is touched on somewhat in Ch. 7 of "Causality".

Pearl argues that despite this equivalence, his framework is superior because it's a better tool for thinking. In other words, writing down your assumptions as a DAG/SCM is intuitive and can be explained and argued about, while he claims the Rubin model's independence assumptions are opaque and hard to understand.

[-]IlyaShpitser4y170

Some reading on this:

https://csss.uw.edu/files/working-papers/2013/wp128.pdf

http://proceedings.mlr.press/v89/malinsky19b/malinsky19b.pdf

https://arxiv.org/pdf/2008.06017.pdf

---

From my experience it pays to learn how to think about causal inference like Pearl (graphs, structural equations), and also how to think about causal inference like Rubin (random variables, missing data). Some insights only arise from a synthesis of those two views.

Pearl is a giant in the field, but it is worth remembering that he's unusual in another way (compared to a typical causal inference researcher) -- he generally doesn't worry about actually analyzing data.

---

By the way, Gauss figured out not only the normal distribution trying to track down Ceres' orbit, he actually developed the least squares method, too! So arguably the entire loss minimization framework in machine learning came about from thinking about celestial bodies.

[-]Alexander Gietelink Oldenziel4y10

Aha, I will have to ponder on this for a while. Thanks a lot!

[-]LoganStrohl4y360

How sweet of you to write me this love letter.

[-]drossbucket4y180

One confusion I wrote down in advance was “I still don’t quite know how to predict that there will not be a simple mathematical apparatus that explains something. Why the motion of the planets, why the game of chance, why not the color of houses in England or the number of hairs on a man’s head?"

I think the main thing I'd look for is an unusual amount of regularity. This comes in two types:

Natural regularity: unusual 'spherical cow' type situations like the movement of the planets. Things that are somehow isolated, or where some particular effect strongly dominates, so that only a few variables are needed
Artificial regularity: a lot of the regularity we see around us is there because people engineered it. Dice and coins are good examples. Can't remember details but I think there's some interesting stuff on the history of dice, e.g. this link says that 'Only in the middle of the 15th century did it become standard to use symmetric cubes'. I think it would be hard to invent probability theory when gambling with irregularly shaped lumps.

There doesn't seem to be any particularly obvious regularity to house colours or number of hairs, they just look like your standard-issue messy situations that don't tell you much.

[-]Ben Pace4y90

Only in the middle of the 15th century did it become standard to use symmetric cubes

Haha! Those poor people. All of my intuitions about probabilities would have been terribly broken in those times.

[-]Alexander Gietelink Oldenziel4y*30

This is exactly the question that John Wentworth is trying to answer with his abstraction hypothesis framework. It is also related to Jaynes's proof that the probability of a fair coin coming up heads is 1/2.

As to being able to discern between different theories: partly you are right that it can be hard during a scientific controversy, and it involves a lot of judgement calls. On the other hand, it can be hard for a layman to appreciate how 'rigid' good mathematical models are. Newton didn't just observe that apples fall to the ground; he posited a series of elegant laws and was able to calculate very nonobvious results. The entire theory is quite large and intricate - and there are many quantitative tests one can do and that have been done.

[-]Charlie Steiner4y170

One of the key things to figure out is why scientists working in the field can make confident pronouncements like "oh, the Jupiter thing is just light moving slower" or "no, we swear there's going to be a Higgs boson, we just need to build a more powerful particle accelerator" and have them actually work out. It's a mathematical impossibility that the models of the world they use have lots of knobs they could turn and they have just hit the right setting of the knobs by accident many times (even with plenty of wrong turns along the way as well). And so clearly there is implicit knowledge, not at all obvious to the outsider who just hears a one-sentence synopsis of this idea without having to attend any symposia on it or read a half dozen research papers about why it makes sense.

I mean, put like that, it seems like it has an obvious answer. And I think the obvious answer is mostly right, though there can be some interesting wrinkles in it.

[-]Steven Byrnes4y120

Internal consistency of the theory, consistency with other known things, retrodiction of known observations, simplicity of the theory (as compared to its explanatory power), revealing and resolving unsatisfactory aspects of alternative theories…

I agree that those kinds of things are probably “not obvious from a one-sentence synopsis” but I don't see why they have to be “implicit” or require reading lots of research papers.

[-]Ben Pace4y40

I'm not sure what the right answer is.

I'd be interested to know how many different people around the world came up with explanations and empirically tested them. I don't know whether people "got the answer right first time" or "lots of people threw lots of hypotheses at the walls and these are the ones that stuck".

[-]Capybasilisk4y40

“no, we swear there’s going to be a Higgs boson, we just need to build a more powerful particle accelerator”

Particle physicists also made other confident predictions about the LHC that are not working out, and they're now asking for a bigger accelerator.

Survivorship bias might be at play, wherein we forget all the confident pronouncements that ended up being just plain wrong.

[-]Charlie Steiner4y50

I mean, the other main things to look for were WIMPs and supersymmetry, but almost everyone was cautious about chances of finding those.

https://www.preposterousuniverse.com/blog/2008/08/04/what-will-the-lhc-find/

[-]Ege Erdil4y160

My current favorite story of scientific discovery is probably the origin of Bose-Einstein statistics.

Before the discovery of Planck's law, there was the problem of ultraviolet catastrophe in applying statistical mechanics to fields: there are many more high frequency modes than low frequency ones, and the equipartition theorem of statistical mechanics predicts that energy should be spread out evenly across all quadratic degrees of freedom. There's ~ one quadratic degree of freedom for each frequency in a free field, so a naive application led to the Rayleigh-Jeans law for blackbody radiation which predicted an infinite energy flux radiated by a blackbody at nonzero temperature.

People waved this off as statistical mechanics not being applicable to this situation. Then, Planck noticed that if energy comes in discrete packets where the size of each packet scales linearly with frequency, this manages to kill the divergence at high frequencies and give reasonable results for the spectral energy density of blackbody radiation. This is now known as Planck's law.
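
For reference, the two laws can be written side by side. This is the standard textbook form in terms of spectral radiance per unit frequency, which is one of several conventions; the notation is an assumption, not taken from the comment.

```latex
% Spectral radiance of a blackbody at temperature T, per unit frequency \nu.
% Rayleigh-Jeans (classical equipartition): grows as \nu^2 without bound.
B_\nu^{\mathrm{RJ}}(T) = \frac{2 \nu^2 k_B T}{c^2}

% Planck: energy in packets of size h\nu suppresses the high-frequency modes.
B_\nu^{\mathrm{Planck}}(T) = \frac{2 h \nu^3}{c^2} \, \frac{1}{e^{h\nu / (k_B T)} - 1}

% Low-frequency limit (h\nu \ll k_B T): e^{x} - 1 \approx x, so Planck reduces to Rayleigh-Jeans.
B_\nu^{\mathrm{Planck}}(T) \approx \frac{2 h \nu^3}{c^2} \cdot \frac{k_B T}{h \nu} = \frac{2 \nu^2 k_B T}{c^2}
```

Integrating the first expression over all frequencies diverges, while the exponential in the second makes the integral finite.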

Some years later, when Bose was giving a talk about the ultraviolet catastrophe problem and explaining why Planck's calculation was unjustified under Maxwell-Boltzmann statistics, he made an error in a combinatorial argument and accidentally derived that Planck's argument was justified. He realized later that the "error" he had made was to assume that photons occupying the same energy level were indistinguishable. Since photons are bosons, this is in fact correct, but the discovery was made through a calculation mistake.

Bose later submitted this paper to an English journal and got rejected, so he got in touch with Einstein and asked him to translate his article into German so that it could be published in a German journal. Einstein agreed, and that's where the name of "Bose-Einstein statistics" comes from.
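
The counting difference behind the "error" can be shown with a toy enumeration. This is a sketch assuming a tiny system of n quanta and g states, not Bose's actual derivation, and the function names are made up for illustration.

```python
from itertools import product
from math import comb

def labelled_microstates(n, g):
    """All ways to place n distinguishable quanta into g states (Maxwell-Boltzmann style counting)."""
    return list(product(range(g), repeat=n))

def occupancy_patterns(n, g):
    """Distinct occupation-number patterns, i.e. what remains once the quanta are indistinguishable."""
    patterns = set()
    for assignment in labelled_microstates(n, g):
        patterns.add(tuple(assignment.count(s) for s in range(g)))
    return patterns

n, g = 2, 2
mb = labelled_microstates(n, g)
be = occupancy_patterns(n, g)

# Probability that both quanta share a state, weighting each counted configuration equally.
p_same_mb = sum(1 for a in mb if a[0] == a[1]) / len(mb)
p_same_be = sum(1 for occ in be if max(occ) == 2) / len(be)
```

With two quanta and two states, the labelled count gives 4 configurations and a 1/2 chance that the quanta share a state. The indistinguishable count gives 3 patterns, matching `comb(n + g - 1, n)`, and a 2/3 chance. That extra tendency to bunch together is the signature of Bose-Einstein statistics.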

[-]Bucky4y120

One rich dude had a whole island and set it up to have lenses on lots of parts of it, and for like a year he’d go around each day and note down the positions of the stars

You can’t just say that without a name or reference! Not that I don’t believe you - I just want to know more!

[-]Ben Pace4y140

That man's name was Tycho Brahe.

[-]DirectedEvolution4y120

As a small note, it would be easier to navigate this post if each section had a brief heading.

[-]ryan_b4y110

Re: Feynman's quote: why in the dickens aren't large outstanding problems summarized this way? This seems like a great way to generate angles of attack, in Hamming's sense of the term. It feels intuitively like being able to describe why a given approach from this list wouldn't work would by itself be substantial progress on a given problem.

[-]Dweomite4y80

Hm. I'm reminded of my college class on Complexity Theory, where the professor explained some common strategies that have been widely successful in proving that two complexity classes either are or aren't the same, and then went on to prove that those strategies could not be used to solve P vs NP.

That gave me a whole new appreciation for the difficulty of the problem, and how hard people have worked on it.

[-]Kaj_Sotala4y110

My update is further in the direction that Jacob’s post The Copernican Revolution from the Inside argues for, which is that if two different people had different theories at the time, I do not anticipate the disagreement being able to be “clearly resolvable” at all, and do expect for it to involve a great number of judgment calls, in large part dependent on one’s “philosophy” of how to make those calls in this domain.

... there is rarely a single experiment that one paradigm fails and another passes. Rather, there are dozens of experiments. One paradigm does better on some, the other paradigm does better on others, and everyone argues over which ones should or shouldn’t count.
For example, one might try to test the Copernican vs. Ptolemaic worldviews by observing the parallax of the fixed stars over the course of a year. Copernicus predicts it should be visible; Ptolemy predicts it shouldn’t be. It isn’t, which means either the Earth is fixed and unmoving, or the stars are unutterably unimaginably immensely impossibly far away. Nobody expected the stars to be that far away, so advantage Ptolemy. Meanwhile, the Copernicans posit far-off stars in order to save their paradigm. What looked like a test to select one paradigm or the other has turned into a wedge pushing the two paradigms even further apart.
What looks like a decisive victory to one side may look like random noise to another. Did you know weird technologically advanced artifacts are sometimes found encased in rocks that our current understanding of geology says are millions of years old? Creationists have no trouble explaining those – the rocks are much younger, and the artifacts were probably planted by nephilim. Evolutionists have no idea how to explain those, and default to things like “the artifacts are hoaxes” or “the miners were really careless and a screw slipped from their pocket into the rock vein while they were mining”. I’m an evolutionist and I agree the artifacts are probably hoaxes or mistakes, even when there is no particular evidence that they are. Meanwhile, probably creationists say that some fossil or other incompatible with creationism is a hoax or a mistake. But that means the “find something predicted by one paradigm but not the other, and then the failed theory comes crashing down” oversimplification doesn’t work. Find something predicted by one paradigm but not the other, and often the proponents of the disadvantaged paradigm can – and should – just shrug and say “whatever”.
In 1870, flat-earther Samuel Rowbotham performed a series of experiments to show the Earth could not be a globe. In the most famous, he placed several flags miles apart along a perfectly straight canal. Then he looked through a telescope and was able to see all of them in a row, even though the furthest should have been hidden by the Earth’s curvature. Having done so, he concluded the Earth was flat, and the spherical-earth paradigm debunked. Alfred Wallace (more famous for pre-empting Darwin on evolution) took up the challenge, and showed that the bending of light rays by atmospheric refraction explained Rowbotham’s result. It turns out that light rays curve downward at a rate equal to the curvature of the Earth’s surface! Luckily for Wallace, refraction was already a known phenomenon; if not, it would have been the same kind of wedge-between-paradigms as the Copernicans having to change the distance to the fixed stars.
It is all nice and well to say “Sure, it looks like your paradigm is right, but once we adjust for this new idea about the distance to the stars / the refraction of light, the evidence actually supports my paradigm”. But the supporters of old paradigms can do that too! The Ptolemaics are rightly mocked for adding epicycle after epicycle until their system gave the right result. But to a hostile observer, positing refraction effects that exactly counterbalance the curvature of the Earth sure looks like adding epicycles. At some point a new paradigm will win out, and its “epicycles” will look like perfectly reasonable adjustments for reality’s surprising amount of detail. And the old paradigm will lose, and its “epicycles” will look like obvious kludges to cover up that it never really worked. Before that happens…well, good luck.

[-]Ben Pace4y30

Yeah yeah! I've heard this before and all. Somehow I felt it be more part of me this week.

[-]Rob Bensinger4y90

One confusion I wrote down in advance was “I still don’t quite know how to predict that there will not be a simple mathematical apparatus that explains something. Why the motion of the planets, why the game of chance, why not the color of houses in England or the number of hairs on a man’s head?"

It seems very hard to say a priori that there won't be any interesting new abstract structure discovered by looking at some new domain, especially when Science is young and you don't know the base rate of 'how often do we discover useful new formalisms?'. E.g., Fibonacci numbers and Lucas numbers show up in the distribution of petals for many flowers; hair could have turned out to reveal something similar.

I think the correct process for zeroing in on relatively promising domains is something like:

First, try to come up with the simplest accounts for everything, unifying as many different phenomena as possible. (Reasoning: these are easier to generate and think about, and there are a lot fewer simple stories to evaluate than complex ones, and simple stories can often serve as useful approximations for more complex ones.)
Go look at really weird / really different domains. (Because if you can't find a simple account to explain 'normal' stuff, there might still be simple accounts that explain the weird stuff, because weird stuff is weird so Many Things Are Possible. And if you do have simple accounts to explain the normal stuff, you should check whether weird stuff violates those generalizations.)

Planets are a weird domain — there aren't a bunch of things we knew about in the 17th century that were similar to planets, or that formed a continuum between planets and ordinary objects like lanterns and pigeons. In contrast, hairs are a lot like whiskers, feathers, etc.; and house colors are a lot like cave colors, tent colors, etc. So if there are surprising new generalizations to find, they're more likely to crop up by studying planets than by studying hair or house colors.

Similarly, gambling is weird relative to non-probabilistic inference. If you're really into Aristotle and you're trying to model all human reasoning and decision-making using deductive syllogisms, you should be really curious about the domains where people do weird things like 'bet things based on guesswork, with no certainty they're right'. (You might similarly take an interest in dreams, emotions, divine inspiration, self-deception, bullshit, etc.; they won't all be winners, but an occasional winner is sufficient.)

[-]habryka4y60

Promoted to curated: I think understanding how our understanding of nature has historically progressed is quite important for understanding how to structure research fields and research methodologies, and this post covers a bunch of datapoints that seemed pretty informative in that space.

[-]Ben Pace4y20

Yay, thanks!

[-]Richard Nole4y50

Hello Ben, I'm interested in studying this kind of history too. Can you list the books you're reading to study this? I find that studying the context of discoveries and how they developed over time helps me understand them better.

[-]Ben Pace4y50

I primarily watched YouTube a couple of hours a day for 4 days. YouTube has lots of explainers and more, including great little homemade videos with like 500 views.

Plus a very little Wikipedia and this great site.

[-]LoganStrohl2y40Review for 2022 Review

I have thought fondly of this post several times since I read it.

[-]Beckeck4y40

in lieu of writing nothing instead, informally -
hey, good list! i wonder if you've read much of the recent history of sabermetrics, which to me is the modern equivalent (in that it's a history of a bunch of nerds and some people who wanted to be rich who actualized statistical modeling at the frontier of the applied science)?

[-]Ben Pace4y20

I just learned the word, thanks for the pointer. Seems like a solid place to look more.

[-]Beckeck4y40

some places to look (with hope that others might add theirs):
Moneyball (the book, the movie lacks detail but gets some of the spirit)
fivethirtyeight's methodology articles on their various sports/+ models (https://fivethirtyeight.com/features/how-our-raptor-metric-works/
https://fivethirtyeight.com/features/how-fivethirtyeight-2020-primary-model-works/)
probably a bunch of articles from grantland (which is archived but available, but i lack titles off the top of my head)
https://en.wikipedia.org/wiki/Sports_analytics
zvi's sports betting articles

[-][anonymous]4y30

You might be interested in BACON:

https://users.cs.cf.ac.uk/Dave.Marshall/AI2/node152.html

It was an AI system from the 80's which was able to infer physical laws from data observations. It correctly inferred the ideal gas law from pressure/volume/temperature data, and some (all?) of Kepler's laws from ground-based planetary observations.
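
As a rough illustration of recovering a law from data, here is a sketch that fits a power law to rounded textbook values of semi-major axis and orbital period. This is plain least squares in log-log space, not BACON's own method, which reportedly searched heuristically over products and ratios of variables. The function name is made up.

```python
import math

# Semi-major axis (AU) and orbital period (years), rounded textbook values.
planets = {
    "Mercury": (0.387, 0.241),
    "Venus": (0.723, 0.615),
    "Earth": (1.000, 1.000),
    "Mars": (1.524, 1.881),
    "Jupiter": (5.203, 11.862),
    "Saturn": (9.537, 29.457),
}

def fit_power_law(pairs):
    """Least-squares line through (log x, log y).

    Returns (exponent, coefficient) for the model y = coefficient * x ** exponent.
    """
    xs = [math.log(x) for x, _ in pairs]
    ys = [math.log(y) for _, y in pairs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, math.exp(my - slope * mx)

exponent, coefficient = fit_power_law(list(planets.values()))
```

The fitted exponent comes out very close to 3/2, which is Kepler's third law: the period squared is proportional to the semi-major axis cubed.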

[-]Capybasilisk4y20

Going forward, I think discovery in the natural sciences will entirely be about automated searches in equation-space for models that fit datasets generated by real-world systems.

Why does one model work and not the other? Hopefully we'll know, most likely we won't. At any rate, the era of a human genius working these things out with pen and paper is pretty much over (Just consider the amount of combined intellectual power now needed to make incremental improvements. Major scientific papers these days will usually have a dozen+ names from several institutions).

Ultimately, this process will look like pointing a camera at the world in general and using the resulting raw bit stream to induce the fundamental program that runs the Universe.

[-]Ben Pace4y90

Going forward, I think discovery in the natural sciences will entirely be about automated searches in equation-space for models that fit datasets generated by real-world systems.

Wow! Sounds like you should be able to exploit this knowledge for a lot of prestige and scientific discovery :)

[-]mocny-chlapik4y10

You might enjoy reading _The Structure of Scientific Revolutions_. #9 is explicitly discussed there. It is often the case that the old incorrect theory has had a lot of work put into it and many of the anomalies are explained by additional mechanisms, e.g. the geocentric theory had a lot of bells and whistles in the end and it was quite precise in some cases. When the heliocentric theory was created, it was actually worse at predicting the movement of celestial bodies because it was too simplistic and was not able to handle various edge cases. Related to your remark about gravity, it took more than 50 years to successfully apply the theory of gravity to predict how the Moon will behave.

[-]Richard Zander4y10

Math is not physics. I'm not sure what math is. I kind of like Gisin's support of intuitive math. I agree that the next billion digits of pi mean nothing real, also that there should be some constructivist dimension to the infinities in math (e.g. renormalization).

[-]Richard Zander4y10

Oh, and statistics is not math, it's physics. You can test the results of statistics against the real world, but math is merely consistent.

[-]tailcalled4y10

Pearl himself says that he has discovered two laws, and once you have them, you can fire him, because the rest is just algebra! And he calls it a calculus of counterfactuals, just like Newton and Bayes and everyone did. Fascinating.
I couldn’t find anything on what problems Pearl was thinking about when he came up with his calculus of counterfactuals. Like, was he personally trying to analyze clinical trials? Was he a mathematician who was friends with people doing large experiments and thought the math was interesting? I want to know what part of the world he was in contact with when developing it.

I don't know much about the history, but the fact that Pearl was a computer scientist must surely have mattered a lot. His causality math essentially treats the laws of physics as being a "symbolic" program, which given some input generates the resulting variables of the world.


Feynman, on the art of guessing nature’s laws, in the final lecture of his Messenger Lectures at Cornell, which the BBC recorded:

“Or look at history, you first start out with Newton: he [was] in a situation where he had incomplete knowledge, and he was able to get the laws by putting together ideas which all were relatively close to experiment—there wasn’t a great distance between the observations and the test.”

“Now, the next guy who did something—another man who did something great—was Maxwell, who obtained the laws of electricity and magnetism. But what he did was this, he put together all the laws of electricity due to Faraday and other people that came before him, and he looked at them and he realized that they were mutually inconsistent; they were mathematically inconsistent. In order to straighten it out he had to add one term to an equation.”

“By the way, he did this by inventing a model for himself of idler wheels, and gears, and so on, in space. Then he found what the new law was, and nobody paid much attention, because they didn’t believe in the idler wheels. We don’t believe in the idler wheels today, but the equations that he obtained were correct. So the logic may be wrong, but the answer is all right.”

“In the case of relativity, the discovery of relativity was completely different: there was an accumulation of paradoxes; the known laws gave inconsistent results, and it was a new kind of thinking, a thinking in terms of discussing the possible symmetries of laws. It was especially difficult because it was for the first time realized how long something like Newton’s laws could be right—and still ultimately be wrong—and, second, that ordinary ideas of time and space that seem so instinctive could be wrong.”

“Quantum mechanics was discovered in two independent ways, which is a lesson. There, again, and even more so, an enormous number of paradoxes were discovered experimentally, things that absolutely couldn’t be explained in any way by what was known—not that the knowledge was incomplete, but the knowledge was too complete!: your prediction was, this should happen; it didn’t.

The two different routes were: one, by Schrodinger, who guessed the equations; another, by Heisenberg, who argued that you must analyze what’s measurable. So two different philosophical methods reduced to the same discovery in the end.”

“More recently, the discovery of the laws of this [weak decay] interaction, which are still only partly known, had quite a somewhat different situation: this time it was a case of incomplete knowledge, and only the equation was guessed. The special difficulty this time was that the experiments were all wrong, all the experiments were wrong.”

“Now, how can you guess the right answer when, when you calculate the results it disagrees with the experiment, and you have the courage to say the experiments must be wrong. I’ll explain where the courage comes from in a minute.”

“Now, I’m sure that history does not repeat itself in physics, as you see from this list, and the reason is this: any scheme—like, "Think of symmetry laws," or "Put the equations in mathematical form," or any of these schemes "Guess equations," and so on—are known to everybody now, and they’re tried all the time. So if the place where you get stuck is not that—and you try that right away: we try looking for symmetries; we try all the things that have been tried before, but we’re stuck-so it must be another way next time.

Each time that we get in this log jam of too many problems, it’s because the methods that we’re using are just like the ones we used before. We try all that right away, but the new discovery is going to be made in a completely different way—so history doesn’t help us very much.”
