Building Intuitions On Non-Empirical Arguments In Science

by Scott AlexanderSSC11 min read7th Nov 201928 comments


FalsifiabilityIntuitionPractice & Philosophy of Science


Aeon: Post-Empirical Science Is An Oxymoron And It is Dangerous:

There is no agreed criterion to distinguish science from pseudoscience, or just plain ordinary bullshit, opening the door to all manner of metaphysics masquerading as science. This is ‘post-empirical’ science, where truth no longer matters, and it is potentially very dangerous.

It’s not difficult to find recent examples. On 8 June 2019, the front cover of New Scientist magazine boldly declared that we’re ‘Inside the Mirrorverse’. Its editors bid us ‘Welcome to the parallel reality that’s hiding in plain sight’. […]

[Some physicists] claim that neutrons [are] flitting between parallel universes. They admit that the chances of proving this are ‘low’, or even ‘zero’, but it doesn’t really matter. When it comes to grabbing attention, inviting that all-important click, or purchase, speculative metaphysics wins hands down.

These theories are based on the notion that our Universe is not unique, that there exists a large number of other universes that somehow sit alongside or parallel to our own. For example, in the so-called Many-Worlds interpretation of quantum mechanics, there are universes containing our parallel selves, identical to us but for their different experiences of quantum physics. These theories are attractive to some few theoretical physicists and philosophers, but there is absolutely no empirical evidence for them. And, as it seems we can’t ever experience these other universes, there will never be any evidence for them. As Broussard explained, these theories are sufficiently slippery to duck any kind of challenge that experimentalists might try to throw at them, and there’s always someone happy to keep the idea alive.

Is this really science? The answer depends on what you think society needs from science. In our post-truth age of casual lies, fake news and alternative facts, society is under extraordinary pressure from those pushing potentially dangerous antiscientific propaganda – ranging from climate-change denial to the anti-vaxxer movement to homeopathic medicines. I, for one, prefer a science that is rational and based on evidence, a science that is concerned with theories and empirical facts, a science that promotes the search for truth, no matter how transient or contingent. I prefer a science that does not readily admit theories so vague and slippery that empirical tests are either impossible or they mean absolutely nothing at all.

As always, a single quote doesn’t do the argument justice, so go read the article. But I think this captures the basic argument: multiverse theories are bad, because they’re untestable, and untestable science is pseudoscience.

Many great people, both philosophers of science and practicing scientists, have already discussed the problems with this point of view. But none of them lay out their argument in quite the way that makes the most sense to me. I want to do that here, without claiming any originality or special expertise in the subject, to see if it helps convince anyone else.


Consider a classic example: modern paleontology does a good job at predicting dinosaur fossils. But the creationist explanation – Satan buried fake dinosaur fossils to mislead us – also predicts the same fossils (we assume Satan is good at disguising his existence, so that the lack of other strong evidence for Satan doesn’t contradict the theory). What principles help us realize that the Satan hypothesis is obviously stupid and the usual paleontological one more plausible?

One bad response: paleontology can better predict characteristics of dinosaur fossils, using arguments like “since plesiosaurs are aquatic, they will be found in areas that were underwater during the Mesozoic, but since tyrannosaurs are terrestrial, they will be found in areas that were on land”, and this makes it better than the Satan hypothesis, which can only retrodict these characteristics. But this isn’t quite true: since Satan is trying to fool us into believing the modern paleontology paradigm, he’ll hide the fossils in ways that conform to its predictions, so we will predict plesiosaur fossils will only be found at sea – otherwise the gig would be up!

A second bad response: “The hypothesis that all our findings were planted to deceive us bleeds into conspiracy theories and touches on the problem of skepticism. These things are inherently outside the realm of science.” But archaeological findings are very often deliberate hoaxes planted to deceive archaeologists, and in practice archaeologists consider and test that hypothesis the same way they consider and test every other hypothesis. Rule this out by fiat and we have to accept Piltdown Man, or at least claim that the people arguing against the veracity of Piltdown Man were doing something other than Science.

A third bad response: “Satan is supernatural and science is not allowed to consider supernatural explanations.” Fine then, replace Satan with an alien. I think this is a stupid distinction – if demons really did interfere in earthly affairs, then we could investigate their actions using the same methods we use to investigate every other process. But this would take a long time to argue well, so for now let’s just stick with the alien.

A fourth bad response: “There is no empirical test that distinguishes the Satan hypothesis from the paleontology hypothesis, therefore the Satan hypothesis is inherently unfalsifiable and therefore pseudoscientific.” But this can’t be right. After all, there’s no empirical test that distinguishes the paleontology hypothesis from the Satan hypothesis! If we call one of them pseudoscience based on their inseparability, we have to call the other one pseudoscience too!

A naive Popperian (which maybe nobody really is) would have to stop here, and say that we predict dinosaur fossils will have such-and-such characteristics, but that questions like that process that drives this pattern – a long-dead ecosystem of actual dinosaurs, or the Devil planting dinosaur bones to deceive us – is a mystical question beyond the ability of Science to even conceivably solve.

I think the correct response is to say that both theories explain the data, and one cannot empirically test which theory is true, but the paleontology theory is more elegant (I am tempted to say “simpler”, but that might imply I have a rigorous mathematical definition of the form of simplicity involved, which I don’t). It requires fewer other weird things to be true. It involves fewer other hidden variables. It transforms our worldview less. It gets a cleaner shave with Occam’s Razor. This elegance is so important to us that it explains our vast preference for the first theory over the second.

A long tradition of philosophers of science have already written eloquently about this, summed up by Sean Carroll here:

What makes an explanation “the best.” Thomas Kuhn ,after his influential book The Structure of Scientific Revolutions led many people to think of him as a relativist when it came to scientific claims, attempted to correct this misimpression by offering a list of criteria that scientists use in practice to judge one theory better than another one: accuracy, consistency, broad scope, simplicity, and fruitfulness. “Accuracy” (fitting the data) is one of these criteria, but by no means the sole one. Any working scientist can think of cases where each of these concepts has been invoked in favor of one theory or another. But there is no unambiguous algorithm according to which we can feed in these criteria, a list of theories, and a set of data, and expect the best theory to pop out. The way in which we judge scientific theories is inescapably reflective, messy, and human. That’s the reality of how science is actually done; it’s a matter of judgment, not of drawing bright lines between truth and falsity or science and non-science. Fortunately, in typical cases the accumulation of evidence eventually leaves only one viable theory in the eyes of most reasonable observers.

The dinosaur hypothesis and the Satan hypothesis both fit the data, but the dinosaur hypothesis wins hands-down on simplicity. As Carroll predicts, most reasonable observers are able to converge on the same solution here, despite the philosophical complexity.


I’m starting with this extreme case because its very extremity makes it easier to see the mechanism in action. But I think the same process applies to other cases that people really worry about.

Consider the riddle of the Sphinx. There’s pretty good archaeological evidence supporting the consensus position that it was built by Pharaoh Khafre. But there are a few holes in that story, and a few scattered artifacts suggest it was actually built by Pharaoh Khufu; a respectable minority of archaeologists believe this. And there are a few anomalies which, if taken wildly out of context, you can use to tell a story that it was built long before Egypt existed at all, maybe by Atlantis or aliens.

So there are three competing hypotheses. All of them are consistent with current evidence (even the Atlantis one, which was written after the current evidence was found and carefully adds enough epicycles not to blatantly contradict it). Perhaps one day evidence will come to light that supports one above the others; maybe in some unexcavated tomb, a hieroglyphic tablet says “I created the Sphinx, sincerely yours, Pharaoh Khufu”. But maybe this won’t happen. Maybe we already have all the Sphinx-related evidence we’re going to get. Maybe the information necessary to distinguish among these hypotheses has been utterly lost beyond any conceivable ability to reconstruct.

I don’t want to say “No hypothesis can be tested any further, so Science is useless to us here”, because then we’re forced to conclude stupid things like “Science has no opinion on whether the Sphinx was built by Khafre or Atlanteans,” whereas I think most scientists would actually have very strong opinions on that.

But what about the question of whether the Sphinx was built by Khafre or Khufu? This is a real open question with respectable archaeologists on both sides; what can we do about it?

I think the answer would have to be: the same thing we did with the Satan vs. paleontology question, only now it’s a lot harder. We try to figure out which theory requires fewer other weird things to be true, fewer hidden variables, less transformation of our worldview – which theory works better with Occam’s Razor. This is relatively easy in the Atlantis case, and hard but potentially possible in the Khafre vs. Khufu case.

(Bayesians can rephrase this to: given that we have a certain amount of evidence for each, can we quantify exactly how much evidence, and what our priors for each should be. It would end not with a decisive victory of one or the other, but with a probability distribution, maybe 80% chance it was Khafre, 20% chance it was Khufu)

I think this is a totally legitimate thing for Egyptologists to do, even if it never results in a particular testable claim that gets tested. If you don’t think it’s a legitimate thing for Egyptologists to do, I have trouble figuring out how you can justify Egyptologists rejecting the Atlantis theory.

(Again, Bayesians would start with a very low prior for Atlantis, and assess the evidence as very low, and end up with a probability distribution something like Khafre 80%, Khufu 19.999999%, Atlantis 0.000001%)


How does this relate to things like multiverse theory? Before we get there, one more hokey example:

Suppose scientists measure the mass of one particle at 32.604 units, the mass of another related particle at 204.897 units, and the mass of a third related particle at 4452.767 units. For a while, this is just how things are – it seems to be an irreducible brute fact about the universe. Then some theorist notices that if you set the mass of the first particle as x, then the second is 2πx and the third is 4/3 πx^2. They theorize that perhaps the quantum field forms some sort of extradimensional sphere, the first particle represents a diameter of a great circle of the sphere, the second the circumference of the great circle, and the third the volume of the sphere.

(please excuse the stupidity of my example, I don’t know enough about physics to come up with something that isn’t stupid, but I hope it will illustrate my point)

In fact, imagine that there are a hundred different particles, all with different masses, and all one hundred have masses that perfectly correspond to various mathematical properties of spheres.

Is the person who made this discovery doing Science? And should we consider their theory a useful contribution to physics?

I think the answer is clearly yes. But consider what this commits us to. Suppose the scientist came up with their Extradimensional Sphere hypothesis after learning the masses of the relevant particles, and so it has not predicted anything. Suppose the extradimensional sphere is outside normal space, curled up into some dimension we can’t possibly access or test without a particle accelerator the size of the moon. Suppose there are no undiscovered particles in this set that can be tested to see if they also reflect sphere-related parameters. This theory is exactly the kind of postempirical, metaphysical construct that the Aeon article savages.

But it’s really compelling. We have a hundred different particles, and this theory retrodicts the properties of each of them perfectly. And it’s so simple – just say the word “sphere” and the rest falls out naturally! You would have to be crazy not to think it was at least pretty plausible, or that the scientist who developed it had done some good work.

Nor do I think it seems right to say “The discovery that all of our unexplained variables perfectly match the parameters of a sphere is good, but the hypothesis that there really is a sphere is outside the bounds of Science.” That sounds too much like saying “It’s fine to say dinosaur bones have such-and-such characteristics, but we must never speculate about what kind of process produced them, or whether it involved actual dinosaurs”.


My understanding of the multiverse debate is that it works the same way. Scientists observe the behavior of particles, and find that a multiverse explains that behavior more simply and elegantly than not-a-multiverse.

One (doubtless exaggerated) way I’ve heard multiverse proponents explain their position is like this: in certain situations the math declares two contradictory answers – in the classic example, Schrodinger’s cat will be both alive and dead. But when we open the box, we see only a dead cat or an alive cat, not both. Multiverse opponents say “Some unknown force steps in at the last second and destroys one of the possibility branches”. Multiverse proponents say “No it doesn’t, both possibility branches happen exactly the way the math says, and we end up in one of them.”

Taking this exaggerated dumbed-down account as exactly right, this sounds about as hard as the dinosaurs-vs-Satan example, in terms of figuring out which is more Occam’s Razor compliant. I’m sure the reality is more nuanced, but I think it can be judged by the same process. Perhaps this is the kind of reasoning that only gets us to a 90% probability there is a multiverse, rather than a 99.999999% one. But I think determining that theories have 90% probability is a reasonable scientific thing to do.


At times, the Aeon article seems to flirt with admitting that something like this is necessary:

Such problems were judged by philosophers of science to be insurmountable, and Popper’s falsifiability criterion was abandoned (though, curiously, it still lives on in the minds of many practising scientists). But rather than seek an alternative, in 1983 the philosopher Larry Laudan declared that the demarcation problem is actually intractable, and must therefore be a pseudo-problem. He argued that the real distinction is between knowledge that is reliable or unreliable, irrespective of its provenance, and claimed that terms such as ‘pseudoscience’ and ‘unscientific’ have no real meaning.

But it always jumps back from the precipice:

So, if we can’t make use of falsifiability, what do we use instead? I don’t think we have any real alternative but to adopt what I might call the empirical criterion. Demarcation is not some kind of binary yes-or-no, right-or-wrong, black-or-white judgment. We have to admit shades of grey. Popper himself was ready to accept this, [saying]:

“The criterion of demarcation cannot be an absolutely sharp one but will itself have degrees. There will be well-testable theories, hardly testable theories, and non-testable theories. Those which are non-testable are of no interest to empirical scientists. They may be described as metaphysical.”

Here, ‘testability’ implies only that a theory either makes contact, or holds some promise of making contact, with empirical evidence. It makes no presumptions about what we might do in light of the evidence. If the evidence verifies the theory, that’s great – we celebrate and start looking for another test. If the evidence fails to support the theory, then we might ponder for a while or tinker with the auxiliary assumptions. Either way, there’s a tension between the metaphysical content of the theory and the empirical data – a tension between the ideas and the facts – which prevents the metaphysics from getting completely out of hand. In this way, the metaphysics is tamed or ‘naturalised’, and we have something to work with. This is science.

But as we’ve seen, many things we really want to include as science are not testable: our credence for real dinosaurs over Satan planting fossils, our credence for Khafre building the Sphinx over Khufu or Atlanteans, or elegant patterns that explain the features of the universe like the Extradimensional-Sphere Theory.

The Aeon article is aware of Carroll’s work – which, along with the paragraph quoted in Section II above, includes a lot of detailed Bayesian reasoning encompassing everything I’ve discussed. But the article dismisses it in a few sentences:

Sean Carroll, a vocal advocate for the Many-Worlds interpretation, prefers abduction, or what he calls ‘inference to the best explanation’, which leaves us with theories that are merely ‘parsimonious’, a matter of judgment, and ‘still might reasonably be true’. But whose judgment? In the absence of facts, what constitutes ‘the best explanation’?

Carroll seeks to dress his notion of inference in the cloth of respectability provided by something called Bayesian probability theory, happily overlooking its entirely subjective nature. It’s a short step from here to the theorist-turned-philosopher Richard Dawid’s efforts to justify the string theory programme in terms of ‘theoretically confirmed theory’ and ‘non-empirical theory assessment’. The ‘best explanation’ is then based on a choice between purely metaphysical constructs, without reference to empirical evidence, based on the application of a probability theory that can be readily engineered to suit personal prejudices.

“A choice between purely metaphysical constructs, without reference to empirical evidence” sounds pretty bad, until you realize he’s talking about the same reasoning we use to determine that real dinosaurs are more likely than Satan planting fossils.

I don’t want to go over the exact ways in which Bayesian methods are subjective (which I think are overestimated) vs. objective. I think it’s more fruitful to point out that your brain is already using Bayesian methods to interpret the photons striking your eyes into this sentence, to make snap decisions about what sense the words are used in, and to integrate them into your model of the world. If Bayesian methods are good enough to give you every single piece of evidence about the nature of the external world that you have ever encountered in your entire life, I say they’re good enough for science.

Or if you don’t like that, you can use the explanation above, which barely uses the word “Bayes” at all and just describes everything in terms like “Occam’s Razor” and “you wouldn’t want to conclude something like that, would you?”

I know there are separate debates about whether this kind of reasoning-from-simplicity is actually good enough, when used by ordinary people, to consistently arrive at truth. Or whether it’s a productive way to conduct science that will give us good new theories, or a waste of everybody’s time. I sympathize with some these concerns, though I am nowhere near scientifically educated enough to have an actual opinion on the questions at play.

But I think it’s important to argue that even before you describe the advantages and disadvantages of the complicated Bayesian math that lets you do this, something like this has to be done. The untestable is a fundamental part of science, impossible to remove. We can debate how to explain it. But denying it isn’t an option.