The New Scientific Method

by nixtaken · 12 min read · 17th Jul 2020 · 28 comments


Epistemic Review · World Modeling · Rationality
Personal Blog

(see : for this text with embedded videos.)


Before the coronacrisis started, I was at a party with some neighbors and we discussed some things…

Outsourcing of component manufacture…

Moving to an island to avoid a collapse of civilization…

Consoling our children over burnt koalas…

Eating probiotics to avoid brewing alcohol in our guts…

Avoiding cosmetics and cleaning products that will make us stupid…

In short, our discussions conveyed a sense that a darkness had fallen over the land, and this was even before the coronacrisis was underway.

The coronavirus was named for the corona-like fringe of spikes it shows when viewed under an electron microscope.

But I was trained to be a scientist and to believe that scientists will save the day. At least that is what happens in Hollywood movies.

Here I come to save the day!

Of course, I know that scientists will attempt to rigorously characterize our problems and propose some solutions with appropriately defined error bars... which will then be misinterpreted by the media and by the politicians making the decisions.

Rather, scientists will gather data and analyze it in accordance with the scientific method – but what method is that? It seems to have changed rather dramatically in recent years.

There is the old scientific method, which was falsifiable, repeatable, and frequentist, and which involved directly observable quantities. This method requires a researcher to:

  • Gather data
  • Plot data
  • Interpret data

There is now the new scientific method, which involves models, paradigm shifts, and Bayesianism, and which may not require a control variable when measurements are constructed from multiple, indirect observations. This method requires a researcher to:

  • Filter the puzzle pieces
  • Put the puzzle together
  • Interpret the puzzle

But not all puzzle pieces are equally good.

When two overlapping puzzle pieces agree, you may decide to keep them, and when a puzzle piece disagrees with the other pieces, you may decide to throw it away.
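The danger in that filtering step can be shown with a toy calculation (all numbers invented for illustration): once you discard the pieces that disagree, the answer you recover is largely the answer your filter assumed.

```python
# Toy sketch (all numbers invented): keeping only the puzzle pieces that
# "agree" makes the final answer reflect the filter, not the data.
data = [9.8, 10.1, 10.0, 9.9, 14.2, 10.2, 13.9]

mean_all = sum(data) / len(data)   # ~11.16, includes the inconvenient pieces

# Discard the pieces that disagree with the cluster we expected around 10.
kept = [x for x in data if abs(x - 10.0) < 1.0]
mean_kept = sum(kept) / len(kept)  # ~10.0, the story the filter was built to tell
```

If the discarded points were genuine signal rather than noise, nothing in the filtered result reveals that anything was thrown away.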

When the puzzle pieces are very small, you may zoom in too much and accidentally measure the measurement device itself rather than the specimen.

When you use a feedback loop to converge on a setpoint, the effect of the feedback loop might be hidden from the error analysis. For example, if the light in the microscope both produces and samples the same effect, you can’t tell if the effect would’ve occurred under different conditions.
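This hazard can be sketched with a toy servo loop (all numbers invented): the logged readings hug the setpoint even while the underlying signal drifts, so the stability of the record tells you nothing about the stability of the specimen.

```python
# Toy sketch: a feedback loop that servos a reading toward a setpoint.
setpoint = 1.0
gain = 0.5

signal = [1.0 + 0.1 * i for i in range(10)]  # the drifting "truth"
correction = 0.0
logged = []
for s in signal:
    reading = s + correction                   # what the instrument records
    logged.append(reading)
    correction -= gain * (reading - setpoint)  # servo toward the setpoint

spread_true = max(signal) - min(signal)     # 0.9: the signal really drifted
spread_logged = max(logged) - min(logged)   # a small fraction of that
print(spread_true, spread_logged)
```

Unless the correction history is carried into the error analysis, the small spread of the log masquerades as a small measurement uncertainty.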

One must be especially careful when using newer methods. For example, Bayesian analysis of networks has become very popular in recent years and it frequently obscures and underestimates the impact of assumptions (priors).
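A standard conjugate-prior example makes the point concrete; this is a generic textbook Beta-Binomial update, not a reconstruction of any particular published analysis:

```python
# Generic Beta-Binomial update: with sparse data, the prior dominates.
def posterior_mean(prior_a, prior_b, successes, failures):
    # A Beta(a, b) prior with Binomial data gives a Beta(a + s, b + f) posterior.
    a = prior_a + successes
    b = prior_b + failures
    return a / (a + b)

s, f = 3, 1  # only four observations

flat_prior = posterior_mean(1, 1, s, f)        # uniform prior: 4/6 ~ 0.67
skeptical_prior = posterior_mean(1, 10, s, f)  # strong prior: 4/15 ~ 0.27
```

Same data, two defensible priors, and the headline number changes by a factor of 2.5; unless the prior is reported alongside the posterior, the assumption stays invisible.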

In short, a modern researcher must be very careful not to

  • Pick and choose data
  • Over-focus
  • Hide errors with hidden feedback loops
  • Rely on unverified assumptions

Whenever these things are done, the scientific method is not being rigorously followed and you might accidentally trick yourself into believing in a result that is not true. Whenever there are common-mode sources of noise or systematic errors, you might accidentally filter out a picture of something that you didn’t intend.

A person who reads another group's research must be aware of all of these issues and be on the lookout for sloppy or fraudulent activity. Some researchers who publish in the prestigious journal Nature are lazy enough to take a single photograph and slice and dice it so that it looks like an image of multiple, distinct samples.

This image is taken from a recent paper on the coronavirus that was published in the top journal, Nature. The researchers tried to make it look like they had used multiple control samples when they had just copied and pasted from one. Data fabrication is rampant, and an attempt to track the extent of the problem has been made through a website called Retraction Watch. One gets the sense that this is just a drop in the bucket.

These issues might seem like they should only concern scientists, but because our academic institutions train far more scientists than they can employ, all of these scientists go out into the broader community and sell their work to companies as 'data science'. In my city, an ex-academic can take a crash course and rebrand himself as a data scientist within two weeks. But what sort of training have these people gotten? Is the academic system adding value to them, or is it making them worse by teaching them to delude themselves and cheat?

This is a sketch from a satirical novel called Gargantua and Pantagruel that was supposedly written by a man named Rabelais in the 1500s. It was the story of a giant baby that a town fed and pampered in the hope that he would grow up and protect them. The end result was rather scatological. If a group managed to insert fake historical documents or literature into an academic canon, the result would be rather subversive. I first saw this image in the form of an inside academic joke about 'big science', or large, government-funded projects.

The government subsidizes the production of data scientists through their research facilities at which young men and women learn how to make

  • Images of atoms
  • Images of proteins
  • CERN particle identification
  • Gravitational waves
  • Images of black holes

Each of these measurements requires a student to reconstruct an image or signal based on fragments buried in noise and they are often constructed in ways that are difficult to reproduce.

An image of an atom, for example, was constructed in 2013 by using a laser to eject individual electrons from a beam of hydrogen atoms. A toroid was used to amplify the angle of ejection so that the image would be large enough to see, but only some of the atoms had the right excitation and orientation.

How did they choose which atoms to use in image construction? How did they eliminate common mode errors from the toroid? What was their control variable? None of these questions are answered in the scientific publications, yet this image of an atom is the first thing you will see in a Google search and it doesn’t appear that any other research groups made a similar attempt, since the image only ever appears in a single color scheme.

I find it strange that this image was taken by a small research group in the Netherlands in 2013 and today, it is the only 'picture of an atom' that can be found through Google. I was taught that the scientific method requires that multiple experiments be done by different groups and with different methods before one should believe a result, but this doesn't appear to be the case with this 'image of an atom', yet teachers everywhere will find it via Google and show it to their students.

I always believe the data, but I don’t always believe in the interpretations or conclusions and this is one of those cases. I just don’t see how they could’ve distinguished the effect of the toroid from the effect of the hydrogen atoms. When the hydrogen beam is off, they have a sort of control variable, but it isn’t really an ideal control variable. It would’ve been better if they’d had a different sort of particle that is similar to hydrogen. In that way, they could distinguish the scattering effects from different sorts of atoms. Otherwise, they are just comparing a scattered beam to a non-scattered beam and it is entirely possible that all of the modes they measured had nothing to do with hydrogen and they depended entirely on the toroid – their cross sections have the same shape, after all.

Even if they were really measuring modes of hydrogen and not of the toroid, the hydrogen atoms would’ve had all sorts of orientations and I wonder how they selected which data points to keep and which to throw away.

Questions of this sort also arise when x-ray crystallographers take images of proteins.

Since 2015, there has been a holography method to cross-check crystallography images, but it has only been applied to three proteins (as far as I know). Is there a house of cards of crystallographic data waiting to fall?

Synchrotron light may hit that crystallized protein many times from many different angles. One is left with a large data set of images that must be put together like a giant puzzle. How much do our expectations of how the puzzle should look influence the algorithm determining its construction?
If fast processing of the data is used to tune the machine used to complete the image, has the scientific method been compromised?

This question is surprisingly relevant for the interpretation of LHC data used to measure the famed Higgs particle.

The eye of Sauron? At a minimum, it is a data miner production facility.

I had the misfortune to listen to a talk by a young man who had done his Ph.D. work on designing a Bayesian neural net machine learning algorithm to improve the number of Higgs particles detected. He was boosting and attenuating the tens of thousands of signals in the detector to teach it to measure what the users wanted, and he showed no awareness of how what he was doing might affect the error bars of the measurement or the integrity of the scientific method.

His concern was about how the turnover within the thousands of young people employed by the LHC detector community was so high that making sure that knowledge was passed on required highly modular programming approaches. The system had evolved such that the knowledge of how the detector worked was contained within the system itself rather than within the people.

Something similar appears to have happened within the black hole measurement community. Since a black hole is the largest type of particle we have thought up, I’ll include it within this article on pictures of particles.

In April of last year, this Event Horizon Telescope (EHT) picture of a black hole was on the cover of newspapers and newsmagazines across the world.

The Event Horizon Telescope (EHT) community claimed to have taken a picture of a black hole by combining data from many different telescopes across the world. This is surprisingly similar to what x-ray crystallographers do when they construct an image of a protein from a collection of images that each capture different aspects of the object under study. What was interesting about this measurement was that, unlike images of proteins, it ended up on the front pages of newspapers and newsmagazines across the world, together with the face of the young project leader, who was just a Ph.D. student while the data was being collected.

When I was in school, I never thought I would see a picture of a black hole because I was taught that the concept was an extrapolation from predictions of general relativity – a theory that fails to predict the shape of galaxies that are supposedly constructed around black holes. If the theory fails, it must be an approximation, and extrapolations from approximations are usually stupid.

Black holes were popularized by Oppenheimer around the same time that the nuclear bomb was invented and the concept of a hole from which even light could not escape struck a giant resonance with the hopelessness and horror of the Zeitgeist, but it seems to me that whenever physics waxes poetic, it loses its connection to reality and confuses figurative and literal concepts. Mathematics is always an approximation when the map is not the territory.

Despite my reservations, I am always eager to analyze an experimental design, especially if it aims to measure something previously thought to be unmeasurable.

If 1.3 mm light is filtered out, simulations using general relativity say that there should be a hole in the middle of this galaxy.

The simulations of what they thought they would be able to see didn’t match exactly what they did see, but we all make mistakes.

On the left is a simulation of a black hole and on the right is what they simulated their measurement device should be able to resolve.

Individual radio telescopes can’t resolve this hole, but if all of the measurements of all radio telescopes are combined, this new sort of measurement might see something interesting.

Many measurements of starlight's timing and intensity must be combined into one image, even though individual measurements are a noisy blur and the gaps between measurement locations lower the fidelity of the image. Without the timing information, the intensity information is meaningless, and so is the combined image.

The next step in the modern scientific method is to secure the public’s approval in order to justify your government funding. This was never part of the original scientific method, but it plays an important role today. A good way to get the public’s approval is to have a young member of your collaboration give a TED talk.

If you can use popular movie images to promote your project, so much the better. Interstellar was a fun movie.

If your young project leader says things that are absurd, don’t worry. The public isn’t that clever and they aren’t listening very closely. In any case, it isn’t her fault, it just means that she has terrible teachers and no advisor.

“[I can take random image fragments and assemble them like a puzzle to construct an image of a black hole.]” (TED @ 11:00)

“[Some images are less likely than others and it is my job to design an algorithm that gives more weight to the images that are more likely.]” (TED @ 6:40)

Once you have published your data and made a press release that gets your results onto the front pages of various magazines, you can now defend your result in front of an audience of your peers at CalTech.

  • 5:08 “this is equivalent to taking a picture of an orange on the moon.”
  • 14:40 “the challenge of dealing with data with 100% uncertainty.”
  • 16:00 “the CLEAN algorithm is guided a lot by the user.”
  • 19:30 “Most people use this method to do calibration before imaging, but we set it up to do calibration during imaging by multiplying the amplitudes and adding the phases to cancel out uncorrelated noise.”
  • 31:40 “A data set will equally predict an image with or without a hole if you lack phase information.”
  • 39:30, “The phase data is unusable and the amplitude data is barely usable.”
  • 36:20, “The machine learning algorithm tuned hundreds of thousands of ‘parameters’ to produce the image.”

Some scientists may take issue with these sorts of statements, but if the speaker is famous enough, they won’t dare say their criticisms out loud.

There are good reasons why any single one of the statements above would disqualify an experiment, but to avoid getting too tedious, I’ll try to illustrate the most glaring — aside from the statement that the phase (timing) data was unusable.

Most sensible researchers would agree that if the resolution of your experiment is equivalent to taking a picture of an orange on the moon, this means that you cannot do your experiment. It doesn’t mean that you just have to try harder and use your ‘can-do’ attitude to accomplish the impossible.

The picture in the top left was taken in 1973 and the picture bottom left was taken in 2013 with the Hubble telescope.
We’ve come a long way.

Hubble is our best telescope because it is not affected by the Earth’s atmosphere, but it still can’t take a picture of an orange on the moon because the moon moves across the sky faster than Hubble can track it. When you combine rotational and translational motion of a detector, that limits what it can measure. It is a sort of intrinsic uncertainty that the EHT people decided would not be a problem for them.

The next strange thing about the EHT results was how they characterized their measurements with a quantity called ‘uncertainty’. If I look up the meaning of the word, I see

uncertainty (%) = 100 × standard deviation / (average × √number of measurements)

Then I think about a simple example.

How tall is a person in this group? About 75 cm, but the standard deviation is about 50 cm, so after a dozen or so measurements I am 20% uncertain, and if I take ten times as many measurements, I'll be only 6% uncertain. This is absurd, of course: the group's real spread in heights never shrinks. Has the EHT collaboration confused themselves with this concept?
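Plugging the text's numbers into the formula above shows the absurdity directly (the implied count of roughly eleven measurements is my inference; the 50 cm spread is real variation in the group, not noise that repetition can average away):

```python
import math

mean_height = 75.0  # cm, from the example above
std_dev = 50.0      # cm, real spread of the group, not measurement noise

def percent_uncertainty(n_measurements):
    # uncertainty (%) = 100 * sigma / (mean * sqrt(N))
    return 100.0 * std_dev / (mean_height * math.sqrt(n_measurements))

print(round(percent_uncertainty(11)))   # 20: "20% uncertain"
print(round(percent_uncertainty(110)))  # 6: ten times more data, same real spread
```

The formula only describes how well you know the mean of whatever you sampled; it says nothing about whether the spread was noise or genuine structure.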

They certainly have strange notions about error propagation for correlated and uncorrelated errors.

They convinced themselves that they could use measurements with 100% uncertainty because the uncorrelated errors would cancel out, and they called this procedure ‘closure of systematic gain error’. Once you give your solution a cool name, the problem is solved, right? They suggested that because this procedure was used in crystallography experiments, they could use it too.

But from what I could tell, this procedure was nothing more than multiplying together the amplitudes and adding the phase errors together and from what I learned about basic data analysis, you can only add together correlated errors that you want to remove from your data. You can’t do this with uncorrelated errors, yet this appears to be exactly what they did.

This is how you deal with uncorrelated errors. You can’t just add them together.
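The textbook combination rules being alluded to can be sketched as follows (this is generic error propagation, not a reconstruction of the EHT pipeline):

```python
import math

def combine_uncorrelated(*sigmas):
    # Independent errors add in quadrature; they never cancel.
    return math.sqrt(sum(s * s for s in sigmas))

def combine_correlated(*sigmas):
    # Fully correlated errors of the same sign add linearly.
    return sum(sigmas)

# A common-mode (correlated) error can be removed by differencing two
# channels that share it; an uncorrelated error cannot.
print(combine_uncorrelated(3.0, 4.0))  # 5.0, quadrature sum
print(combine_correlated(3.0, 4.0))    # 7.0, linear sum
```

Treating uncorrelated noise as if it were a common-mode error that "closes" is exactly the substitution the text is objecting to.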

What do I mean by correlated errors? I mean things like the sun, something that is very close by and not 26,000 light years away like the black hole that they wanted to measure. Millimetre wavelengths are strongly absorbed by the atmosphere, but how can we be sure that the measurement wasn’t of sunlight leaking around the Earth? There is also a concern about the 1.3 mm emission of carbon monoxide. I’m sure that someone from their collaboration wrote a paper about how these two errors could be ruled out based on analytic approximations or numerical simulations, but all they would have to do to definitively rule this error out is to compare the measurements they took from the early morning and the late evening, yet they didn’t do this. Their data taking was spread out over sixteen hour shifts and they just mixed it all together.

It looks like a black hole in distant space, but it is actually a solar eclipse. If you overfocus, you might accidentally measure the measurement apparatus itself rather than the sample.

Even if they did not mistake a correlated error for what they wanted to measure, they made a basic mistake in their effort to construct an algorithm that is not biased by the user. They even constructed two different algorithms that were systematically biased by the user and decided that because they both produced the same result, they couldn’t be biased. Oops.

In an example of hand-tuned delusion, they used a CLEAN + calibration algorithm. Both of these elements can be biased as the user selects for more ‘likely’ results. In an example of automated delusion, they used a Bayesian neural net machine learning algorithm whose ‘priors’, or assumptions, biased the result. Machine learning should only be used to explore a parameter space to generate a hypothesis. It should not be used to confirm a hypothesis.

A diagnostic criterion for schizophrenia is seeing patterns where none exist, and experiments like this one are designed to see patterns in noise.

This insanity is enabled by a news media that cannot distinguish good science from bad. They are instead distracted by identity politics as when men’s rights activists expressed jealousy that young Ms. Katie Bouman got so much credit for leading a project that was mostly conducted by men. The headline read: “Smart woman attacked by sexist brutes!”

The scientific community reacted by briefly glancing up from their highly specialized work to say, “Nicely done,” before returning to their favourite obsessions. They didn’t think that analysing the major result was in their purview. They preferred to trust the academic institutions responsible for reviewing the work.

Maybe my expectations are too high. Movie critics gave the EHT black hole image two thumbs up!

Gargantua is Kip Thorne's black hole. It eats the light within the souls of young people who want to understand the world.

How did these wild goose chases get started? It turns out that belief in black holes is a very recent development.

  • In 1963, Kip Thorne understood that Neutron Stars and Black Holes were a result of invalid extrapolations from General Relativity. He wrote a paper about this.
  • In 1993, Kip Thorne wrote a book explaining how General Relativity is just one way to describe things and that the ability to speak in multiple, opaque languages carries power.
  • In 2017, Kip Thorne won a Nobel Prize for LIGO, a black hole experiment predicated on the validity of extrapolations from General Relativity.
I’ve written a lot about his flagship project, LIGO.

The one thing that all of the experiments I've described here have in common is their use of Bayesian statistics, which has been used to construct machine learning and data science algorithms that are incomprehensible even to those who work on them.

When we use these data science techniques to construct an image of a person, how well do these images match with how the person views herself and how others view her? If we do not acknowledge our blind spots, will the picture appear arbitrary, cartoonish, and without nuance? When we peer deep into the void, might we accidentally construct an image of ourselves?

Physicists fooling themselves with a complicated machine is a central theme in two novels I’ve written.

Novels should be part of a scientist’s training because they teach students how to distinguish between literal and figurative concepts.

If you believe that general relativity provides a literal depiction of nature rather than a figurative approximation, you will tend to believe in black holes.

Science should produce progress, but when it swirls around in eddies of self-citation, you end up with black holes – in a figurative, not literal sense.


Teaching kids to just ‘be’ is a valuable goal.
To be peaceful, to be loving, to be joyful…
Subsidizing too much ‘doing’ can be cruel.

Just think of how many weary nights went into the creation of all of these pictures of black holes, particles, proteins, and people.

Try if you would like to see this article in spoken form. It is similar to the talk I gave in March at IdaLabs.