(I once posted this question on academia.stackexchange, but it was deemed to be off topic there. I hope it would be more on-topic here)

I would like to introduce the basics of the scientific method to an audience unfamiliar with the real meaning of it, without making it hard to understand.

My intended audience commonly thinks that to "prove something scientifically" means to "use modern technological gadgets to measure something, then interpret the results as we wish", so my main topics would be the selection of an experimental method and the importance of falsifiability. Wikipedia lists the "all swans are white" as an example for a falsifiable statement, but it is not practical enough. To prove that all swans are white would require to observe all the swans in the world. I'm searching for a simple example which uses the scientific method to determine the workings of an unknown system, starting by forming a good hypothesis.

A good example I found is the 2-4-6 game, culminating in the very catchy phrase "if you are equally good at explaining any outcome, you have zero knowledge". This would be one of the best examples to illustrate the most important part of the scientific method, which a lot of people imagine incorrectly. It has just one flaw: for best effect it has to be interactive. And if I make it interactive, it has some non-negligible chance to fail, especially if done with a broader audience.

Is there any simple, non-interactive example to illustrate the problem underlying the 2-4-6 game? (for example, if we had taken this naive method to formulate our hypothesis, we would have failed)

I know the above example is mostly used when discussing fallacies such as confirmation bias, but it nevertheless seems to me a good way to grasp the most important aspects of the scientific method.

I've seen several good posts about the importance of falsifiability, some of them in this very community, but I have not yet seen any example simple enough that people unfamiliar with how scientists work can also understand it. A good working example would be one where we want to study a familiar concept but, by forgetting to take falsifiability into account, arrive at an obviously wrong (and preferably humorous) conclusion.

(How do I imagine such an example working? My favorite example in a different topic is the egg-laying dog. A dog enters a room where we placed ten sausages and ten eggs, and when it leaves the room, we observe that the percentage of eggs relative to the sausages has increased, so we conclude that the dog must have produced eggs. It's easy to spot the mistake in this example, because the image of a dog laying eggs is absurd. However, let's replace the dog with an effective medicine against heart disease. Someone notices that the chance of dying of cancer in the next ten years increased for patients who were treated with it, so they declare the medicine to be carcinogenic even though it isn't (people are not immortal, so if they don't die of one disease, they die later of another). In this case, many people will accept that it's carcinogenic without a second thought. This is why the example of the egg-laying dog can be so useful in illustrating the problem. Now, the egg-laying dog is not a good example for raising awareness of the importance of falsifiability; I present it only to show the style of an effective example that any layman can understand.)




You have a bucket. You draw out a few red balls, and state that the "theory" is that the bucket contains only red balls. You emphasize that this is a falsifiable theory because it can be completely disproved by drawing out any other color of ball. For dramatic effect, you can proceed to draw out a blue ball.

If you have the opportunity to make it slightly more complicated, you can have another bucket, and state that your hypothesis about this bucket is that it contains 50% red and 50% blue balls. You start drawing out balls, and keep track of the count on a screen or whiteboard. The more balls you draw out, the more obvious it is that the ratio is something more like 90% red, 10% blue. This gives you the opportunity to talk about instances where the theory can't be disproven instantly by a single observation, but the burden of evidence nonetheless accumulates against it over time.
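The two bucket demonstrations above can be sketched numerically. This is only an illustration under stated assumptions: the function names are made up for this sketch, and draws are treated as independent (as if each ball were put back), which the comment does not specify.

```python
from math import comb

def all_red_refuted(draws):
    """'The bucket contains only red balls' is refuted by one non-red draw."""
    return any(color != "red" for color in draws)

def tail_probability(n, k, p=0.5):
    """Chance of seeing at least k reds in n independent draws
    if the true fraction of red balls were p (assumption: draws
    are independent, i.e. made with replacement)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# First bucket: a single blue ball settles the matter at once.
print(all_red_refuted(["red", "red", "red", "blue"]))

# Second bucket: if about 90% of the draws keep coming up red, the
# "50% red, 50% blue" hypothesis becomes less and less tenable.
for n in (10, 20, 50):
    k = round(0.9 * n)
    print(n, k, tail_probability(n, k))
```

No single draw refutes the 50/50 hypothesis, but the probability of a result this lopsided shrinks quickly as the number of draws grows, which is the "burden of evidence" accumulating.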


I use something like this in introductory psychology lab classes.

For the initial example (theory = bucket contains only red balls) you can also introduce an unfalsifiable alternative theory, perhaps something like "bucket contains only red balls but the act of observing them may change their colour".

That's not unfalsifiable! Get them out by machine and photograph them before they get looked at. Then when you look at them and they're red, but the photographs are of blue balls, you've nailed it. Observation producing physical effects.

And carry on. As the theories have to squirm to avoid experiments, they'll slowly approach unfalsifiability and become more and more complex / make fewer and fewer predictions.

Except in quantum mechanics, where they end up being really simple and explaining everything, but they make your head break.

bonus: coloured buckets; balls with more than one colour, transparent balls, rubber ducks, empty bucket.

I think it would be great to start with a theory that sounds very scientific, but is unfalsifiable, and therefore useless. Then we modify the theory to include an element that is falsifiable, and the theory becomes much more useful.

For example, we have a new kind of medicine, and it is very good for some people, but when other people take the medicine it kills them. Naturally, we want to know who would be killed by the medicine, and who would be helped by it.

A scientist has a theory. He believes there is a gene that he calls the "Spottiswood gene". Anyone who has the proper form of the Spottiswood gene will be safe; they can take the medicine freely. But some people have a broken version of the Spottiswood gene, and they die when they take the medicine. Unfortunately the scientist has no way of detecting the Spottiswood gene, so he can't tell you whether you have the gene or not.

Now this theory sounds very scientific and it's got lots of scientific words in it, but it isn't very useful. The scientist doesn't know how to detect the gene, so he can't tell you whether you are going to live or whether you are going to die. He can't tell you whether it is safe to take the medicine. If you take the pill and you survive, then the scientist will say that you had the working version of the gene. If you take the pill and you die, the scientist will say that you have the broken version of the gene. But he cannot say what will happen to you until after it has already happened, so his theory is useless. He can explain anything, but he can't make predictions in advance.

Now another scientist has a different theory. She thinks that the medicine is related to eye color. She thinks anyone with blue eyes will die if they take the medicine, and she thinks that anyone with brown eyes will be okay. She's not sure why this happens, but she plans to do more research and find out. Even if she doesn't do any more research, her theory is much more useful than the first scientist's theory. If she's right, then blue-eyed people will know that they should avoid the medicine, and brown-eyed people will know that they can take the medicine safely. She has made predictions. She predicts that no brown-eyed person will die after taking the medicine, and she predicts that no blue-eyed person will live.

Of course, the second scientist might be wrong. But the interesting thing is that if she's wrong, then we can prove that she's wrong. She predicted that no one with brown eyes will die after taking the medicine, so if lots of people with brown eyes die, then we will know that she's wrong.

If her theory is wrong, then we should be able to prove that it's wrong. And then if the results don't prove that she's wrong, we accept that she's probably right. That's called falsifiability.

But the first scientist doesn't have falsifiability. We know that even if he's wrong, we'll never be able to prove it, and that means we'll never know if he's wrong or right. More importantly, even if he is right, his theory still wouldn't do anybody any good.

The first theory is falsifiable as long as you're willing to let enough people die. Collect blood samples from everyone before they take the medicine. Sequence their full exome and put it on file.

Once you have a few thousand dead and a few thousand survivors you should be able to narrow candidates down to a few dozen genes.

Make predictions about who will die out of the next few hundred who take the pill, bam.

Turns out it's an eye color gene having some weird effect on a vital pathway that the drug is linked to.

Alternatively if it's not genetic at all, if single members of pairs of twins taking the drug died at rates inconsistent with the expected numbers of mutations between twins then we could be pretty sure it's not genetic.

Or perhaps it's only partially genetic; again, twins and siblings would let us work this out.

Seems pretty falsifiable.

Yes, that's definitely true. If you know a little, or a lot, about genetics, then the theory is falsifiable.

I think it still works just fine as an example though. The goal was to explain the meaning and the importance of falsifiability. Spottiswood's theory, as presented and as it was being used, wasn't making any useful predictions. No one was looking at familial comparisons, and I implied that Spottiswood wasn't making any effort to identify the gene, so the only observations that were coming in were "person lives" or "person dies". Within that context, Spottiswood's theory can explain any observation, and makes no useful predictions.

If that's not an example of an unfalsifiable theory, then it's still an example that helps explain the key elements of unfalsifiability, and helps explain why they're important.

If an audience member should then point out what you pointed out? Then that's brilliant. We can agree with the audience member, and talk about how this new consideration shows that the theory can be falsifiable after all.

But then we also get to point out how this falsifiability is what makes a theory much more useful... and the example still works because (QED) that's exactly the point we were trying to demonstrate.

Couldn't Spottiswood make a gene-detector by feeding you the medicine in a tiny, tiny amount and seeing whether you just died a bit? Could be way useful.

Incidentally, I think that you're proposing a test for susceptibility to the medicine. The relevant theory here is that any person who would be killed by a full dose would also be harmed, but not killed, by a much smaller dose. That's a perfectly testable, falsifiable theory, but I don't think it would directly test the claim that the cause is genetic.

A better test for genetic causes is to look at family relationships. If we believe the cause is genetic, then we predict that people who are more closely related to each other are more likely to have the same reaction to the medicine. And we predict that identical twins would always have the exact same reaction to the medicine.

The original poster was looking for a very easy example that children could follow, without needing to understand any maths or probability theory, so I wanted to keep it simple. That's why I didn't mention the idea of improving the original scientist's theory.


If the first scientist can come up with a way to test his theory, then it would probably make his theory more useful. It would also make it more falsifiable.

Personally, I'd start with the concept of predictive power, and then mention falsifiability as one way to notice if a statement makes no predictive assertions.

The point to make is that you're not claiming that non-falsifiable statements can't be true, it's just that they don't matter.

Quantum Immortality?

This depends on what kind of unfalsifiability you want. There are at least four kinds.

  • unfalsifiable with current resources (Russell's teapot)
  • unfalsifiable because of moving goalposts
  • unfalsifiable because the terms are incoherent or undefined ("not even wrong")
  • unfalsifiable in principle

No empirical claim is unfalsifiable in principle (i.e. without resource limitations, moving goalposts, or logical incoherency). Claims that involve violations of physical law come the closest, but require us to assume 100% confidence in the law itself. For a non-empirical claim to be unfalsifiable, empirical consequences of the claim have to be impossible, which ultimately requires you to eliminate them by definition. I think you’re trying to find an example of the fourth meaning when most people who talk about unfalsifiability are thinking about one of the others.

And if I make it interactive, it has some non-negligible chance to fail, especially if done with a broader audience.

You don't have to ask the whole audience. You can ask for a volunteer and ask people to raise their hands if they would be willing to volunteer. Then you pick a person who doesn't look like a nerd who already knows the problem.

Have a fallback plan: In case they do successfully answer the question you can ask them to explain why they ask the right questions.

Everyone I've tried the 2-4-6 test on thinks really carefully, gets the right answer, and claims never to have heard of it before. It's really irritating.

Occasionally I try it on someone who isn't one of my mathsy friends, and they give me that look where 'The maths-witch is trying to humiliate me again' and get angry and defensive, so I give up because it's not worth losing friends over.

Wikipedia lists the "all swans are white" as an example for a falsifiable statement, but it is not practical enough. To prove that all swans are white would require to observe all the swans in the world.

Something being falsifiable and something being universally possible to check are 2 different things.

In theory you could falsify that statement after checking only a single swan if it happens to be a black swan.

Conservation of energy is falsifiable. If you found some way of creating energy without taking it from elsewhere then you would falsify it. However it isn't practical to check every cubic meter of space in the universe to check if it applies everywhere.

There's also the old Invisible Dragon example from Sagan:

“A fire-breathing dragon lives in my garage.”

Suppose … I seriously make such an assertion to you. Surely you’d want to check it out, see for yourself….

“Show me,” you say. I lead you to my garage. You look inside and see a ladder, empty paint cans, an old tricycle—but no dragon.

“Where’s the dragon?” you ask.

“Oh, she’s right here,” I reply, waving vaguely. “I neglected to mention that she’s an invisible dragon.”

You propose spreading flour on the floor of the garage to capture the dragon’s footprints.

“Good idea,” I say, “but this dragon floats in the air.”

Then you’ll use an infrared sensor to detect the invisible fire.

“Good idea, but the invisible fire is also heatless.”

You’ll spray-paint the dragon and make her visible.

“Good idea, except she’s an incorporeal dragon and the paint won’t stick.”

And so on. I counter every physical test you propose with a special explanation of why it won’t work.

Now, what’s the difference between an invisible, incorporeal, floating dragon who spits heatless fire and no dragon at all? If there’s no way to disprove my contention, no conceivable experiment that would count against it, what does it mean to say that my dragon exists? Your inability to invalidate my hypothesis is not at all the same thing as proving it is true. Claims that cannot be tested, assertions immune to disproof are veridically worthless, whatever value they may have in inspiring us or in exciting our sense of wonder. What I’m asking you to do comes down to believing, in the absence of evidence, on my say-so.

What I don't get is: They never take swords, they never check for princesses. Don't they know that curiosity kills 95.234% of cats?

Conservation of energy is falsifiable.

And false. And I don't think conservation of mass-energy is thought to be globally true, it's a local property.

I find the last example mostly compelling for it's both interactive and impossible to get wrong. The OP could claim that he has been followed by an invisible dragon inside the classroom, and challenge the students to disprove its existence.

I like that though it would probably need to be somewhat interactive. Either that or you'd need a friend to do some of the call and response bits.

Probably most suitable if the audience includes lots of children.

You: "Hi everyone.

Today I'm going to be talking about some of the important concepts in science like falsifiability .... etc.

To help me I've brought John here and my Dragon."

John: "What dragon? I don't see any dragon."

You: "This dragon" [gesture at empty space] "I should probably have mentioned, he's invisible"

John: "... Ok, so he's invisible, let's see what his scales feel like" [wave hand through empty air]

And so on

You are out to get me. If you're mean to me, that's direct evidence. If you're nice, that's also evidence that you are trying to trick me into trusting you. If you ignore me, well that's the worst of all because it proves that you are trying to divert my attention from you so you can plan your attacks on me in secret.

That's not non-falsifiable. That's overly complex if you don't have other reasons to believe it. But it might well be true, and confirmation might be about to bring itself to your immediate attention.

Depending on the time scale of the experiment, you could run a rain-dance trial. (thinking primary school kids)

A rain dance is where you do a special dance to make it rain. Spend 5 minutes each day doing the dance (or not) and checking whether it rained yesterday, and keep a graph of rain against dancing. Run trials for as many weeks as you like.

That could backfire quite spectacularly :-)

We should keep running the trials until we can get p<0.05 and prove the hypothesis!

If this would be enough to prove the effectiveness of rain-dancing, then we would develop 30 different styles of rain-dance, test each of them, and with a very high chance we would get p<0.05 on at least one of them.

Sadly, the medical industry is full of such publications, because publishing new ideas is rewarded more than reproducing already published experiments.
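The "30 styles of rain-dance" argument above can be checked with a few lines of arithmetic. This is a sketch under two labeled assumptions: no dance has any real effect, and the 30 trials are statistically independent. The function name is invented for this illustration.

```python
def chance_of_false_positive(num_tests, alpha=0.05):
    """Probability that at least one of num_tests independent tests
    reaches p < alpha purely by chance (assumes no real effect)."""
    return 1 - (1 - alpha) ** num_tests

for styles in (1, 10, 30):
    print(styles, round(chance_of_false_positive(styles), 3))
```

With 30 styles the chance of at least one "significant" dance is roughly 79%, even though none of them does anything.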

We should keep running the trials until we can get p<0.05 and prove the hypothesis!

Hitting p<0.05 doesn't prove the hypothesis. That's not what the t-test does.

I came here to mention raindances. You do a raindance and nothing happens. You raindance for 12 more days and suddenly it rains. That must mean if you dance for 13 days straight (or dance until some other sort of requirement you Just So on the spot) it will rain!

If you don't add the idea of falsifiability to accept that raindances might not cause rain when you get negative results, then you will always get the conclusion that some amount of raindancing will cause rain.

Ideally you would add a parameter of audience interaction though if you really want everyone to feel the impact of their failed predictions on a gut level. That's the value of the 2-4-6 game and things like making predictions before learning about scope insensitivity.

Raindance is good for that reason (it has a lot of freedom). You can do statistics on it; you can also (sneakily) keep experimenting for a very long time scale and only stop when you have the right answer.

You can also do things like dance on days when the weather man says it will rain, just to confuse people.

My motorcycle once broke down. Messing about with all the usual stuff didn't restart it. Eventually I danced backwards round it waving a spanner and singing about petrol. Started first kick.

Some other rules for the 2-4-6 game, so you can keep going if they get the first one:

  • The set has an even number in it

  • All three are 1-digit numbers

  • All three must be numbers

  • Whatever the numbers are, you alternate between replying 'yes' and 'no' (helps to be writing them down in this case)

Try to have more than one hypothesis under consideration at every time, and choose guesses which distinguish them.
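The advice to choose guesses that distinguish hypotheses can be illustrated with a small sketch. It assumes the classic hidden rule of the 2-4-6 game ("any three ascending numbers") and a typical first guess ("each number is the previous one plus 2"); both functions and the sample triples are made up for this illustration.

```python
def hidden_rule(triple):
    """The experimenter's rule (assumed here: strictly ascending)."""
    a, b, c = triple
    return a < b < c

def my_hypothesis(triple):
    """The guesser's theory: each number is the previous one plus 2."""
    a, b, c = triple
    return b - a == 2 and c - b == 2

def informative(tests):
    """Tests whose answer contradicts the hypothesis, i.e. falsify it."""
    return [t for t in tests if hidden_rule(t) != my_hypothesis(t)]

# Triples chosen to fit the hypothesis: every answer is 'yes',
# so the guesser learns nothing that separates the two rules.
confirming = [(8, 10, 12), (1, 3, 5), (20, 22, 24)]

# Triples chosen so that the hypothesis predicts 'no'.
probing = [(1, 2, 3), (5, 10, 100), (6, 4, 2)]

print(informative(confirming))  # []
print(informative(probing))     # [(1, 2, 3), (5, 10, 100)]
```

Only the triples the guesser expects to fail can expose the difference between the two rules; the confirming triples cannot.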

I don't see how 2-4-6 is about falsifiability, so I may be misunderstanding your request. In the sequences, it was described as an example of positive bias. Clearly at every step, if the answer was "no" when you expected it to be "yes" the theory would be falsified.

I agree, but I see a connection to falsifiability in that most people don't even try to falsify their theories in this game, even if it would be possible.

A much better example than the 2-4-6 game would be one where the most obvious hypothesis was unfalsifiable.

Quantum immortality?

Continuity of Consciousness.

Are you the same person you were before you went to sleep last night? Were you created five minutes ago?

Russell's teapot is totally falsifiable. Modern telescopes are much better than they were in his day. Results are expected soon.

I like the story of Feynman and Tukey counting, although it mixes up falsification with behaviorism.

Since then I found a partially relevant, but very simple and effective "puzzle".

There are four cards in front of you on the desk. It is known that every card has a numerical digit on one side, and a letter from the English alphabet on the other side.

You have to verify the theory that "if one side of the card has a vowel, the other side has an even number", and you are only allowed to flip two cards.

The cards in front of you are:

A T 7 2

Which cards will you flip?

(I wrote partially relevant because this is not an example for an unfalsifiable theory. The theory is falsifiable and the puzzle is solvable, the main point is that most people would pick the wrong answer because they will not try to falsify the theory)
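For anyone who wants to check the card puzzle above mechanically, here is a small sketch. It enumerates every possible hidden face for each visible card and asks whether any of them would break the rule "vowel on one side implies even number on the other". The function names are invented for this illustration.

```python
import string

VOWELS = set("AEIOU")

def violates(letter, digit):
    """The rule is broken only by a vowel paired with an odd digit."""
    return letter in VOWELS and digit % 2 == 1

def can_falsify(visible):
    """True if some possible hidden face would break the rule."""
    if visible.isalpha():
        return any(violates(visible, d) for d in range(10))
    return any(violates(letter, int(visible)) for letter in string.ascii_uppercase)

cards = ["A", "T", "7", "2"]
print([card for card in cards if can_falsify(card)])  # ['A', '7']
```

Flipping T or 2 can never contradict the rule, whatever is on the other side, so only A and 7 are worth turning over.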

most people would pick the wrong answer because they will not try to falsify the theory

Actually, I think most people will misunderstand the theory they have to verify or falsify. However, evidently people's ability to solve this puzzle hugely depends on the way it's formulated.

Go through a Venn diagram explanation of Bayes's Theorem. Not necessarily the formula, but just a graphical representation of updating on evidence. Draw attention to the distribution of probability of H between E and not-E. Point out that if the probability of H doesn't go down upon the discovery of not E, it can't possibly go up upon the discovery of E.

This has the advantage of showing the requirement of falsifiability to be an extreme case of a more powerful general principle.

This could be supplemental to some of the great suggestions by your other commenters.
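The point about H, E and not-E can also be shown with numbers instead of a diagram. The sketch below uses made-up probabilities and a made-up function name: it computes the posterior of H after seeing E and after seeing not-E, and shows that the prior is always the weighted average of the two.

```python
def posteriors(prior, p_e_given_h, p_e_given_not_h):
    """Return P(E), P(H|E) and P(H|not E) by Bayes's theorem."""
    p_e = prior * p_e_given_h + (1 - prior) * p_e_given_not_h
    post_e = prior * p_e_given_h / p_e
    post_not_e = prior * (1 - p_e_given_h) / (1 - p_e)
    return p_e, post_e, post_not_e

# A theory that sticks its neck out: E is much likelier if H is true.
p_e, up, down = posteriors(prior=0.3, p_e_given_h=0.8, p_e_given_not_h=0.2)
print(up, down)                      # belief rises on E, falls on not-E
print(p_e * up + (1 - p_e) * down)   # recovers the prior (about 0.3)

# A theory that explains either outcome equally well: nothing moves.
_, same_up, same_down = posteriors(prior=0.3, p_e_given_h=0.5, p_e_given_not_h=0.5)
print(same_up, same_down)
```

If not-E could not lower the probability of H, the weighted average forces E to be unable to raise it, which is the general principle behind the falsifiability requirement.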

Most lay audiences can't simply generalize an abstract mathematical model to the real world. They need actual examples to learn in a way that impacts their day-to-day reasoning.

Take Howard Gardner's theory of multiple intelligences. The world is fair. Some people have musical–rhythmic intelligence while other people have logical–mathematical intelligence.

Gardner's theory has intuitive merit. But if you start to think about falsifiability, you come to the question of whether the multiple intelligences really are different or whether they correlate positively with each other.

The world is fair.

Hmm. Then we shouldn't be able to find someone who was rubbish at maths and music at the same time. Or good at both. Easily falsifiable.

I don't really recommend talking to a bunch of children and deliberately spreading the message "some of you just suck at most things".

There are positive and valuable ways to teach the lesson that people aren't all equally "good at stuff", but it's a tough one to communicate well. It's not a good thing to bring up casually as an example when you're talking about something else.

Even if the intelligences correlate with each other, you'd need to know how strong the correlation is: individual people could still be strikingly good or bad at some things while being mediocre, somewhat bad, or mildly talented at others.

Maybe the placebo effect: every medication affects the patient (even one that does nothing), so you cannot show that a medication does not work without a placebo control group to make the claim falsifiable.

I don't think the placebo effect has anything directly to do with falsifiability. An experiment that compares treatment A against the gold-standard treatment B makes a falsifiable claim in the absence of a placebo control.

Does the flying spaghetti monster work?

Or the example in http://lesswrong.com/lw/ip/fake_explanations/?

This and Russell's teapot are just unverifiable claims, and not a study of understanding how a system works which would fail because we committed an innocent mistake.

Besides, they have strong ideological undertones, so all they would manage to do is to cater for the ego of those who agree with their ideological implications, and make angry those who don't. They won't really convince anyone.

You didn't mention what kind of audience it was. For some it would be an appropriate example.

What about the second example?


I'm not sure I see anything wrong with your example. I'm not even sure what he is asking and what audience requires something where the flying spaghetti monster won't work. Maybe I've missed something big.

[This comment is no longer endorsed by its author]

I think the second one works. It's unfalsifiable because it makes no predictions. If any heat-related thing can be explained by 'convection', then convection isn't saying anything.

Even nastier is Feynman's one about the baby-physics book which said 'What makes it move?' and the answer was always 'Energy makes it move'.