Suppose we have a strain of lab rats which are colored purple, and we want to know why. We suspect that chemical X is responsible, so we run an experiment:
- We genetically modify our purple rats to repress X production, and find that their purple coloration disappears.
- We genetically modify ordinary rats to produce X, and find that their coats turn purple.
We conclude that chemical X is both necessary and sufficient to turn rats’ coats purple. Case closed!
… or maybe not.
Suppose that rats are purple-colored if-and-only-if they express Purple Pigment (PP) above some threshold level. Purple Pigment, in turn, is chemically produced from X and Y, so its level depends on both.
High levels of PP could result from high levels of X, or from high levels of Y. Either way, increasing X enough will always turn a rat purple, and decreasing X enough will always turn a rat not-purple. So our experiment doesn’t tell us whether our particular rats are purple due to high X or high Y - it could be either. In order to tell the difference, we need to go measure X and Y levels in our rats - not an experiment, but an observation.
(Warning: technical details not relevant to the main point were brushed under the rug there.)
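To make the argument concrete, here is a minimal sketch of the two interventions in a toy model. The product rule for pigment, the threshold, and the specific X and Y levels are all assumptions chosen for illustration; the only thing taken from the scenario is that PP is produced from X and Y and that purple means PP above a threshold.

```python
# Toy model (an assumption for illustration): pigment level is the product of
# the two precursor levels, and a rat is purple above a fixed threshold.
PURPLE_THRESHOLD = 1.0


def pigment(x, y):
    return x * y


def is_purple(x, y):
    return pigment(x, y) > PURPLE_THRESHOLD


# Hypothetical rats: (X level, Y level) in arbitrary units.
rats = {
    "purple, high X": (4.0, 0.5),
    "purple, high Y": (0.5, 4.0),
    "ordinary": (0.5, 0.5),
}


def repress_x(x, y):
    """Intervention: knock X production down to near zero."""
    return 0.01, y


def overproduce_x(x, y):
    """Intervention: force X production far above normal."""
    return 10.0, y


results = {}
for name, (x, y) in rats.items():
    results[name] = {
        "baseline": is_purple(x, y),
        "X repressed": is_purple(*repress_x(x, y)),
        "X overproduced": is_purple(*overproduce_x(x, y)),
    }
    print(name, results[name])
```

Under these assumptions, the high-X rat and the high-Y rat respond identically to both interventions, so the experimental results alone cannot tell them apart.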
Generalizing: experiments are really good for figuring out the structure of the underlying causal graph. How can we tell that Purple Pigment is produced from X and Y in the first place? Experiment: we try various levels of X and Y and see which rats are purple.
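As a rough sketch of what that structure-finding experiment might look like: we set X and Y to a grid of levels, record coat color, and discard any candidate structure that mispredicts an outcome. The function `rat_is_purple` stands in for the lab, and both it and the three candidate rules (including their shared threshold of 1.0) are hypothetical, chosen only to keep the example small.

```python
from itertools import product


def rat_is_purple(x, y):
    """Stand-in for the lab: the 'true' mechanism, hidden from the experimenter.

    Assumed here to be a product rule with threshold 1.0.
    """
    return x * y > 1.0


# Candidate structures the experimenter wants to distinguish (all hypothetical).
candidates = {
    "X alone": lambda x, y: x > 1.0,
    "Y alone": lambda x, y: y > 1.0,
    "X and Y combined": lambda x, y: x * y > 1.0,
}

levels = [0.25, 0.5, 2.0, 4.0, 8.0]

# Experiment: set every combination of X and Y levels and record coat color.
outcomes = {(x, y): rat_is_purple(x, y) for x, y in product(levels, levels)}

# Keep only the candidates that predict every recorded outcome correctly.
consistent = [
    name
    for name, predict in candidates.items()
    if all(predict(x, y) == purple for (x, y), purple in outcomes.items())
]
print(consistent)
```

The single-chemical candidates are ruled out by combinations like high X with very low Y, which is exactly the kind of case we only get to see by intervening on both chemicals.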
But if we want to know the state of the causal graph in some real-world system, then observation beats experiment. To find out whether our particular rats are purple because of high X or high Y, we should measure their X and Y levels, without any experimental intervention. Of course, this only works if we’ve already done the experiments to figure out the structure of the system.
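Once the structure is known, the observational step is almost trivial, which is the point. The sketch below assumes we can measure X and Y in a rat and compare them to typical levels in ordinary rats; the `Rat` type, the baseline values, and the "twice baseline counts as high" cutoff are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Rat:
    x: float  # measured level of chemical X, arbitrary units
    y: float  # measured level of chemical Y, arbitrary units


# Hypothetical typical levels in ordinary (non-purple) rats.
BASELINE_X = 0.5
BASELINE_Y = 0.5
ELEVATED_FACTOR = 2.0  # "high" means at least this many times baseline


def diagnose(rat):
    """Report which precursor is elevated, using measurements only."""
    high_x = rat.x >= ELEVATED_FACTOR * BASELINE_X
    high_y = rat.y >= ELEVATED_FACTOR * BASELINE_Y
    if high_x and high_y:
        return "high X and high Y"
    if high_x:
        return "high X"
    if high_y:
        return "high Y"
    return "neither elevated"


print(diagnose(Rat(x=4.0, y=0.5)))
print(diagnose(Rat(x=0.5, y=4.0)))
```

No intervention appears anywhere in this code: it only reads off the state of a system whose structure we already trust.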