13

It is a commonplace that correlation does not imply causality, however eyebrow-wagglingly suggestive it may be of causal hypotheses. It is less commonly noted that causality does not imply correlation either. It is quite possible for two variables to have zero correlation, and yet for one of them to be completely determined by the other.

The causal analysis of statistical information is the subject of several major books, including Judea Pearl's Causality and Probabilistic reasoning in intelligent systems, and Spirtes et al's Causation, Prediction, and Search. One of the axioms used in the last-mentioned is the Faithfulness Axiom. See the book for the precise formulation; informally put it amounts to saying that if two variables are uncorrelated, then they are causally independent. As support for this, the book offers a theorem to the effect that while counterexamples are theoretically possible, they have measure zero in the space of causal systems, and anecdotal evidence that people find fault with causal explanations violating the axiom.
The counterexample consists of just two variables A and B. The time series data can be found here, a text file in which each line contains a pair of values for A and B. Here is a scatter-plot: The correlation is not significantly different from zero. Consider the possible causal relationships there might be between two variables, assuming there are no other variables involved. A causes B; B causes A; each causes the other; neither causes the other. Which of these describes the relationship between A and B for the above data?
The correct answer is that none of the four hypotheses can be rejected by these data alone. The actual relationship is: A causes B. Furthermore, there is no noise in the process. A is varying randomly, but B is deterministically caused by A and nothing else, and not by a complex process either. The process is robust: it is not by an accident that the correlation is zero. Every physical process that is modelled by the very simple mathematical relation at work here (to be revealed below) has the same property.
Because I know the process that generated these data, I can confidently predict that it is not possible for anyone to discover from them the true dynamical relation between A and B. So I'll make it a little easier to guess what is going on before I tell you a few paragraphs down.  Here (warning: large file) is another time series for the same variables, sampled at 1000 times the frequency (but only 1/10 the total time). Just by plotting these a certain regularity may become evident to the eyes, and it should be quite easy for anyone so inclined to discover the mathematical relationship between A and B.
So what are these variables, that are tightly causally connected yet completely uncorrelated?
Consider a signal generator. It generates a voltage that varies with time. Most signal generators can generate square waves or sine waves, sometimes sawtooths as well. This signal generator generates a random waveform. Not white noise -- it meanders slowly up and down without pattern, and in the long run the voltage is normally distributed.
Connect the output across a capacitor. The current through the capacitor is proportional to the rate of change of the voltage. Because the voltage is bounded and differentiable, the correlation with its first derivative is zero. That is what A and B are: a randomly wandering variable A and its rate of change B.
Theorem:  In the long run, a bounded, differentiable real function has zero correlation with its first derivative.
The proof is left as an exercise.
Notice that unlike the case that Spirtes considers, where the causal connections between two variables just happen to have multiple effects that exactly cancel, the lack of correlation between A and B is robust. It does not matter what smooth waveform the signal generator puts out, it will have zero correlation with the current that it is the sole cause of. I chose a random waveform because it allows any value and any rate of change of that value to exist simultaneously, rather than e.g. a sine wave, where each value implies at most two possible rates of change. But if your data formed neat sine waves you wouldn't be resorting to statistics. The problem here is that they form a cloud of the sort that people immediately start doing statistics on, but the statistics tells you nothing. I could have arranged that A and B had a modest positive correlation, by taking for B a linear combination of A and dA/dt, but the seductive exercise of drawing a regression line through the cloud would be meaningless.
In some analyses of causal networks (for example here, which tries, but I think unsuccessfully, to handle cyclic causal graphs), an assumption is made that the variables are at equilibrium, i.e. that observations are made at intervals long enough to ignore transient temporal effects. As can be seen by comparing the two time series for A and B, or by considering the actual relation between the variables, this procedure guarantees to hide, not reveal, the relationship between these variables.
If anyone tackled and solved the exercise of studying the detailed time series to discover the relationship before reading the answer, I doubt that you did it by any statistical method.
Some signal generators can be set to generate a current instead of a voltage. In that case the current through the capacitor would cause the voltage across it, reversing the mathematical relationship. So even detailed examination of the time series will not distinguish between the voltage causing the current and the current causing the voltage.

In a further article I will exhibit time series for three variables, A, B, and C, where the joint distribution is multivariate normal, the correlation of A with C is below -0.99, and each has zero correlation with B. Some causal information is also given: A is exogenous (i.e. is not causally influenced by either B or C), and there are no confounding variables (other variables correlating with more than one of A, B, or C). This means that there are four possible causal arrows you might draw between the variables: A to B, A to C, B to C, and C to B, giving 16 possible causal graphs. Which of these graphs are consistent with the distribution?