Q: Correlation often does imply Causation, but does not specify which kind?

by [anonymous]

20th Nov 2013


Hello, introduction and embarrassments

I'd like to start by apologizing for making this thread and explaining why. This post is very likely to be embarrassing: it's about math, and I can't do math. I'm a fairly smart person and reasonably adept at handling abstract matters, rationality and so forth, but only from an average Joe's perspective. On this topic, probability and mathematics in general, that's hardly a merit at all, since mathematics tends to involve people who are all at least fairly smart and reasonably adept at handling abstract matters. This shouldn't be a topic for me to touch; I know there are plenty of people who are more fit to do the job. This website also seems to have really skilled thinkers, most of whom are smarter, better educated and in many ways better suited, and in my opinion some of them could be perceived as exactly the type of people who do math and science. Sorry if I'm not one of you. I hope this post will not become an embarrassment for me, and I hope it will not cause a sudden spike in a shared sense of shame for you guys. If that does occur, however, I hope it will be a learning experience for me and/or others who are not adept with mathematics - when you point out my errors, that is. :)

Correlation does not imply causation?

I often run into this phrase: correlation does not imply causation. Something about it bugs me, intuitively. Which for this audience is like blasphemy. Intuition, that is. Last time I made some comments on the subject on a whim, it was very embarrassing. I got my silly comments handed right back to me. But later I thought there was something to it, and so I thought: well, I'm gonna ask you guys, since I can't figure it out.

What is correlation about?

Correlation is a statistical measure of how strongly two sets of values vary together: points, or statistics which can be converted to points, and so forth. There are at least a couple of ways of calculating these relationships; the only ones I know of are the Pearson correlation and the Spearman correlation. Then there's also Bayes, which is not really the same thing.
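To make the two measures a bit more concrete, here is a minimal sketch in plain Python. The function names (pearson, ranks, spearman) and the sample numbers are made up for illustration. Pearson looks at how close the points are to a straight line, while Spearman applies the same formula to the ranks of the values, so it only cares whether one variable goes up when the other does.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation: covariance scaled by both spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def ranks(values):
    """Rank of each value (1 = smallest); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Made-up data: ys always rises with xs, but not along a straight line.
xs = [1, 2, 3, 4, 5]
ys = [1, 8, 27, 64, 125]
r_pearson = pearson(xs, ys)
r_spearman = spearman(xs, ys)
```

On this data the relationship is perfectly monotonic but curved, so the Spearman value reaches 1 while the Pearson value stays somewhat below it.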

Sometimes it's blatantly obvious that a correlation cannot be a coincidence. If it's blatantly obvious intuitively, to such an extent that you at least think you can trust your intuition, in my book that means I don't know the method for clinically arriving at a rational explanation, rather than that the intuition is outright wrong.

Some observations

If there exists a statistical relationship between two sets of data, it can be explained by:

1. Coincidence.

2. Causal relationship.

2.1 Causal relationship, but in which direction does it flow?

Causal relationships are a complicated topic, because for starters, even if there is a relationship between two things, you can't quite tell which way the causality flows. Does eating apples cause condition X, or does condition X cause people to eat apples?

2.2 Causal relationship through outside factors

But causality does not end there. There may be an additional factor A which is connected to both eating apples and condition X.

2.3 Causal relationship through distant outside factors.

In fact there may be any number of additional factors. These factors can also be distant in the sense of a Markov blanket. As an example, you could have factors A, B and C, of which B and C cause condition X and eating apples, while A instead causes B. B and C, despite being outside factors, are in close proximity to X and eating apples, but A is a distant factor, because it causes X and eating apples only indirectly.
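A small simulation can show how outside factors produce a correlation without either observed variable causing the other. This is only a sketch under made-up assumptions: the chain is simplified to A causes B, and B causes both apple eating and condition X (factor C is left out), and all the probabilities are invented. The names simulate and rate_gap are hypothetical helpers.

```python
import random

def simulate(n, seed=0):
    """Draw n people from an invented causal chain: A -> B -> (apples, X)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        a = rng.random() < 0.5                      # distant factor A
        b = rng.random() < (0.9 if a else 0.1)      # A makes B likely
        apples = rng.random() < (0.8 if b else 0.2) # B makes apple eating likely
        x = rng.random() < (0.8 if b else 0.2)      # B makes condition X likely
        rows.append((a, b, apples, x))
    return rows

def rate_gap(rows):
    """Rate of X among apple eaters minus rate of X among everyone else."""
    eaters = [x for (_, _, apples, x) in rows if apples]
    others = [x for (_, _, apples, x) in rows if not apples]
    return sum(eaters) / len(eaters) - sum(others) / len(others)

rows = simulate(40000)
overall_gap = rate_gap(rows)                           # everyone pooled together
gap_given_b = rate_gap([row for row in rows if row[1]])  # only people with B
```

In the pooled data, apple eaters have condition X clearly more often, even though apples never influence X in this model. Among people who all have B, the gap roughly vanishes, which is what you would expect when the link runs entirely through the outside factor.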

The chief point of this post is that this type of causality is very different from coincidence. Coincidence does not really involve such close causal relationships.

I believe there should be a mathematical method for calculating an estimate of the probability that a correlation is coincidental, which I think can be done with standard probability calculations combined with correct observations. As a simple example, you may have an ascending tendency in the values of some dataset, and that can be matched through correlation to another dataset with the same ascending tendency. If these tendencies are really simple, then the correlation is more likely to be a coincidence. But if these tendencies are complex, then even with a slightly weaker correlation following a similar pattern, the likelihood that it's not a coincidence increases a lot, intuitively speaking.

For example, we may have several series of objects, each of which has two variables, a and b. Each variable can be on or off. If you examine the on/off variables across the series, you might find a correlation: when the share of objects with a on increases, the share with b on also seems to increase.
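One standard way to put a number on "how likely is a correlation this strong by coincidence" is a permutation test: shuffle one dataset many times, so that any real link is destroyed, and count how often the shuffled data correlates at least as strongly as the real data. The sketch below is only an illustration; the function names and both datasets are made up, and shuffling assumes the individual data points are interchangeable, which is not true for series that share a trend over time.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def permutation_p_value(xs, ys, trials=2000, seed=0):
    """Estimated chance that shuffled data correlates as strongly as the real data."""
    rng = random.Random(seed)
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        # Small tolerance so rounding error does not hide a tie.
        if abs(pearson(xs, shuffled)) >= observed - 1e-12:
            hits += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (hits + 1) / (trials + 1)

xs = list(range(20))
linked = [2 * x + 1 for x in xs]   # made-up data that follows xs exactly
digits = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3, 2, 3, 8, 4]  # unrelated to xs

p_linked = permutation_p_value(xs, linked)
p_digits = permutation_p_value(xs, digits)
```

A small value means coincidence alone rarely produces a correlation that strong. The perfectly linked data gets a very small estimate, while the unrelated digits get a noticeably larger one.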

To make a practical example: your grandmother brings you a basket filled with candy. The candies may or may not have spots on them (variable a), and they may be spherical or cylindrical (variable b). Your grandmother visits you once a week. By the end of the year you have collected several basketfuls of candy. You may find a correlation between the candies being cylindrical and having spots on them - for example, if the ratio of candies with spots on them tends to rise from basket to basket and there's also an increase in cylindrical candies. This can be explained by coincidence as well as causation, and the probability of coincidence can be rather large. You could think of that in the sense of Occam's razor and minimum message length. Even if there are lots of candies in each basket, the series of baskets may be very short, too short to be statistically significant. Even if the series were of statistically significant length, the rule which explains the behavior may be reasonably short. For example, even if you have 1000 baskets of candies, it may be that there's an average 2% decrease in the number of candies without spots on them, and it also happens that spherical candies have an average 2% decrease in number. If each variable gets a randomly generated tendency to increase or decrease by 0-20%, and that tendency is maintained, it can be coincidental that these two variables have the same tendency to increase or decrease, despite there being a long series of baskets, lots of candy and so forth.

Sometimes you can tell that correlation does imply causality - but not which kind of causality.
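The point about two simple tendencies lining up can be sketched with a simulation. Everything here is an assumption for illustration: 120 baskets, a starting count of 1000, the 2% decline per basket from the example, and independent random noise for each variable. The helper names basket_series and changes are hypothetical. The two series never influence each other; they only happen to share the same simple rule.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def basket_series(n_baskets, start, decline_per_basket, noise_sd, rng):
    """Counts that shrink by a fixed percentage per basket, plus random noise."""
    return [start * (1 - decline_per_basket) ** t + rng.gauss(0, noise_sd)
            for t in range(n_baskets)]

def changes(series):
    """Basket-to-basket differences, which mostly remove the shared trend."""
    return [later - earlier for earlier, later in zip(series, series[1:])]

rng = random.Random(1)
# Two independent variables that merely follow the same simple rule.
spotless = basket_series(120, 1000, 0.02, 20, rng)
spherical = basket_series(120, 1000, 0.02, 20, rng)

r_levels = pearson(spotless, spherical)
r_changes = pearson(changes(spotless), changes(spherical))
```

The raw counts correlate very strongly because both follow the same downward trend. The basket-to-basket changes, where the trend is mostly removed, show little correlation, which fits the idea that a short generating rule can explain the match without any link between the variables.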

However, if the candies that are cylindrical are more likely to have spots on them than the candies that are round, and there's a large number of candies in the basket, this cannot be explained by mere coincidence. If there are 10000 candies on average in the baskets and there is a correlation between the two variables within the objects (the candies, that is), then it's unlikely to be a coincidence. It "has" to (is very likely to) involve causality, because you can't resort to the short generating rule or the short series length described in the example above. This, however, is relative to the number of objects and the strength of the correlation - the correlation and the number of objects being complementary to each other in terms of probability.

Even if "coincidence" is involved, in this example it requires an outside factor, which also means causality. Therefore in this example correlation does imply causality, but you can't tell which kind of causality.

So you can apply methods to try to estimate whether there's a high or low probability of a coincidence. If coincidence is unlikely, you should be able to tell that some kind of causal relationship is involved.
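The trade-off between the number of candies and the strength of the correlation can be sketched with a two-proportion z-test, a standard textbook calculation that I am swapping in here as an illustration. The function name coincidence_p_value and all the counts are made up, and the normal approximation it relies on is crude for very small baskets.

```python
from math import erfc, sqrt

def coincidence_p_value(spotted_cyl, total_cyl, spotted_sph, total_sph):
    """Two-sided p-value for the gap in spot rates between the two shapes.

    Uses a normal approximation, so treat small-sample results as rough.
    """
    rate_cyl = spotted_cyl / total_cyl
    rate_sph = spotted_sph / total_sph
    # Spot rate if shape made no difference at all.
    pooled = (spotted_cyl + spotted_sph) / (total_cyl + total_sph)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / total_cyl + 1 / total_sph))
    z = (rate_cyl - rate_sph) / standard_error
    return erfc(abs(z) / sqrt(2))

# Same spot rates (60% of cylinders, 50% of spheres), very different basket sizes.
p_small_basket = coincidence_p_value(6, 10, 5, 10)
p_large_basket = coincidence_p_value(3000, 5000, 2500, 5000)
```

With 20 candies, a 60% versus 50% gap is easily explained by coincidence. With 10000 candies, the same gap gives a p-value so small that coincidence is practically ruled out. What the calculation cannot say is which kind of causal relationship is behind the gap.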

Can someone tell me where I went wrong?

(I have a hunch: it involves the part where I neglected my ignorance and decided to post anyway)

Also please tell me if it's ok to - if I should - delete this. =)
