Hello, introduction and embarrassments

I'd like to start by apologizing for making this thread, and by explaining why. This post is very likely to be embarrassing: it's about math, and I can't do math. I'm a fairly smart person and reasonably adept at handling abstract matters, rationality and so forth, but only by an average Joe's standards. On this topic, probability and mathematics in general, that's hardly a merit at all, since mathematics tends to involve people who are all at least fairly smart and reasonably adept at handling abstract matters. This shouldn't be a topic for me to touch; I know there are plenty of people who are more fit to do the job. This website also seems to have really skilled thinkers, most of whom are smarter, better educated and in many ways better suited, and in my opinion some of them could be perceived as exactly the type of people who do math and science. Sorry if I'm not one of you. I hope this post will not become an embarrassment for me, and I also hope it will not cause a sudden spike in a shared sense of shame for you guys. If that does occur, however, I hope it will be a learning experience for me and/or others who are not adept with mathematics - when you point out my errors, that is. :)

 

Correlation does not imply causation?

I often run into this phrase.. Correlation does not imply causation. Something about it bugs me, intuitively. Which for this audience is like blasphemy. Intuition, that is. The last time I made some comments on the subject on a whim, it was very embarrassing. I got my silly comments handed right back to me. But later I thought there was something to it, and so I thought.. Well, I'm gonna ask you guys, since I can't figure it out.

 

What is correlation about?

Correlation is a mathematical method of observing relationships between mathematical objects: points, statistics which can be converted to points, and so forth. There are at least a couple of ways of calculating these relationships; the only probability-related ones I know of are the Spearman correlation and the Pearson correlation. Then there's also Bayes, which is not really the same thing.
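To make the two named measures concrete, here is a minimal sketch in plain Python. The function names and the sample data are made up for illustration. Pearson measures how close the points are to a straight line, while Spearman applies the same formula to the ranks of the values, so it only asks whether one variable rises whenever the other does.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def ranks(values):
    """Rank values from 1 to n; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Hypothetical data: y always rises with x, but not along a straight line.
xs = [1, 2, 3, 4, 5]
ys = [1, 8, 27, 64, 125]
pearson_r = pearson(xs, ys)    # below 1, because the points are not on a line
spearman_r = spearman(xs, ys)  # 1 up to rounding, because the order is identical
print(pearson_r, spearman_r)
```

Neither number says anything about why the two variables move together; both only describe how tightly they do.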

Sometimes it's blatantly obvious that a correlation cannot be a coincidence. If it's blatantly obvious intuitively, to such an extent that you at least think you can trust your intuition, in my book it means that I don't know the method of clinically arriving at a rational explanation, rather than that intuition is outright wrong.


Some observations

If there exists a statistical relationship between two sets of data, it can be explained by:

1. Coincidence.

2. Causal relationship

2.1 Causal relationship, but which direction does it flow?

A causal relationship is a complicated topic, because for starters, even if there is a relationship between two things, you can't quite tell which way the causality flows. Does eating apples cause condition X or does condition X cause people to eat apples?

2.2 Causal relationship through outside factors

But causality does not end there. There may be an additional factor A which is connected to both eating apples and condition X.

2.3 Causal relationship through distant outside factors.

In fact there may be any number of additional factors. These factors can also be distant in the sense of a Markov blanket. To give an example, you could have factors A, B and C, of which B and C cause condition X and eating apples, while A instead causes B. B and C, despite being outside factors, are in close proximity to X and eating apples, but A is a distant factor, because it causes X and eating apples only indirectly.


The chief point of this post is that this type of causality is largely different from coincidence. Coincidence does not really involve such close causal relationships.

I believe there should be a mathematical method for calculating an estimate of the probability that a correlation is coincidental. I think this can be done with standard probability calculations combined with correct observations. To give a simple example, you may have an ascending tendency in the values of some dataset, and that can be matched through correlation to another dataset with the same ascending tendency. If these tendencies are really simple, then the correlation is more likely to be a coincidence. But if the tendencies are complex, then even with a slightly weaker correlation following a similar pattern, the likelihood that it's not a coincidence increases a lot, intuitively speaking.
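One standard technique in this direction is a permutation test, sketched below under some assumptions: the function names, the sample series and the trial count are all made up for illustration. It shuffles one variable many times and counts how often a purely random pairing produces a correlation at least as strong as the observed one. Strictly speaking, the result is the probability of seeing such a correlation if the pairing were random, which is not quite the same as the probability that the observed correlation is a coincidence. Shuffling also assumes the order of the values carries no information, so it is not suited to series that both follow a trend.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def permutation_p_value(xs, ys, trials=2000, seed=0):
    """Share of random re-pairings that correlate at least as strongly as the data."""
    rng = random.Random(seed)
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if abs(pearson(xs, shuffled)) >= observed:
            hits += 1
    # Adding 1 to both counts keeps the estimate from ever being exactly zero.
    return (hits + 1) / (trials + 1)

xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys_strong = [0, 2, 1, 3, 5, 4, 6, 7]  # hypothetical: nearly the same order as xs
ys_weak = [3, 1, 4, 1, 5, 9, 2, 6]    # hypothetical: only loosely rising

p_strong = permutation_p_value(xs, ys_strong)
p_weak = permutation_p_value(xs, ys_weak)
print(p_strong, p_weak)
```

With only eight points, the loosely rising series is matched or beaten by a sizeable share of random shuffles, while the nearly ordered one almost never is.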

For example, we may have several series of objects which each contain two variables, a and b. The variables can be on or off. If you examine the correlation between the on and off variables across the series, you might find that variable b is correlated with variable a: it also seems to increase when variable a increases over the series of objects.

To give a practical example: your grandmother brings you a basket filled with candy. The candies may or may not have spots on them (variable a), and they may be spherical or cylindrical (variable b). Your grandmother visits you once a week. By the end of the year you have collected several basketfuls of candy. You may find a correlation between the candies being cylindrical and having spots on them - for example, if there's a tendency for the ratio of candies with spots on them to rise and there's also an increase in cylindrical candies. This can be explained by coincidence as well as by causation, and the probability of coincidence can be rather large. You could think of that in the sense of Occam's razor and minimum message length. Even if there are lots of candies in each basket, the series may be very short. The length of the series may not be statistically significant. Even if the series were of statistically significant length, the rule which explains the behavior may be reasonably short. For example, even if you have 1000 baskets of candies, it may be that there's an average 2% decrease in the number of candies without spots on them, and it also happens that spherical candies have an average 2% decrease in number. If a random tendency to increase or decrease by 0-20% is generated and then maintained, it can be coincidental that these two variables have the same tendency to increase or decrease, despite there being a long series of baskets, lots of candy and so forth.
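The point about maintained tendencies can be checked with a small simulation. This is only a sketch under assumed numbers: each series gets its own random linear trend plus noise, the 52 steps stand for weekly baskets, and the 0.5 threshold for a "strong" correlation is arbitrary. The two series in each pair are generated independently, so any correlation between them is coincidence in the sense used above.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def trending_series(rng, length, max_slope):
    """A series with one randomly chosen, maintained trend plus unit noise."""
    slope = rng.uniform(-max_slope, max_slope)
    return [slope * t + rng.gauss(0, 1) for t in range(length)]

def share_strongly_correlated(max_slope, pairs=1000, length=52, threshold=0.5, seed=1):
    """Fraction of independent series pairs whose |correlation| exceeds the threshold."""
    rng = random.Random(seed)
    strong = 0
    for _ in range(pairs):
        a = trending_series(rng, length, max_slope)
        b = trending_series(rng, length, max_slope)
        if abs(pearson(a, b)) > threshold:
            strong += 1
    return strong / pairs

with_trend = share_strongly_correlated(max_slope=1.0)     # each series drifts
without_trend = share_strongly_correlated(max_slope=0.0)  # pure noise, no drift
print(with_trend, without_trend)
```

In this setup most pairs of drifting series come out strongly correlated (positively or negatively) even though nothing connects them, while pairs without a trend almost never do. A long series does not help here, because each series is still described by a single short rule.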

 

Sometimes you can tell that correlation does imply causality - but it does not specify which kind of causality.

However, if the candies that are cylindrical are more likely to have spots on them than the candies that are round, and there's a large number of candies in the basket, this cannot be explained by mere coincidence. If there are 10000 candies on average in the baskets and there is a correlation between the variables within the objects (the candies, that is), then it's unlikely to be a coincidence. It "has" to (is very likely to) involve causality, because you can't resort to the short generating rule or the short series length described in the example above. This, however, is relative to the number of objects and the strength of the correlation - correlation and the number of objects being complementary to each other in terms of probability.
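The trade-off between the number of objects and the strength of the correlation can be illustrated with a chi-square test on a 2x2 table of candy counts. This is a sketch with made-up counts, not a full analysis: in both tables 60% of cylindrical candies and 40% of spherical candies have spots, and only the total number of candies differs. The p-value formula used is the exact tail probability of a chi-square distribution with one degree of freedom; for tables as small as the first one the chi-square approximation itself is rough.

```python
from math import erfc, sqrt

def chi_square_2x2(a, b, c, d):
    """Chi-square test of independence for a 2x2 table of counts.

    a, b: cylindrical candies with / without spots
    c, d: spherical candies with / without spots
    Returns (statistic, p_value), where p_value is the chance of a statistic
    at least this large if shape and spots were unrelated.
    """
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For one degree of freedom, P(chi2 > x) equals erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value

# Hypothetical counts with the same proportions at two very different sizes.
chi_small, p_small = chi_square_2x2(6, 4, 4, 6)              # 20 candies
chi_large, p_large = chi_square_2x2(3000, 2000, 2000, 3000)  # 10000 candies
print(chi_small, p_small)
print(chi_large, p_large)
```

The statistic grows in proportion to the number of candies, so the same modest association that is unremarkable among 20 candies becomes practically impossible to attribute to chance among 10000.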

Even if "coincidence" is involved, in this example it requires an outside factor, which also means causality. Therefore in this example correlation does imply causality, but you can't tell which kind of causality.

So you can apply methods to try to estimate whether there's a high or low probability of coincidence. If coincidence is unlikely, you should be able to tell that some kind of causal relationship is involved.

 

 

 

Can someone tell me where I went wrong?

(I have a hunch: it involves the part where I neglected my ignorance and decided to post anyway)

Also please tell me if it's ok to - if I should - delete this. =)

9 comments

I often run into this phrase.. Correlation does not imply causation.

It is shorthand for: "Just because A and B are correlated, it does not prove that A causes B." (Because that is the conclusion many people automatically draw.)

Something about it bugs me, intuitively.

See this xkcd comic, especially the mouseover text for the image.

Which for this audience is like blasphemy. Intuition that is.

See "Your Strength as a Rationalist", specifically the last paragraph.

I believe there should be a mathematical method for calculating an estimate of the probability that a correlation is coincidental.

There is a chance it could be somewhere in this book. I am not sure about it, because I didn't read the book, just heard about it.

By the way, I am not sure if your example also includes observer-caused correlation (I don't know what it is officially called), where the correlation between the observed events is caused by the way the observer selects them. For example, imagine that there are nine objects; let's call them A1, A2, A3, B1, B2, B3, C1, C2, C3. For some reason, you are only interested in properties A and 1. So you only notice the objects A1, A2, A3, B1, C1, and ignore all the rest. Now you observe that in your selected set, the properties A and 1 are anticorrelated. But this anticorrelation is not a property of the original set, only of your observed subset; so it explains something about you, not about the original data. (In real life known as: "Why are all the attractive [insert the sex you find attractive] so [insert a noticeable negative trait]?".)
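The nine-object example can be checked directly. The sketch below encodes "has property A" and "has property 1" as 0/1 values and computes the Pearson correlation between them, first over all nine objects and then over only the noticed ones. The helper names are made up for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def correlation_of_a_and_1(names):
    """Correlation between 'letter is A' and 'digit is 1' over the given objects."""
    has_a = [1 if name[0] == "A" else 0 for name in names]
    has_1 = [1 if name[1] == "1" else 0 for name in names]
    return pearson(has_a, has_1)

# All nine objects: A1, A2, A3, B1, B2, B3, C1, C2, C3.
objects = [letter + digit for letter in "ABC" for digit in "123"]

# The observer only notices objects that have property A or property 1.
noticed = [name for name in objects if name[0] == "A" or name[1] == "1"]

full_r = correlation_of_a_and_1(objects)     # 0 up to rounding: no relationship
noticed_r = correlation_of_a_and_1(noticed)  # -2/3: created purely by the selection
print(noticed, full_r, noticed_r)
```

In the full set the two properties are unrelated, and the negative correlation appears only after the objects with neither property have been dropped.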

A suggestion for future posts: If you notice something really really obvious, please assign higher probability that someone else noticed it too. It doesn't mean you shouldn't say it, but you can probably skip the long introduction before you get to the topic. ;-)

[-][anonymous]10y20

Thanks for your comment :)

I think I'm on the verge of deleting this post because :D I mean I did feel like well gee, this gotta be pretty obvious for someone who actually works with this stuff but... I just couldn't help it :D

The introduction surely was very long but.. I guess it worked to soften the landing.. err, FALL, a bit ... :D

[This comment is no longer endorsed by its author]

You might want to check out the writings of Judea Pearl.

I think part of the issue is that there is a further gap between correlation and even indirect causation, at least for the term causation as used in scientific literature. When the outside cause is external enough, we generally don't describe the correlation between two symptoms as a causative relationship. Strong correlation implies that there is some cause, but it's useful to distinguish where one trait actually forces the other into existence, as opposed to where an external cause forces both traits to exist. Correlation requires only two traits: causation requires a direction.

For example, you could have a million candies that were cylindrical and had spots, and a million more candies that were spherical and had no spots, and absolutely no candies that were outside of those fields. That'd be incredibly strong correlative evidence, and we could pretty reliably reject coincidence (see odds ratios for the less-math-heavy version, or statistical hypothesis testing for the mathematician's one). If you had to predict future events, it'd be a great place to start out.
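As a rough illustration of the odds-ratio route, here is a sketch that computes the ratio and an approximate 95% interval from a 2x2 table of counts. The first table uses made-up numbers; the second is the two-million-candy example above. Because that table has empty cells, the plain ratio would divide by zero, so the sketch adds 0.5 to every cell in that case (the Haldane-Anscombe correction), which is a swapped-in convenience rather than anything the example itself calls for.

```python
from math import exp, log, sqrt

def odds_ratio_with_interval(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% interval for a 2x2 table of counts.

    a, b: cylindrical candies with / without spots
    c, d: spherical candies with / without spots
    """
    if 0 in (a, b, c, d):
        # Haldane-Anscombe correction: add 0.5 to every cell so the ratio stays finite.
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio.
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = exp(log(ratio) - z * se)
    high = exp(log(ratio) + z * se)
    return ratio, low, high

# Hypothetical basket: 90 of 100 cylindrical and 20 of 100 spherical candies have spots.
ratio, low, high = odds_ratio_with_interval(90, 10, 20, 80)

# The extreme example: every cylindrical candy has spots, no spherical candy does.
extreme_ratio, extreme_low, extreme_high = odds_ratio_with_interval(1_000_000, 0, 0, 1_000_000)
print(ratio, low, high)
print(extreme_ratio, extreme_low)
```

An odds ratio of 1 would mean that spots are equally common on both shapes. An interval that sits well above 1 is what "reliably rejecting coincidence" looks like here, though it still says nothing about why the two traits go together.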

But even that level of confidence doesn't tell us that being cylindrical makes the candy have spots. It could be that your grandmother likes two particular brands of candy, one that is cylindrical with spots, and one that is spherical without spots. That's a causal relationship, but it's between your grandmother's candy preferences and the results, not between one candy trait and the next. ((In more complicated scenarios, you can have correlation with a cause of "your study methodology", which is technically causal but not very helpful.))

That's not an academic distinction. If you really disliked the spots on your candy and those spots were caused by the candy being cylindrical, you could ask your grandmother not to bring the cylindrical candies. If there's not a causative link between cylindrical candies and spotted candies, you might end up finding yourself with new spherical spotted candies (especially if the underlying cause is that your grandmother likes spotted candies!).

I think you had this part down, so my apologies if I'm repeating what you've said.

It's easiest to actually distinguish causation from correlation by understanding the full story of events. There are some tools that let you determine it from data, but you have to be really sure of your data or you can get into some really ugly errors. ((There are also some entirely mathematical tests, like the Granger Causality test, but these are generally useful only in limited fields and can reach ridiculous and wrong conclusions outside of them.))

Yudkowsky brings up one example, after saying that, while correlation does not prove causation, it provides a strong wink-and-nudge eyebrow-wiggling sorta deal. You could look at the correlation between smoking and lung cancer, and then start digging. This isn't a perfect example, though: smoking strongly correlates with a wide array of schizoid and schizophrenic spectrum disorders, but as far as we can tell now that's not because smoking causes schizophrenia.

(I have a hunch: it involves the part where I neglected my ignorance and decided to post anyway)

It is very rarely better to remain confused, and I doubt this is one of those circumstances.

[-]lmm10y30

An example I found interesting: in the early epidemiological research you see a substantial correlation between alcohol use and lung cancer. Turns out that's not a (direct) causal relationship; it shows up just because smoking and drinking are correlated.
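A toy simulation shows how this can happen. All the probabilities below are invented for illustration and are not real epidemiological figures: smokers are more likely to drink, smoking raises the cancer risk, and drinking has no effect on cancer at all. Comparing drinkers with non-drinkers over everyone still shows a gap, which disappears once smokers and non-smokers are looked at separately.

```python
import random

def simulate(n=200_000, seed=2):
    """Generate (smokes, drinks, cancer) records; drinking never affects cancer."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        smokes = rng.random() < 0.3
        drinks = rng.random() < (0.7 if smokes else 0.3)   # smokers drink more often
        cancer = rng.random() < (0.15 if smokes else 0.01)  # depends on smoking only
        people.append((smokes, drinks, cancer))
    return people

def cancer_rate(people, drinks, smokes=None):
    """Cancer rate among drinkers or non-drinkers, optionally within one smoking group."""
    group = [p for p in people
             if p[1] == drinks and (smokes is None or p[0] == smokes)]
    return sum(1 for p in group if p[2]) / len(group)

people = simulate()

# Gap in cancer rate between drinkers and non-drinkers, over everyone.
overall_gap = cancer_rate(people, True) - cancer_rate(people, False)

# The same gap, computed separately among smokers and among non-smokers.
gap_among_smokers = (cancer_rate(people, True, smokes=True)
                     - cancer_rate(people, False, smokes=True))
gap_among_nonsmokers = (cancer_rate(people, True, smokes=False)
                        - cancer_rate(people, False, smokes=False))
print(overall_gap, gap_among_smokers, gap_among_nonsmokers)
```

Splitting the data by the suspected common cause only works when that cause has been thought of and measured, which is part of why the early research could be misled.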

[-][anonymous]8y00

Highly suspect that time perspective predicts both schizophrenia and smoking.

[-][anonymous]10y00

Thanks for your comment.

At first glance my first impulse is to comment on this relationship between smoking and schizophrenia...

...And just going with this post of mine, I think there's a causal relationship between schizophrenia and smoking if this relationship is detected on an individual level. It could be that schizophrenia causes people to smoke for some reason, or through other factors, or that cigarette smoking causes schizophrenia, or there's a factor that causes both smoking and schizophrenia.. I think there has to be some kind of explanation, because coincidence seems too unlikely.

But my judgement may be affected because I have the following vague memory:

...As far as I know, cigarette smoke hinders a fibroblastic or neuroblastic growth factor from functioning, which happens to affect the growth of the amygdala, and damage to the amygdala is linked to schizophrenia. Which would.. Well, "wink" at smoking -> growth factor malfunction -> amygdala malfunction -> increased likelihood of developing schizophrenic symptoms. However, all these facts would need to be checked. :D

[This comment is no longer endorsed by its author]

You might find Eliezer's essay on timeless causality interesting (for context, it is worth reading the whole sequence leading up to this).