LessWrong

Hi gwern, it's awesome that you are grappling with these issues. Here are some rambling responses.

You might enjoy Sander Greenland's essay here:

http://bayes.cs.ucla.edu/TRIBUTE/festschrift-complete.pdf

Sander can be pretty bleak!

> But does the number of causal relationships go up just as fast? I don't think so (although at the moment I can't prove it).

I am not sure exactly what you mean, but I can think of a formalization where this is not hard to show. We say A "structurally causes" B in a DAG G if and only if there is a directed path from A to B in G. We say A and B are "structurally dependent" in a DAG G if and only if there is a marginal d-connecting path between A and B in G.

A marginal d-connecting path between two nodes is a path with no two consecutive edges of the form -> X <- (that is, no colliders on the path). In particular, every directed path is marginal d-connecting, but not every marginal d-connecting path is directed.
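To make the two kinds of path concrete, here is a small Python sketch that enumerates the paths between two nodes of a DAG and classifies each one. The DAG is given as a set of (parent, child) pairs, and the helper names `simple_paths`, `is_directed`, and `is_collider_free` are hypothetical, not from any library. In the toy DAG H -> A, H -> B, A -> C, B -> C, the path A <- H -> B is marginal d-connecting but not directed, while A -> C <- B has a collider.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (u, v) encodes u -> v


def simple_paths(edges: Set[Edge], start: str, end: str) -> List[List[str]]:
    """All simple paths from start to end, ignoring edge direction."""
    nbrs: Dict[str, Set[str]] = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    paths: List[List[str]] = []

    def extend(path: List[str]) -> None:
        if path[-1] == end:
            paths.append(list(path))
            return
        for w in sorted(nbrs.get(path[-1], set())):
            if w not in path:
                path.append(w)
                extend(path)
                path.pop()

    extend([start])
    return paths


def is_directed(path: List[str], edges: Set[Edge]) -> bool:
    """Every edge on the path points away from path[0]."""
    return all((a, b) in edges for a, b in zip(path, path[1:]))


def is_collider_free(path: List[str], edges: Set[Edge]) -> bool:
    """No interior node has both of its path edges pointing into it (-> X <-)."""
    return not any((p, m) in edges and (n, m) in edges for p, m, n in zip(path, path[1:], path[2:]))


edges = {("H", "A"), ("H", "B"), ("A", "C"), ("B", "C")}
for p in simple_paths(edges, "A", "B"):
    print(p, is_directed(p, edges), is_collider_free(p, edges))
# ['A', 'C', 'B'] False False
# ['A', 'H', 'B'] False True
```

Enumerating simple paths is exponential in general, so this is only meant for toy graphs.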

The justification for this definition is that if A "structurally causes" B in a DAG G, then if we were to intervene on A, we would observe B change (but not vice versa) in "most" distributions that arise from causal structures consistent with G. Similarly, if A and B are "structurally dependent" in a DAG G, then in "most" distributions consistent with G, A and B would be marginally dependent (i.e., what you probably mean when you say 'correlations are there').
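A minimal simulation of the asymmetry, assuming a made-up linear structural model for A -> B (A := U_A, B := 2A + U_B, with independent standard normal noise). An intervention do(X = x) is modeled by replacing X's equation with the constant x; `sample` and `mean_under` are hypothetical helpers.

```python
import random
from typing import Dict, Optional


def sample(rng: random.Random, do: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Draw one (A, B) pair; `do` replaces a variable's mechanism with a constant."""
    do = do or {}
    a = do["A"] if "A" in do else rng.gauss(0.0, 1.0)
    b = do["B"] if "B" in do else 2.0 * a + rng.gauss(0.0, 1.0)
    return {"A": a, "B": b}


def mean_under(var: str, do: Dict[str, float], n: int = 50_000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[var | do(...)]."""
    rng = random.Random(seed)
    return sum(sample(rng, do)[var] for _ in range(n)) / n


# Intervening on the cause shifts the effect: roughly 0 versus roughly 2.
print(mean_under("B", {"A": 0.0}), mean_under("B", {"A": 1.0}))
# Intervening on the effect leaves the cause alone: both roughly 0.
print(mean_under("A", {"B": 0.0}), mean_under("A", {"B": 5.0}))
```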

I qualify with "most" because we cannot simultaneously represent dependences and independences by a graph, so we have to choose. People have chosen to represent independences. That is, if some arrow is missing in a DAG G, then in any distribution (causal structure) consistent with G, there is some sort of independence (a missing effect). But if the arrow is present, we cannot say anything: maybe there is dependence, maybe there is independence. An arrow may be present in G, and there may still be independence in a distribution consistent with G. We call such distributions "unfaithful" to G. If we pick distributions consistent with G at random, we are unlikely to hit unfaithful ones (the subset of distributions consistent with G that are unfaithful to G has measure zero), but Nature does not pick at random, so unfaithful distributions are a worry. They may arise for systematic reasons (maybe the equilibrium of a feedback process in biology?).
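Unfaithfulness is easy to construct by hand. The sketch below assumes a linear model with independent unit-variance noise for the DAG A -> C -> B, A -> B, and picks made-up coefficients so that the direct effect exactly cancels the path through C. Each variable is written as a linear combination of noise terms, and covariances are computed exactly with `fractions.Fraction`; the helper names are hypothetical. If the noise is Gaussian, zero covariance means A and B are independent even though the arrow A -> B is present.

```python
from fractions import Fraction
from typing import Dict, List, Tuple

Coefs = Dict[Tuple[str, str], Fraction]  # (parent, child) -> edge coefficient


def noise_loadings(order: List[str], coef: Coefs) -> Dict[str, Dict[str, Fraction]]:
    """Express each variable as a sum of independent unit-variance noises U_v.

    `order` must be a topological order, so every parent is processed before its child.
    """
    load: Dict[str, Dict[str, Fraction]] = {}
    for v in order:
        row = {v: Fraction(1)}  # the variable's own noise term U_v
        for (parent, child), w in coef.items():
            if child == v:
                for u, x in load[parent].items():
                    row[u] = row.get(u, Fraction(0)) + w * x
        load[v] = row
    return load


def covariance(load: Dict[str, Dict[str, Fraction]], x: str, y: str) -> Fraction:
    """Cov(x, y) = sum over noise terms of the product of loadings (noise variances are 1)."""
    return sum((w * load[y].get(u, Fraction(0)) for u, w in load[x].items()), Fraction(0))


order = ["A", "C", "B"]
# Made-up coefficients: A -> C is 2, C -> B is 3, and the direct A -> B is -6 = -(2 * 3).
cancelling = {("A", "C"): Fraction(2), ("C", "B"): Fraction(3), ("A", "B"): Fraction(-6)}
generic = {("A", "C"): Fraction(2), ("C", "B"): Fraction(3), ("A", "B"): Fraction(-5)}

print(covariance(noise_loadings(order, cancelling), "A", "B"))  # 0: arrow present, no correlation
print(covariance(noise_loadings(order, generic), "A", "B"))     # 1
```

Nudging the direct coefficient away from -6 brings the correlation back, which is the sense in which such cancellations are "measure zero".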

If you accept the above definitions, then clearly for a DAG with n vertices, the number of pairwise structural dependence relationships is an upper bound on the number of pairwise structural causal relationships. I am not aware of anyone having worked out the exact combinatorics here, but it's clear there are many, many more paths for structural dependence than paths for structural causality.
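Counting pairs under these definitions is easy if one uses the standard fact that two nodes of a DAG are joined by a collider-free path exactly when they share a common ancestor (counting each node as its own ancestor). A minimal sketch, with hypothetical helper names, that counts unordered pairs joined by a directed path versus pairs joined by a marginal d-connecting path:

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Parents = Dict[str, Set[str]]


def ancestors(parents: Parents, v: str) -> Set[str]:
    """All ancestors of v, counting v itself."""
    seen, stack = {v}, [v]
    while stack:
        for p in parents.get(stack.pop(), set()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def count_pairs(parents: Parents, nodes: List[str]) -> Tuple[int, int]:
    """Return (#pairs joined by a directed path, #pairs joined by a collider-free path)."""
    anc = {v: ancestors(parents, v) for v in nodes}
    causal = dependent = 0
    for x, y in combinations(nodes, 2):
        if x in anc[y] or y in anc[x]:
            causal += 1  # structural causation, in one direction or the other
        if anc[x] & anc[y]:
            dependent += 1  # common ancestor <=> marginal d-connecting path
    return causal, dependent


star = {"A": {"H"}, "B": {"H"}, "C": {"H"}}  # H -> A, H -> B, H -> C
chain = {"B": {"A"}, "C": {"B"}}             # A -> B -> C
print(count_pairs(star, ["H", "A", "B", "C"]))  # (3, 6)
print(count_pairs(chain, ["A", "B", "C"]))      # (3, 3)
```

For the star, only the three pairs involving H are causal, but all six pairs are dependent; for a chain, the two counts coincide.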

But what you actually want is not a DAG with n vertices, but another type of graph with n vertices. The "Universe DAG" has a lot of vertices, but what we actually observe is a very small subset of them, and we marginalize over the rest. The trouble is that if you start with a distribution that is consistent with a DAG and marginalize over some variables, you may end up with a distribution that isn't well represented by any DAG. In other words, "DAG models aren't closed under marginalization."

That is, if our DAG is A -> B <- H -> C <- D, and we marginalize over H because we do not observe it, we get a distribution whose conditional independences no DAG over A, B, C, D can represent exactly. We need another kind of graph.
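This claim can be checked by brute force. The sketch below (all helper names are hypothetical) tests d-separation with the moralization criterion, lists the conditional independences among A, B, C, D implied by the five-node DAG once H is marginalized out, and then tries every one of the 543 DAGs on A, B, C, D. None of them has exactly the same set of independences. It is only feasible for tiny graphs.

```python
from itertools import combinations, product
from typing import Dict, FrozenSet, List, Set, Tuple

Parents = Dict[str, Set[str]]
Statement = Tuple[FrozenSet[str], FrozenSet[str]]  # ({x, y}, Z) means x _||_ y | Z


def ancestral_set(parents: Parents, nodes: Set[str]) -> Set[str]:
    result, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), set()):
            if p not in result:
                result.add(p)
                stack.append(p)
    return result


def d_separated(parents: Parents, x: str, y: str, z: Set[str]) -> bool:
    """Moralization test: is x cut off from y by z in the moral graph of An({x, y} | z)?"""
    keep = ancestral_set(parents, {x, y} | z)
    adj: Dict[str, Set[str]] = {v: set() for v in keep}
    for v in keep:
        ps = parents.get(v, set())
        for p in ps:
            adj[v].add(p)
            adj[p].add(v)
        for p, q in combinations(ps, 2):  # marry parents of a common child
            adj[p].add(q)
            adj[q].add(p)
    seen, stack = {x}, [x]
    while stack:
        v = stack.pop()
        if v == y:
            return False
        for w in adj[v] - z - seen:
            seen.add(w)
            stack.append(w)
    return True


def independences(parents: Parents, observed: List[str]) -> Set[Statement]:
    """All x _||_ y | Z with x, y, Z drawn from `observed`; other nodes are marginalized out."""
    found: Set[Statement] = set()
    for x, y in combinations(observed, 2):
        rest = [v for v in observed if v not in (x, y)]
        for k in range(len(rest) + 1):
            for z in combinations(rest, k):
                if d_separated(parents, x, y, set(z)):
                    found.add((frozenset((x, y)), frozenset(z)))
    return found


def is_acyclic(parents: Parents) -> bool:
    """Kahn-style check: repeatedly strip nodes that have no remaining parents."""
    remaining = {v: set(ps) for v, ps in parents.items()}
    while remaining:
        roots = [v for v, ps in remaining.items() if not ps]
        if not roots:
            return False
        for r in roots:
            del remaining[r]
        for ps in remaining.values():
            ps.difference_update(roots)
    return True


def all_dags(nodes: List[str]) -> List[Parents]:
    """Every DAG on `nodes`: each pair is unlinked, u -> v, or v -> u; keep the acyclic ones."""
    pairs = list(combinations(nodes, 2))
    dags = []
    for choice in product((None, "fwd", "back"), repeat=len(pairs)):
        parents: Parents = {v: set() for v in nodes}
        for (u, v), c in zip(pairs, choice):
            if c == "fwd":
                parents[v].add(u)  # u -> v
            elif c == "back":
                parents[u].add(v)  # v -> u
        if is_acyclic(parents):
            dags.append(parents)
    return dags


full: Parents = {"A": set(), "H": set(), "D": set(), "B": {"A", "H"}, "C": {"H", "D"}}
observed = ["A", "B", "C", "D"]
target = independences(full, observed)  # independences of the margin over A, B, C, D
matches = [g for g in all_dags(observed) if independences(g, observed) == target]
print(len(target), len(matches))  # 7 0
```

Intuitively, the margin needs a collider at B (A _||_ C) and a collider at C (B _||_ D), which would force the single edge between B and C to point both ways.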

In fact, people have come up with mixed graphs (containing -> edges and <-> edges) to represent margins of DAGs. Here -> means the same as in a causal DAG, but <-> means "there is some sort of common cause/confounder that we don't want to explicitly write down." Note: <-> is not a correlative arrow; it still encodes something causal (the presence of a hidden common cause or causes). I am being loose here: in fact it is the absence of arrows that carries meaning, not the presence.
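One standard way to get such a mixed graph from a DAG with hidden variables is the latent projection. The sketch below is a simplified version under these assumptions: u -> w is drawn when there is a directed path from u to w whose interior nodes are all hidden, and u <-> w when some hidden node reaches both u and w by such paths. The function names are hypothetical. Applied to A -> B <- H -> C <- D with H hidden, it gives A -> B <-> C <- D.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set, Tuple

Children = Dict[str, Set[str]]


def reach_via_latents(children: Children, start: str, latent: Set[str]) -> Set[str]:
    """Observed nodes reachable from `start` by directed paths whose interior nodes are all latent."""
    found: Set[str] = set()
    seen, stack = {start}, [start]
    while stack:
        for w in children.get(stack.pop(), set()):
            if w in seen:
                continue
            seen.add(w)
            if w in latent:
                stack.append(w)  # keep walking through hidden nodes
            else:
                found.add(w)  # stop at the first observed node on the path
    return found


def latent_projection(
    children: Children, latent: Set[str]
) -> Tuple[Set[Tuple[str, str]], Set[FrozenSet[str]]]:
    """Return (directed edges, bidirected edges) among the observed nodes."""
    nodes = set(children) | {w for ws in children.values() for w in ws}
    directed = {(v, w) for v in nodes - latent for w in reach_via_latents(children, v, latent)}
    bidirected: Set[FrozenSet[str]] = set()
    for h in nodes & latent:
        for u, w in combinations(sorted(reach_via_latents(children, h, latent)), 2):
            bidirected.add(frozenset((u, w)))  # u <-> w: a hidden common cause
    return directed, bidirected


dag = {"A": {"B"}, "H": {"B", "C"}, "D": {"C"}}  # A -> B <- H -> C <- D
print(latent_projection(dag, latent={"H"}))  # A -> B, D -> C, and B <-> C
```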

I do a lot of work on these kinds of graphs, because they are the sensible representation of the data we typically get: data drawn from a marginal of a joint distribution consistent with a big, unknown DAG.

But the combinatorics work out the same in these graphs: the number of marginal d-connecting paths is much bigger than the number of directed paths. This is probably the source of your intuition. Of course, what often happens is that you do have a (weak) causal link between A and B, but a much stronger non-causal link between A and B through an unobserved common parent. So the causal link is hard to find without "tricks."
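A small simulation of that situation, under made-up assumptions: an unobserved H drives both A and B, and A has a weak direct effect of 0.1 on B. The observational regression slope of B on A mixes the two links (about 0.1 + 1/2 = 0.6 in this model), whereas randomizing A, one of the "tricks", recovers roughly 0.1. The functions `slope` and `simulate` are hypothetical helpers.

```python
import random
from typing import List


def slope(xs: List[float], ys: List[float]) -> float:
    """Least-squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def simulate(n: int, intervene: bool, seed: int = 1) -> float:
    """Hypothetical model: H -> A, H -> B, and a weak A -> B with coefficient 0.1."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        h = rng.gauss(0.0, 1.0)  # unobserved common parent
        # Randomizing A cuts the H -> A arrow; otherwise A inherits H.
        a = rng.gauss(0.0, 1.0) if intervene else h + rng.gauss(0.0, 1.0)
        b = 0.1 * a + h + rng.gauss(0.0, 1.0)
        xs.append(a)
        ys.append(b)
    return slope(xs, ys)


print(simulate(50_000, intervene=False))  # near 0.6: mostly confounding
print(simulate(50_000, intervene=True))   # near 0.1: the causal effect
```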

The dependence that arises from conditioning on a common effect (simplest case A -> [C] <- B), which people have brought up, does arise in practice, usually when your samples aren't independent. Typical case: phone surveys are only administered to people with phones. Or case-control studies for rare diseases need to gather one arm from people who are already sick (called "outcome-dependent sampling").
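The simplest version can be worked out exactly. Assume, hypothetically, that A and B are independent fair coins and C := A or B. Unconditionally, B tells you nothing about A; among the samples with C = 1, learning B = 0 makes A = 1 certain. `prob_a1` is a hypothetical helper that counts equally likely outcomes.

```python
from fractions import Fraction
from itertools import product

# Hypothetical setup: A, B independent fair coins; C := A or B is their common effect.
# Each of the four (A, B) combinations is equally likely, so probabilities are row counts.
ROWS = [(a, b, a | b) for a, b in product((0, 1), repeat=2)]


def prob_a1(**given: int) -> Fraction:
    """P(A = 1 | given), e.g. prob_a1(B=1, C=1)."""
    keep = [r for r in ROWS if all(r["ABC".index(k)] == v for k, v in given.items())]
    return Fraction(sum(r[0] for r in keep), len(keep))


print(prob_a1(B=0), prob_a1(B=1))            # 1/2 1/2: independent
print(prob_a1(B=0, C=1), prob_a1(B=1, C=1))  # 1 1/2: dependent once we select on C = 1
```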

> Sterner measures might be needed: could we draw causal nets with not just arrows showing influence but also another kind of arrow showing correlations?

Phil Dawid works with DAG models that are partially causal and partially statistical. But I think we should first be very very clear on exactly what a statistical DAG model is, and what a causal DAG model is, and how they are different. Then we could start combining without confusion!

If you have a prior over DAG/mixed graph structures because you are Bayesian, you can obviously have beliefs about a causal relationship between A and B versus a dependent relationship between A and B, and update your beliefs based on evidence, etc. Bayesian reasoning about causality does involve saying at some point "I have an assumption that is letting me draw causal conclusions from a fact I observed about a joint distribution," which is not a trivial step (this is not unique to Bayesians, of course: anyone who wants to do causality from observational data has to deal with this).
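A toy version of such updating, under assumptions chosen for illustration: two binary variables, a 50/50 prior over "no edge" versus "edge", and independent uniform Beta(1, 1) priors on every parameter, so that each marginal likelihood has a closed form. The edge model is parameterized as A -> B; reading it causally (rather than as B -> A or a hidden common cause) is exactly the extra assumption described in the paragraph above. The counts and function names are made up.

```python
from fractions import Fraction
from math import factorial
from typing import Dict, Tuple


def beta_binomial_ml(k: int, n: int) -> Fraction:
    """Probability of one particular 0/1 sequence with k ones under a uniform Beta(1, 1) prior."""
    return Fraction(factorial(k) * factorial(n - k), factorial(n + 1))


def posterior_dependent(counts: Dict[Tuple[int, int], int]) -> float:
    """P(edge | data) for binary A, B, with prior P(edge) = P(no edge) = 1/2.

    counts[(a, b)] is the number of samples with A = a and B = b.
    """
    n = sum(counts.values())
    n_a1 = counts[(1, 0)] + counts[(1, 1)]
    n_b1 = counts[(0, 1)] + counts[(1, 1)]
    ml_a = beta_binomial_ml(n_a1, n)
    independent = ml_a * beta_binomial_ml(n_b1, n)
    # Edge model as A -> B: a separate Beta(1, 1) prior for P(B = 1 | A = a), a = 0, 1.
    dependent = (
        ml_a
        * beta_binomial_ml(counts[(0, 1)], counts[(0, 0)] + counts[(0, 1)])
        * beta_binomial_ml(counts[(1, 1)], counts[(1, 0)] + counts[(1, 1)])
    )
    return float(dependent / (dependent + independent))


print(posterior_dependent({(0, 0): 20, (0, 1): 5, (1, 0): 5, (1, 1): 20}))   # close to 1
print(posterior_dependent({(0, 0): 13, (0, 1): 12, (1, 0): 12, (1, 1): 13}))  # below 1/2
```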

> what's the psychology of this?

Pearl has a hypothesis that many probabilistic fallacies/paradoxes/biases are due to the fact that our brains natively think about causal, not probabilistic, relationships. So, e.g., Simpson's paradox is surprising because we intuitively think of a conditional distribution (where conditioning can change anything!) as a kind of "interventional distribution" (there is no Simpson's-type reversal under interventions: http://ftp.cs.ucla.edu/pub/stat_ser/r414.pdf).
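A worked example with made-up counts, assuming the stratum variable Z (severity) is the only confounder of treatment X and recovery Y. Treatment does better within each stratum but worse in the pooled conditional rates, because treated patients are mostly severe cases. The back-door adjustment, sum_z P(y | x, z) P(z), averages the stratum rates with the same weights for both arms, so it cannot reverse the within-stratum comparison. The data and function names are hypothetical.

```python
from fractions import Fraction

# Hypothetical counts: (recovered, total) for each severity stratum Z and treatment X.
data = {
    ("mild", "treated"): (18, 20),
    ("mild", "untreated"): (64, 80),
    ("severe", "treated"): (40, 80),
    ("severe", "untreated"): (8, 20),
}
STRATA = ("mild", "severe")
ARMS = ("treated", "untreated")
N_TOTAL = sum(t for _, t in data.values())


def rate(z: str, x: str) -> Fraction:
    """P(recovery | X = x, Z = z)."""
    r, t = data[(z, x)]
    return Fraction(r, t)


def conditional(x: str) -> Fraction:
    """P(recovery | X = x): pools strata, weighting by how often x occurs in each."""
    r = sum(data[(z, x)][0] for z in STRATA)
    t = sum(data[(z, x)][1] for z in STRATA)
    return Fraction(r, t)


def interventional(x: str) -> Fraction:
    """P(recovery | do(X = x)) via back-door adjustment over Z."""
    total = Fraction(0)
    for z in STRATA:
        p_z = Fraction(sum(data[(z, arm)][1] for arm in ARMS), N_TOTAL)
        total += rate(z, x) * p_z
    return total


print(conditional("treated"), conditional("untreated"))        # 29/50 18/25: treated looks worse
print(interventional("treated"), interventional("untreated"))  # 7/10 3/5: treated is better
```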

This hypothesis would claim that people who haven't looked into the math just interpret statements about conditional probabilities as about "interventional probabilities" (or whatever their intuitive analogue of a causal thing is).

> You might enjoy Sander Greenland's essay here: http://bayes.cs.ucla.edu/TRIBUTE/festschrift-complete.pdf Sander can be pretty bleak!

I tried to read that, but I think I didn't understand too much of it or its connection to this topic. I'll save that whole festschrift for later, there were some interesting titles in the table of contents.

> I am not sure exactly what you mean, but I can think of a formalization where this is not hard to show.

I agree I did sort of conflate causal networks and Bayesian networks in general... I didn't re...

Anders_H: Good comment, upvoted. Just a minor question: you probably did not intend to imply that this was an arbitrary choice, but it would still be interesting to hear your thoughts on it. It seems to me that the choice to represent independences by missing arrows was necessary. If they had instead chosen to represent dependences by present arrows, I don't see how the graphs would be useful for causal inference. If missing arrows represent independences and the backdoor criterion holds, this is interpreted as "for all distributions that are consistent with the model, there is no confounding". This is clearly very useful. If arrows represented dependences, it would instead be interpreted as "for at least one distribution that is consistent with the DAG model, there is no confounding". This is not useful to the investigator. Since unconfoundedness is an independence relation, it is not clear to me how graphs that encode dependence relations would be useful. Can you think of a graphical criterion for unconfoundedness in dependence graphs? Or would dependence graphs be useful for a different purpose?
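The backdoor criterion can be checked mechanically on a missing-arrow DAG. A sketch, using the standard reformulation: Z satisfies the criterion for (X, Y) if no member of Z is a descendant of X and Z d-separates X from Y in the graph with all edges out of X removed. The helper names and the example graph are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set

Parents = Dict[str, Set[str]]


def ancestral_set(parents: Parents, nodes: Set[str]) -> Set[str]:
    """`nodes` together with all of their ancestors."""
    result, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), set()):
            if p not in result:
                result.add(p)
                stack.append(p)
    return result


def d_separated(parents: Parents, x: str, y: str, z: Set[str]) -> bool:
    """Moralization test: is x cut off from y by z in the moral graph of An({x, y} | z)?"""
    keep = ancestral_set(parents, {x, y} | z)
    adj: Dict[str, Set[str]] = {v: set() for v in keep}
    for v in keep:
        ps = parents.get(v, set())
        for p in ps:
            adj[v].add(p)
            adj[p].add(v)
        for p, q in combinations(ps, 2):  # marry parents of a common child
            adj[p].add(q)
            adj[q].add(p)
    seen, stack = {x}, [x]
    while stack:
        v = stack.pop()
        if v == y:
            return False
        for w in adj[v] - z - seen:
            seen.add(w)
            stack.append(w)
    return True


def descendants(parents: Parents, x: str) -> Set[str]:
    """Proper descendants of x."""
    children: Dict[str, Set[str]] = {}
    for v, ps in parents.items():
        for p in ps:
            children.setdefault(p, set()).add(v)
    seen: Set[str] = set()
    stack = [x]
    while stack:
        for c in children.get(stack.pop(), set()):
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen


def satisfies_backdoor(parents: Parents, x: str, y: str, z: Set[str]) -> bool:
    """No descendant of x in z, and z d-separates x and y once edges out of x are cut."""
    if z & descendants(parents, x):
        return False
    cut = {v: ps - {x} for v, ps in parents.items()}  # delete every edge x -> v
    return d_separated(cut, x, y, z)


# Hypothetical graph: Z -> X -> M -> Y and Z -> Y.
g = {"Z": set(), "X": {"Z"}, "M": {"X"}, "Y": {"M", "Z"}}
print(satisfies_backdoor(g, "X", "Y", {"Z"}))  # True
print(satisfies_backdoor(g, "X", "Y", set()))  # False: X <- Z -> Y is open
print(satisfies_backdoor(g, "X", "Y", {"M"}))  # False: M is a descendant of X
```

Every step here reads missing edges as independence constraints, which is the property the comment argues a dependence-encoding graph would lack.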


If it's worth saying, but not worth its own post (even in Discussion), then it goes here.

Notes for future OT posters: