When I first read the Sequences, one of the posts that most excited me was Causal Diagrams and Causal Models, which sold me on the idea that one could discover the structure of causal networks using statistics. Another rationalist source that gave me similar hopes was Scott Alexander's SSC Journal Club: Mental Disorders As Networks.

However, when I actually started applying these techniques to my own data, or to publicly available datasets, I often found that they were unstable, and that one could easily construct plausible conditions under which they would give the wrong results. It's possible I had the wrong approach, but in my confusion I started reading up on what experts in causal inference had said. I got the impression that they had studied the problem for a while, initially finding some algorithms, but over time concluding that those algorithms didn't work very well and that it is better to just have a human in the loop who specifies the causal networks.
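
To illustrate the kind of instability I mean, here is a toy reconstruction (not my original data; the coefficients, sample size, and significance level are all illustrative). With a weak chain x -> y -> z, a PC-style procedure's marginal independence test for x and z often wrongly accepts; the separating set is then recorded as {} rather than {y}, and the standard v-structure rule orients a spurious collider x -> y <- z:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, runs, false_colliders = 100, 200, 0
for _ in range(runs):
    x = rng.normal(size=n)
    y = 0.3 * x + rng.normal(size=n)   # weak edge: x -> y
    z = 0.3 * y + rng.normal(size=n)   # chain:     x -> y -> z
    _, p_marginal = stats.pearsonr(x, z)
    if p_marginal >= 0.05:
        # The x-z edge is removed with separating set {}; since y is not in
        # that set, the v-structure rule fires and orients x -> y <- z,
        # which is the wrong structure for a chain.
        false_colliders += 1
print(f"{false_colliders}/{runs} runs orient the spurious collider x -> y <- z")
```

On most runs the weak marginal dependence between x and z goes undetected, so the inferred structure is not just uncertain but confidently wrong.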

So I mostly abandoned it, or came to see it as a much more limited tool than I had before. But recently, John Wentworth argued that it is actually quite feasible in practice, so maybe I was too quick to abandon it. I would like to know: what are the best examples of this working well in practice? Or alternatively, did anyone else come to the same conclusions as I did?

I'm doing causal inference in academia. I do not work on causal discovery, and neither do any of my colleagues, but I can share my impression of it from occasional seminars on the topic.

The few causal discovery seminars I have attended fall into these categories:

  1. Purely theoretical work that does not show actual applications,
  2. Applied work that does not work well,
  3. Applied work that (maybe) worked but required years and a team.

Consider this a skewed but real slice of the field.

My own thoughts on the subject matter:

In practice you don't have the information to completely reconstruct the causal relationships, nor to do so with low enough uncertainty that you can pretend you know the graph, even in cases where you have enough constraints to converge, in principle, to a single graph with infinite i.i.d. data. So an ideal method would give you a list of graphs with a posterior probability for each, and then you would carry out the inference conditional on each graph. This is what Bayes tells you to do.
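
A minimal sketch of what that would look like, assuming linear-Gaussian data and using a BIC approximation to the marginal likelihood (the three candidate graphs, the simulated coefficients, and the uniform prior over candidates are all illustrative choices, not a full method):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = 0.8 * x + rng.normal(size=n)   # ground truth: x -> y
z = 0.5 * y + rng.normal(size=n)   # and:          y -> z
data = {"x": x, "y": y, "z": z}

def node_bic(child, parents):
    """BIC contribution of one node given its parents (linear-Gaussian)."""
    target = data[child]
    if parents:
        X = np.column_stack([np.ones(n)] + [data[p] for p in parents])
        beta, *_ = np.linalg.lstsq(X, target, rcond=None)
        resid = target - X @ beta
        k = len(parents) + 2            # slopes + intercept + variance
    else:
        resid = target - target.mean()
        k = 2
    sigma2 = resid.var()
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    return -2 * loglik + k * np.log(n)

graphs = {                              # each graph: {node: parents}
    "x->y->z": {"x": [], "y": ["x"], "z": ["y"]},
    "x<-y->z": {"x": ["y"], "y": [], "z": ["y"]},
    "x->z<-y": {"x": [], "y": [], "z": ["x", "y"]},
}
bics = {g: sum(node_bic(c, ps) for c, ps in spec.items())
        for g, spec in graphs.items()}
# Posterior under a uniform prior over the candidates: p(G|D) proportional
# to exp(-BIC/2), with the minimum subtracted for numerical stability.
best = min(bics.values())
weights = {g: np.exp(-(b - best) / 2) for g, b in bics.items()}
total = sum(weights.values())
for g, w in weights.items():
    print(g, round(w / total, 3))
```

Note that the chain and the fork are Markov-equivalent, so their posterior weights come out essentially tied while the collider is heavily penalized, which is exactly why the honest output is a list of graphs rather than a single answer.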

However, when you try to specify a model, a graph with fewer arcs naturally leads to a lower-dimensional parameter space than one with more arcs. This would suggest that the graph with missing arcs has probability zero. You can try to repair this with delta distributions (i.e., probability mass given to a single point in a continuous space), but does that make sense? As Andrew Gelman sometimes says, everything has a causal effect on everything; it's just that the effect can be very small. So maybe a model with shrinkage (i.e., keeping all connections in the graph, but defining a notion of a "small" connection in the model and using a prior distribution that prefers simpler graphs) would make more sense.
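
As one crude instantiation of the shrinkage idea, here is a sketch under strong assumptions: a known variable ordering, linear-Gaussian relationships, and an illustrative penalty. A Laplace-prior MAP estimate is just a lasso fit; a fully Bayesian version (e.g., with a horseshoe prior) would instead keep every edge with a small posterior weight rather than zeroing it out.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n = 1000
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + rng.normal(size=n)
x3 = 0.05 * x1 + 0.7 * x2 + rng.normal(size=n)   # x1 -> x3 exists but is tiny

X = np.column_stack([x1, x2, x3])
order = [0, 1, 2]                 # assumed causal ordering (a big assumption)

# Regress each variable on all its predecessors in the ordering; the lasso
# penalty is the MAP counterpart of a Laplace shrinkage prior on edge weights.
for i in range(1, len(order)):
    node, parents = order[i], order[:i]
    fit = Lasso(alpha=0.05).fit(X[:, parents], X[:, node])
    print(f"x{node + 1} <- {[f'x{p + 1}' for p in parents]}: {fit.coef_.round(3)}")
```

Instead of a hard verdict that the arc x1 -> x3 does or does not exist, the tiny edge is simply estimated as (near) zero while the strong edges survive.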

I have not had these doubts answered in seminars, nor by asking.

Finally, @IlyaShpitser may know something more.

This matches my impressions relatively well.

Just heard about a drug knowledge synthesis AI company called "Causaly", which claims it "Captures causality as opposed to co-occurence, with 8 different relationship types". Anything interesting going on here? https://www.causaly.com/technology/capabilities

Just out of curiosity, is there a problem where... causality is genuinely hard to assess without experimentation, so there are always going to be multiple credible hypotheses unless you wire it up to a lab and let it try things and gather focused evidence for distinguishing them?

I don't know too much about this space, but Uber's CausalML Python library and its uses may be a good place to look. That or Pyro, also made by Uber. Presumably Uber's uses for these tools are success cases, but I don't know the details. John has talked about Pyro being cool in previous posts of his, so he could have in mind the tools it provides when he talks about this.

Looking superficially, neither really seems to be doing causal structure discovery. CausalML, roughly speaking, seems to be doing various forms of multivariate regression, whereas Pyro seems to be fitting Bayesian networks. Both of these tasks require the sort of structural assumptions that causal discovery is supposed to provide, but they are not in themselves examples of causal structure discovery.
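
For what it's worth, here is a minimal sketch (my own illustration, not taken from Pyro's docs; the variable names, priors, and learning rate are arbitrary) of what fitting a Bayesian network in Pyro looks like. The point is that the graph, here just x -> y, is hard-coded by the modeler; inference is over the parameters of that graph, never over the structure:

```python
import torch
import pyro
import pyro.distributions as dist
from pyro.infer import SVI, Trace_ELBO
from pyro.infer.autoguide import AutoNormal
from pyro.optim import Adam

def model(x, y=None):
    # The causal structure x -> y is supplied by the modeler, not discovered.
    w = pyro.sample("w", dist.Normal(0.0, 1.0))
    b = pyro.sample("b", dist.Normal(0.0, 1.0))
    sigma = pyro.sample("sigma", dist.HalfNormal(1.0))
    with pyro.plate("data", len(x)):
        pyro.sample("obs", dist.Normal(w * x + b, sigma), obs=y)

# Simulated data consistent with the assumed structure.
x = torch.randn(500)
y = 0.8 * x + 0.1 + 0.3 * torch.randn(500)

pyro.clear_param_store()
guide = AutoNormal(model)
svi = SVI(model, guide, Adam({"lr": 0.02}), loss=Trace_ELBO())
for _ in range(1000):
    svi.step(x, y)
# This recovers approximate posteriors over (w, b, sigma) for the given
# graph; at no point does it search over alternative graph structures.
```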