Effect heterogeneity and external validity in medicine
Some time in the next week I'll write up a post with a few full examples (including the one from Robins, Hernan and Wasserman), and explain in a bit more detail.

I look forward to reading it. To be honest: Knowing these authors, I'd be surprised if you have found an error that breaks their argument.

We are now discussing questions that are so far outside of my expertise that I do not have the ability to independently evaluate the arguments, so I am unlikely to contribute further to this particular subthread (i.e. to the discussion about whether there exists an obvious and superior Bayesian solution to the problem I am trying to solve).

Effect heterogeneity and external validity in medicine

I don't have a great reference for this.

A place to start might be Judea Pearl's essay "Why I'm only half-Bayesian". If you look at his Twitter account, @yudapearl, you will also see numerous tweets where he refers to Bayes' theorem as a "trivial identity" and where he talks about Bayesian statistics as "spraying priors on everything". See, for example, his discussions with Frank Harrell.

Another good read may be Robins, Hernan and Wasserman's letter to the editor of Biometrics. While that letter is not about graphical models, propensity scores and marginal structural models are mathematically very closely related to them. The main argument in that letter (which was originally a blog post) has been discussed on Less Wrong before; I am trying to find the discussion, and it may be this link.

From my perspective, as someone who is not well trained in Bayesian methods and does not pretend to understand the issue well, I just observe that methodological work on causal models very rarely uses Bayesian statistics, that I myself do not see an obvious way to integrate it, and that most of the smart people working on causal inference appear to be skeptical of such attempts.

Effect heterogeneity and external validity in medicine
Ok, I think that's the main issue here. As a criticism of Pearl and Bareinboim, I agree this is basically valid. That said, I'd still say that throwing out DAGs is a terrible way to handle the issue - Bayesian inference with DAGs is the right approach for this sort of problem.

I am not throwing out DAGs. I am just claiming that the particular aspect of reality that I think justifies extrapolation cannot be represented on a standard DAG. While I formalized my causal model for these aspects of reality without using graphs, I am confident that there exists a way to represent the same structural constraints in a DAG model. It is just that nobody has done it yet.

As for combining Bayesian inference and DAGs: This is one of those ideas that sounds great in principle, but where the details get very messy. I don't have a good enough understanding of Bayesian statistics to make the argument in full, but I do know that very smart people have tried to combine it with causal models and concluded that it doesn't work. Bayesianism therefore plays essentially no role in the causal modelling literature. If you believe you have an obvious solution to this, I recommend you write it up and submit to a journal, because you will get a very impactful publication out of it.

The equality of this parameter is not sufficient to make the prediction we want to make - the counterfactual is still underspecified. The survival ratio calculation will only be correct if a particular DAG and counterfactual apply, and will be incorrect otherwise.

In a country where nobody plays Russian roulette, you have valid data on the distribution of outcomes under the scenario where nobody plays Russian roulette (due to simple consistency). In combination with knowledge about the survival ratio, this is sufficient to make a prediction for the distribution of outcomes in a counterfactual where everybody plays Russian roulette.
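To make the arithmetic concrete, here is a minimal sketch of that prediction. The 99% baseline survival for Norway is a made-up number for illustration, and the function name is my own; the only structural input is the 5/6 survival ratio.

```python
from fractions import Fraction

# Structural assumption from the argument: one round of roulette leaves
# 5 of 6 players alive, in every country.
SURVIVAL_RATIO = Fraction(5, 6)


def predicted_survival_if_all_play(baseline_survival):
    """Survival probability if everyone plays, given survival when nobody plays."""
    return SURVIVAL_RATIO * baseline_survival


# Hypothetical figure: 99% of Norwegians survive a year in which nobody plays.
norway_baseline = Fraction(99, 100)
norway_if_all_play = predicted_survival_if_all_play(norway_baseline)
print(norway_if_all_play)  # 33/40
```

The baseline comes from Norwegian data alone (by consistency), and the ratio is the only quantity carried over from elsewhere.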

Effect heterogeneity and external validity in medicine
Identifiability, sure. But latents still aren't a problem for either extrapolation or model testing, as long as we're using Bayesian inference. We don't need identifiability.

I am not using Bayesian inference, and neither are Pearl and Bareinboim. Their graphical framework ("selection diagrams") is very explicitly set up as a model for reasoning about whether the causal effect in the target population is identified in terms of observed data from the study population and observed data from the target population. Such identification may succeed or fail depending on latent variables and depending on the causal structure of the selection diagram.

I am confident that Pearl and Bareinboim would not disagree with me about the preceding paragraph. The point of disagreement is whether there are realistic ways to substantially reduce the set of variables that must be measured, by using background knowledge about the causal structure that cannot be represented on selection diagrams.

The obvious causal model for the Russian roulette example is one with four nodes:
first node indicating whether roulette is played
second node, child of first, indicating whether roulette killed
third node, child of second, indicating whether some other cause killed (can only happen if the person survived roulette)
fourth node, death, child of second and third node
This makes sense physically, has a well-defined counterfactual for Norway, and produces the risk difference calculation from the post. What information is missing?

In my model of reality (and I am sure, in most other people's model of reality), the third node has a wide range of unobserved latent ancestors. If the goal is to make inferences about the effect of Russian roulette in Russia using data from Russia, your analytic objective will be to find a set of nodes that d-separate the first node from the fourth node. You do not need to condition on the latent causes of the third node to achieve this, because those latent variables are not also causes of the first node; they cannot be, because the first node was randomized. The identification formula for the effect in Russia is therefore invariant to whether the latent causes of the third node are represented on the graph or not, and you therefore do not have to show them. The DAG model then represents a huge equivalence class of causal models; you can be agnostic between causal models within this equivalence class because the inferences are invariant between them.

But if the goal is to make predictions about the effect in Norway using data from Russia, these latent variables suddenly become relevant. The goal is no longer to d-separate the fourth node from the first node, but to d-separate the fourth node from an indicator for whether a person lives in Russia or Norway. In the true data generating mechanism (i.e. in the reality that the model is trying to represent), there almost certainly are a substantial number of open paths between the indicator for whether a person lives in Norway or Russia and their risk of death. The only possible identification formula for the effect in Norway includes terms for distributions that are conditional on the latent variables. The effect in Norway is therefore not identified from the Russian data.

The underlying structure of reality is still a DAG, it's only our information about reality which will be non-DAG-shaped. DAGs show the causal structure

I agree that reality is generated by a structure that looks something like a directed acyclic graph. But that does not mean that all significant aspects of reality can be modeled using Pearl's specific operationalization of causal DAGs/selection diagrams.

Any attempt to extrapolate from Russia to Norway is going to depend on a background belief that some aspect of the data generating structure is equal between the countries. In the case of Russian roulette, I argue that the natural choice of mathematical object to hang our claims to structural equality on, is the parameter that takes the value 5/6 in both countries.

In DAG terms, you can think of the data generating mechanism for node 4 as responding to a property of the path 1->2->4. In particular, this path forces the quantities Pr(Fourth node =0 | do(First node=1)) and Pr(Fourth node =0 | do(First node=0)) to be related by a factor of 5/6 in both countries. Reality still has a DAG structure, but you won't find a way to encode the figure 5/6 in a causal model based only on selection diagrams. Without a way to encode a parameter that takes the value 5/6, you have to take a long detour where you collect a truckload of data and measure all the latent variables.
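A short derivation may make this concrete. The notation is mine and not from any of the cited papers: A is the first node (roulette played), Y is the fourth node (death), and q_c is the probability in country c that some other cause kills a person who was not killed by roulette. I am assuming that q_c is the same whether or not the person played, and that q_c < 1.

```latex
\begin{align*}
\Pr(Y=0 \mid do(A=0)) &= 1 - q_c \\
\Pr(Y=0 \mid do(A=1)) &= \tfrac{5}{6}\,(1 - q_c) \\
\frac{\Pr(Y=0 \mid do(A=1))}{\Pr(Y=0 \mid do(A=0))} &= \tfrac{5}{6} \\
\Pr(Y=1 \mid do(A=1)) - \Pr(Y=1 \mid do(A=0)) &= \tfrac{1}{6}\,(1 - q_c)
\end{align*}
```

Under these assumptions the survival ratio does not involve q_c at all, whereas the risk difference does. This is why the ratio can be carried across countries while the individual counterfactual distributions cannot.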

Effect heterogeneity and external validity in medicine
The key issue is that we're asking a counterfactual question. The question itself will be underdefined without the context of a causal model. The Russian roulette hypothetical is a good example: "Our goal is to find out what happens in Norway if everyone took up playing Russian roulette once a year". What does this actually mean? Are we asking what would happen if some mad dictator forced everyone to play Russian roulette? Or if some Russian roulette social media craze caught on? Or if people became suicidal en-masse and Russian roulette became popular accordingly? These are different counterfactuals, and the answer will be different depending on which of these we're talking about. We need the machinery of counterfactuals - and therefore the machinery of causal models - in order to define what we mean at all by "what happens in Norway if everyone took up playing Russian roulette once a year". That counterfactual only makes sense at all in the context of a causal model, and is underdefined otherwise.

I absolutely agree that this is a counterfactual question. I am using the machinery of counterfactuals and causal models, just a different causal model from the one you and Pearl prefer. In this case, I had in mind a situation that is roughly equivalent to a mad dictator forcing everyone to play Russian roulette, but the underspecified details are not all that important to the argument I am making.

I assume by "unmeasured causes" you mean latent variables - i.e. variables in the causal graph which happened to not be observed. A causal diagram framework can handle latent variables just fine; there is no fundamental reason why every variable needs to be measured. Latent variables are a pain computationally, but they pose no fundamental problem mathematically.

This is straight up wrong, and on this particular point the causal inference establishment is on my side, not yours. For example, if there are backdoor paths that cannot be closed without conditioning on a latent variable, then the causal effect is not identified and there is no amount of computation that can get around this.
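A minimal sketch of what non-identification means here, using two toy structural models of my own construction (binary latent U, treatment A, outcome Y). Both models imply exactly the same observed distribution of (A, Y), yet they disagree about the causal effect, so no computation on the observed data can tell them apart.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)


def observed_joint(p_u, a_given_u, y_given_au):
    """Return P(A=a, Y=y) after marginalizing out the latent U."""
    joint = {}
    for u, a, y in product((0, 1), repeat=3):
        pu = p_u if u == 1 else 1 - p_u
        pa = a_given_u[u] if a == 1 else 1 - a_given_u[u]
        py = y_given_au[(a, u)] if y == 1 else 1 - y_given_au[(a, u)]
        joint[(a, y)] = joint.get((a, y), 0) + pu * pa * py
    return joint


def risk_under_do(p_u, y_given_au, a):
    """P(Y=1 | do(A=a)): A is set by intervention, U keeps its distribution."""
    return (1 - p_u) * y_given_au[(a, 0)] + p_u * y_given_au[(a, 1)]


# Model 1 (hypothetical): A is a coin flip independent of U, and Y = A.
m1_a = {0: HALF, 1: HALF}
m1_y = {(a, u): a for a in (0, 1) for u in (0, 1)}

# Model 2 (hypothetical): A = U and Y = U, so A has no effect on Y.
m2_a = {0: 0, 1: 1}
m2_y = {(a, u): u for a in (0, 1) for u in (0, 1)}

same_data = observed_joint(HALF, m1_a, m1_y) == observed_joint(HALF, m2_a, m2_y)
effect_1 = risk_under_do(HALF, m1_y, 1) - risk_under_do(HALF, m1_y, 0)
effect_2 = risk_under_do(HALF, m2_y, 1) - risk_under_do(HALF, m2_y, 0)
print(same_data, effect_1, effect_2)  # True 1 0
```

In Model 2 the backdoor path through U can only be closed by conditioning on U, which is exactly the variable we do not observe.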

Indeed, much of machine learning consists of causal models with latent variables.

Much of machine learning gets causality wrong.

Whether the treatment has an effect does not seem relevant here at all.

It is relevant because it allows me to construct a very simple scenario where we have very strong intuition that extrapolation should work; yet Pearl's selection diagram fails to make a prediction for the target population.

No. My intuition very strongly says that 100% of the relevant structural information/model can be directly captured by causal models, and that you're just not used to encoding these sorts of intuitions into causal models. Indeed, counterfactuals are needed even to define what we mean, as in the Russian roulette example. The individual counterfactual distributions really are the thing we care about, and everything else is relevant only insofar as it approximates those counterfactual distributions in some situations.

I agree that you can encode all structural information in causal models. I do not agree that all structural information can be encoded in DAGs, which are one particular type of causal model. There are several examples of background information about the causal structure that is essential for identifiability and that cannot be encoded on standard DAGs. For example, monotonicity is necessary for instrumental variable identification.

I am arguing that there is a special type of background information that is crucial for generalizability, and which cannot be encoded in Pearl/Bareinboim's causal diagrams for transportability. I therefore proposed a non-DAG causal model which is able to use this background structural knowledge. The Russian roulette example is an attempt to illustrate the nature of this class of background knowledge.

This does not mean that it is impossible to make an extension of the causal DAG framework to encode the same information. I am just arguing that this is not what the Pearl/Bareinboim selection diagram framework does.

Overall, my impression is that you don't actually understand how to build causal models, and you are very confused about their applicability and limitations.

I did specifically invoke Crocker's Rules, so I'd like to thank you for this feedback.

Of course, I think you are wrong about this. I dislike appeals to authority, but I would like to point out that I have a doctoral degree in epidemiologic methodology from Harvard, and that my thesis advisors were genuine thought leaders in causal modelling. I also want to point out that both my papers on this topic have been reviewed by editors and peer-reviewers with a deep understanding of causal models.

This does of course not necessarily mean that you are wrong. It does however mean that I think you should adjust your priors and truly try to understand my argument before you reach such a strong posterior.

If you genuinely have found a flaw in my argument, I'd like you to state it explicitly rather than just claim that I don't understand causal models. In a hypothetical world in which I am wrong, I would very much like to know about it, as it would allow me to move on and work on something else.

Effect heterogeneity and external validity in medicine

I am curious why you think the approach based on causal diagrams is obviously correct. Would you be able to unpack this for me?

Does it not bother you that this approach fails to find a solution (i.e. won't make any predictions at all) if there are unmeasured causes of the outcome, even if treatment has no effect?

Does it not bother you that it fails to find a solution to the Russian roulette example, because the approach insists on treating "what happens if treated" and "what happens if untreated" as separate problems, and therefore fails to make use of information about how much the outcomes differ between the two treatment options?

Does it not seem useful to have an alternative approach that is able to make use of all the intuition that says we should be able to make such extrapolations? An alternative approach that formalizes the intuition that led all the pre-Pearl literature to consider the problem in terms of the magnitude of the effect, not in terms of the individual counterfactual distributions?

The New Riddle of Induction: Neutral and Relative Perspectives on Color

In my view, "the problem of induction" is just a bunch of philosophers obsessing over the fact that induction is not deduction, and that you therefore cannot predict the future with logical certainty. This is true, but not very interesting. We should instead spend our energy thinking about how to make better predictions, and how we can evaluate how much confidence to have in our predictions. I agree with you that the fields you mention have made immense progress on that.

I am not convinced that computer programs are immune to Goodman's point. AI agents have ontologies, and their predictions will depend on that ontology. Two agents with different ontologies but the same data can reach different conclusions, and unless they have access to their source code, it is not obvious that they will be able to figure out which one is right.

Consider two humans who are both writing computer functions. Both the "green" and the "grue" programmer will believe that their perspective is the neutral one, and therefore write a simple program that takes light wavelength as input and outputs a constant color predicate. The difference is that one of them will be surprised after time t, when the computer suddenly starts outputting colors that differ from their programmer's experienced qualia. At that stage, we know which one of the programmers was wrong, but the point is that it might not be possible to predict this in advance.
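A rough sketch of the two programs, under simplifying assumptions of my own: the 495 to 570 nm band, the switch time, and the reading of "grue" as "green before t, blue afterwards" are all illustrative. Neither program mentions time. Time only appears when one vocabulary is translated into the other.

```python
# Hypothetical switch time t (arbitrary units), chosen only for illustration.
T_SWITCH = 100


def green_program(wavelength_nm):
    """The 'green' programmer's classifier: no reference to time."""
    return "green" if 495 <= wavelength_nm <= 570 else "other"


def grue_program(wavelength_nm):
    """The 'grue' programmer's classifier: also no reference to time."""
    return "grue" if 495 <= wavelength_nm <= 570 else "other"


def grue_in_green_vocabulary(predicate, time):
    """Translate 'grue' into green/blue terms; time appears only here."""
    if predicate != "grue":
        return predicate
    return "green" if time < T_SWITCH else "blue"


emerald_nm = 530  # hypothetical wavelength of light from an emerald
before = (green_program(emerald_nm),
          grue_in_green_vocabulary(grue_program(emerald_nm), time=50))
after = (green_program(emerald_nm),
         grue_in_green_vocabulary(grue_program(emerald_nm), time=150))
print(before)  # ('green', 'green')
print(after)   # ('green', 'blue')
```

Nothing in either program says which programmer will be the surprised one; that depends on facts outside the code.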

The New Riddle of Induction: Neutral and Relative Perspectives on Color

I am not sure I fully understand this comment, or why you believe my argument is circular. It is possible that you are right, but I would very much appreciate a more thorough explanation.

In particular, I am not "concluding" that humans were produced by an evolutionary process, but rather using it as background knowledge. Moreover, this statement seems uncontroversial enough that I can bring it in as a premise without having to argue for it.

Since "humans were produced by an evolutionary process" is a premise and not a conclusion, I don't understand what you mean by circular reasoning.

Odds ratios and conditional risk ratios

Update: The editors of the Journal of Clinical Epidemiology have now rejected my second letter to the editor, and thus helped prove Eliezer's point about four layers of conversation.

Odds ratios and conditional risk ratios

Why do you think two senior biostats guys would disagree with you if it was obviously wrong? I have worked with enough academics to know that they are far far from infallible, but curious on your analysis of this question.

Good question. I think a lot of this is due to a cultural difference between those of us who have been trained in the modern counterfactual causal framework, and an old generation of methodologists who felt the old framework worked well enough for them and never bothered to learn about counterfactuals.
