Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

A problem that's come up with my definitions of stratification.

Consider a very simple causal graph:

$X \rightarrow Y$.

In this setting, $X$ and $Y$ are both booleans, and $Y=1$ with probability $1/2$ (independently of whether $X=0$ or $X=1$).

Suppose I now want to compute a counterfactual: suppose I assume that $Y=0$ when $X=0$. What would happen if $X=1$ instead?

The problem is that $P(Y \mid X)$ seems insufficient to solve this. Let's imagine the process that outputs $Y$ as a probabilistic mix of functions, each taking the value of $X$ and outputting that of $Y$. There are four natural functions here: the constant functions $f_0$ ($Y=0$ always) and $f_1$ ($Y=1$ always), the identity $f_=$ ($Y=X$), and the negation $f_{\neq}$ ($Y=1-X$).

Then one way of modelling the causal graph is as the mix $\frac{1}{2}f_0 + \frac{1}{2}f_1$. In that case, knowing that $Y=0$ when $X=0$ implies that the function is $f_0$, so if $X=1$, we know that $Y=0$.

But we could instead model the causal graph as the uniform mix $\frac{1}{4}(f_0+f_1+f_=+f_{\neq})$. In that case, knowing that $Y=0$ when $X=0$ implies that $f_0$ and $f_=$ are equally likely. So if $X=1$, $Y=0$ with probability $1/2$ and $Y=1$ with probability $1/2$.
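To make the discrepancy concrete, here is a minimal sketch in Python (the helper counterfactual and the names model_a and model_b are illustrative, not from the post): it conditions each mixture on the observation $Y=0$ when $X=0$ and reads off the counterfactual distribution of $Y$ at $X=1$.

```python
# The four boolean functions from X to Y considered above.
FUNCS = {
    "f0":   lambda x: 0,      # Y = 0 regardless of X
    "f1":   lambda x: 1,      # Y = 1 regardless of X
    "feq":  lambda x: x,      # Y = X
    "fneq": lambda x: 1 - x,  # Y = 1 - X
}

def counterfactual(prior, x_obs=0, y_obs=0, x_cf=1):
    """Condition the prior over functions on f(x_obs) == y_obs,
    then return the induced distribution of f(x_cf)."""
    posterior = {f: w for f, w in prior.items() if FUNCS[f](x_obs) == y_obs}
    total = sum(posterior.values())
    dist = {0: 0.0, 1: 0.0}
    for f, w in posterior.items():
        dist[FUNCS[f](x_cf)] += w / total
    return dist

model_a = {"f0": 0.5, "f1": 0.5}        # first mixture
model_b = {f: 0.25 for f in FUNCS}      # uniform mixture over all four

print(counterfactual(model_a))  # {0: 1.0, 1: 0.0} -- Y = 0 for sure
print(counterfactual(model_b))  # {0: 0.5, 1: 0.5}
```

Both priors reproduce the same observational distribution $P(Y=1 \mid X=x)=1/2$ for both values of $x$, yet they disagree about the counterfactual.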

And we can design the node $Y$, physically, to be one or another of the two distributions over functions, or anything in between (the general formula is $\frac{q}{2}(f_0+f_1)+\frac{1-q}{2}(f_=+f_{\neq})$ for $q\in[0,1]$). But it seems that the causal graph does not capture that.
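To spell out the dependence on $q$ (a short derivation under the parameterization above): observing $Y=0$ when $X=0$ rules out $f_1$ and $f_{\neq}$, leaving posterior weights $P(f_0)=q$ and $P(f_=)=1-q$. Setting $X=1$ then gives $Y=1$ with probability exactly $1-q$, so the counterfactual ranges from certainly $0$ (at $q=1$) to certainly $1$ (at $q=0$), while $P(Y \mid X)$ stays the same fair coin throughout.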

Owain Evans has said that Pearl has papers covering these kinds of situations, but I haven't been able to find them. Does anyone know any publications on the subject?


2 comments

The problem is indeed that $P(Y \mid X)$ is insufficient to compute a unique counterfactual; additional causal information is needed. Pearl's approach is to specify each observable variable as a deterministic function of its parents in the causal graph. Any uncertainty must be represented by a set of "exogenous" variables $U$, which can feature in the functions for the observables. (See chapter 7 of Causality, or An Axiomatic Characterization of Causal Counterfactuals.)

For example, your first process could be represented by the causal model $Y := U$, where $U$ is an exogenous boolean with $P(U=0)=P(U=1)=1/2$, so that $Y$ ignores $X$ entirely.

The other processes might have different structures, equations, and distributions $P(U)$; it's not possible in general to distinguish these purely from the distribution $P(Y \mid X)$.
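As a sketch of how this plays out (assuming, as a reconstruction, that the second process from the post is the uniform mixture, represented as $Y := U_X$ with two independent exogenous bits $U_0, U_1$), Pearl's three-step abduction, action, prediction recipe from Causality ch. 7 can be run directly:

```python
import itertools

# Exogenous worlds: two independent fair bits (u0, u1), each world has prob 1/4.
WORLDS = [(u0, u1, 0.25) for u0, u1 in itertools.product((0, 1), repeat=2)]

def model_a(x, u0, u1):
    return u0           # first process:  Y := U0 (X is ignored)

def model_b(x, u0, u1):
    return (u0, u1)[x]  # second process: Y := U_X (my reconstruction)

def pearl_counterfactual(eq, x_obs, y_obs, x_cf):
    # Step 1 (abduction): condition P(U) on observing Y = y_obs when X = x_obs.
    worlds = [(u0, u1, p) for u0, u1, p in WORLDS if eq(x_obs, u0, u1) == y_obs]
    total = sum(p for _, _, p in worlds)
    # Steps 2-3 (action, prediction): set X = x_cf, recompute Y in each world.
    dist = {0: 0.0, 1: 0.0}
    for u0, u1, p in worlds:
        dist[eq(x_cf, u0, u1)] += p / total
    return dist

print(pearl_counterfactual(model_a, 0, 0, 1))  # {0: 1.0, 1: 0.0}
print(pearl_counterfactual(model_b, 0, 0, 1))  # {0: 0.5, 1: 0.5}
```

The two models are observationally identical, but they diverge at the abduction step, because their posteriors over $U$ carry different information into the counterfactual world.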

Thank you! That sentence is what I was looking for: "Any uncertainty must be represented by a set of 'exogenous' variables $U$."

I'd been doing that, but without any theoretical justification for it.