What does it mean, in practice, to perform an intervention over variables representing abstractions over reality? When does such a notion even make sense?
Luigi Gresele, Sebastian Weichwald (co-first author of Rubenstein et al.), and I have a pre-print that goes deep into this question, although we certainly do not answer it. I think this problem is one of the main reasons that the Pearlian framework is probably not going to be a good mathematical framework for agency.
I don't think the issue is unique to the Pearlian paradigm. You have the same problems whenever you talk about counterfactual statements like...
I think these kinds of pictures 'underestimate' models' geographical knowledge. Just imagine having a human perform this task. The human may have very detailed geographical knowledge, and may even be able to draw a map of the world from memory; this does not imply that they would be able to answer questions posed in terms of latitude and longitude.
Am I right that the line of argument here is not about generalization properties, but rather a claim about the quality of the explanation, even on the restricted distribution?
Yes, I think that is a good way to put it. But faithful mechanistic explanations are closely related to generalization.
Here, for example, your causal model M∗ should include the explicit condition X1=X2.
That would be a sufficient condition for M∗ to make the correct predictions. But that does not mean that M∗ provides a good mechanistic explanation of M on those inputs.
I'm a bit unsure about the way you formalize things, but I think I agree with your point, and it is a helpful one. Let me try to state a similar (perhaps the same) point.
Assume that all variables have the natural numbers as their domain, and assume WLOG that all models have only one input node and one output node. Suppose M∗ is an abstraction of M relative to the input support I=[n] and the map τ. Then there exists a model M+ such that M(j)=M+(j) for all j∈I, but M∗ is not a valid abstraction of M+ relative to the input support I+=[n+1]. For example, you may define the structural assignment of the output node in M+ so that it agrees with M on all j∈I but differs from M on the new input n+1.
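A minimal sketch of this construction in Python (the concrete functions, the identity maps, and the naive commutation check are illustrative assumptions, not the exact formalism above):

```python
# Illustrative sketch: M+ agrees with M on the input support I = [1..n]
# but diverges on the extended support I+ = [1..n+1].

def M(j: int) -> int:
    # an arbitrary placeholder model
    return 2 * j

def M_plus(j: int, n: int) -> int:
    # structural assignment of the output node: copy M on I, differ on n + 1
    return M(j) if j <= n else M(j) + 1

def is_abstraction(low, high, tau_in, tau_out, support) -> bool:
    # naive check: the high-level model commutes with tau on the given inputs
    return all(high(tau_in(j)) == tau_out(low(j)) for j in support)

n = 5
identity = lambda v: v

# Take the trivial abstraction M* = M with identity tau; it is valid on I ...
assert is_abstraction(M, M, identity, identity, range(1, n + 1))

# ... but the very same M* is no longer a valid abstraction of M+ on I+ = [1..n+1].
assert not is_abstraction(lambda j: M_plus(j, n), M, identity, identity, range(1, n + 2))

print("M* is valid on I but not on I+, even though M and M+ agree on I.")
```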
In this post I want to highlight a small puzzle for causal theories of mechanistic interpretability. It purports to show that causal abstractions do not generally correctly capture the mechanistic nature of models.
Consider the following causal model M:
Assume for the sake of argument that we only consider two possible inputs: (0,0) and (1,1), that is, X1 and X2 are always equal.[1]
In this model, it is intuitively clear that X1 is what causes the output X5, and X2 is irrelevant. I will argue that this obvious asymmetry between X1 and X2 is not borne out by the causal theory of mechanistic interpretability.
Consider the following causal model M∗:
Is M∗ a valid causal abstraction of the computation that goes on in M? That seems to depend on whether Y1 corresponds to X1 or to X2. If Y1 corresponds to X1, then it seems that M∗ is...
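To make the claimed symmetry concrete, here is a small Python sketch. The structural equations for M and M∗ below are hypothetical stand-ins consistent with the description (the post specifies the actual models), and the test is a simplified commutation check on the restricted inputs rather than the full interchange-intervention definition of causal abstraction:

```python
# Hypothetical stand-in for M: X1 and X2 are inputs, X5 is the output,
# and only X1 actually feeds into X5 (X2 is causally irrelevant).
def M(x1, x2):
    x3 = x1          # intermediate node copying X1
    x4 = x2          # intermediate node copying X2 (never used downstream)
    x5 = x3          # output depends on X1 only
    return x5

# Hypothetical stand-in for the high-level model M*: one input Y1, output Y2 := Y1.
def M_star(y1):
    return y1

# Restricted input support: X1 and X2 are always equal.
support = [(0, 0), (1, 1)]

# Candidate correspondences for Y1: "Y1 is X1" vs. "Y1 is X2".
tau_y1_is_x1 = lambda x1, x2: x1
tau_y1_is_x2 = lambda x1, x2: x2

for name, tau in [("Y1 = X1", tau_y1_is_x1), ("Y1 = X2", tau_y1_is_x2)]:
    ok = all(M_star(tau(x1, x2)) == M(x1, x2) for x1, x2 in support)
    print(name, "commutes on the restricted support:", ok)

# Both print True: on inputs where X1 == X2, this check cannot distinguish the
# intuitively correct alignment (Y1 = X1) from the intuitively wrong one (Y1 = X2).
```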
Finally got around to looking at this. I didn't read the paper carefully, so I may have missed something, but I could not find anything that makes me more at ease with this conclusion.
Ben has already shown that it is perfectly possible that Y causes X. If this is somehow less likely than X causing Y, that is exactly what needs to be made precise. If faithfulness is the assumption that makes this work, then we need to show that faithfulness is a reasonable assumption in this example. It seems that this work has not been done?
If we can find the precise and reasonable assumptions that exclude that Y causes X, that would be super interesting.
For example, in theorem 3.2 in Causation, Prediction, and Search, we have a result that says that faithfulness holds with probability 1 if we have a linear model with coefficients drawn randomly from distributions with positive densities.
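As a rough numerical illustration of that result (the particular SEM, the coefficient values, and the sample-based partial-correlation estimate below are my own choices for the sketch, not taken from the book):

```python
import numpy as np

def partial_corr_xy_given_z(a, b, c, n=200_000, seed=0):
    """Sample from the linear SEM  Y := a*X + e_Y,  Z := b*X + c*Y + e_Z
    on the graph X -> Y, X -> Z, Y -> Z, and estimate the partial
    correlation of X and Y given Z."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = a * x + rng.normal(size=n)
    z = b * x + c * y + rng.normal(size=n)
    r = np.corrcoef([x, y, z])
    rxy, rxz, ryz = r[0, 1], r[0, 2], r[1, 2]
    return (rxy - rxz * ryz) / np.sqrt((1 - rxz**2) * (1 - ryz**2))

# Generic coefficients: X and Y remain dependent given Z, as faithfulness predicts.
print(partial_corr_xy_given_z(a=0.7, b=1.3, c=0.4))    # nonzero

# Coefficients tuned so that a == b*c: the X-Y dependence given Z cancels exactly,
# a faithfulness violation. Under theorem 3.2 this cancellation set has measure zero
# when the coefficients are drawn from distributions with positive densities.
print(partial_corr_xy_given_z(a=0.52, b=1.3, c=0.4))   # approximately zero
```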
It is not clear to me why we should expect faithfulness to hold in a situation like this, where Z is constructed from other variables with a particular purpose in mind.
Consider the graph Y<-X->Z. If I set Y:=X and Z:=X, we have that X⊥Y|Z, violating faithfulness. How can you be sure that you don't violate faithfulness by constructing Z?
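To check that violation concretely, here is a small enumeration (the fair-coin distribution on X is an added assumption; everything else follows the construction above):

```python
from itertools import product

# Deterministic SEM on the graph Y <- X -> Z with Y := X and Z := X.
# X is a fair coin, so the joint puts mass 1/2 on (0,0,0) and 1/2 on (1,1,1).
joint = {(x, x, x): 0.5 for x in (0, 1)}   # keys are (x, y, z)

def p(pred):
    return sum(prob for xyz, prob in joint.items() if pred(*xyz))

# Check X ⊥ Y | Z, which faithfulness says should NOT hold for this graph
# (conditioning on Z does not block the direct edge X -> Y).
ci_holds = True
for z0 in (0, 1):
    pz = p(lambda x, y, z: z == z0)
    for x0, y0 in product((0, 1), repeat=2):
        lhs = p(lambda x, y, z: (x, y, z) == (x0, y0, z0)) / pz
        rhs = (p(lambda x, y, z: (x, z) == (x0, z0)) / pz) * \
              (p(lambda x, y, z: (y, z) == (y0, z0)) / pz)
        if abs(lhs - rhs) > 1e-12:
            ci_holds = False   # found dependence between X and Y given Z

print(ci_holds)  # True: X ⊥ Y | Z holds, so faithfulness is violated
```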
I'm not quite convinced by this response. Would it be possible to formalize "set of probability distributions in which Y causes X is a null set, i.e. it has measure zero"?
It is true that if the graph were (Y->X, X->Z, Y->Z), then faithfulness would be violated. There are results showing that, under some assumptions, faithfulness is violated only with probability 0. But those assumptions do not seem to hold in this example.
Super interesting post! Thanks for writing it.
I especially like the point that you raise here: