The mathematics of a latent variable model expresses the probabilities, p(x), for observations x as marginal probabilities integrating over unobserved z. That is, p(x) = integral over z of p(x,z), where p(x,z) is typically written as p(z)p(x|z).

It's certainly correct that nothing in this formulation says anything about whether z captures the "causes" of x.

However, I think it sometimes is usefully seen that way. Your presentation would be clearer if you started with one or more examples of what you see as typical models, in which you argue that z isn't usefully seen as causing x.

I'd take a typical vision model to be one in which z represents the position, orientation, and velocity of some object, at some time, and x is the pixel values from a video camera at some location at that time. Here, it does seem quite useful to view z as the cause of x. In particular, the physical situation is such that z at a future time is predictable from z now (assuming no forces act on the object), but x at a future time is not predictable from x now (both because x may not provide complete knowledge of position and orientation, and because x doesn't include the velocity).

This is the opposite of what you seem to assume - that x now cause x in the future, but that this is not true for the "summary" z. But this seems to miss a crucial feature of all real applications - we don't observe the entire state of the world. One big reason to have an unobserved z is to better represent the most important features of the world, which are not entirely inferrable from x. Looking at x at several times may help infer z, and to the extend we can't, we can represent our uncertainty about z and use this to know how uncertain our predictions are. (In contrast, we are never uncertain about x - it's just that x isn't the whole world.)

Reply

[-]hrbigelow2y10

Hi Dr. Neal,

Wow, I studied your work in grad school! (And more recently your paper on Gaussian Processes). Quite an honor to get a comment from you. Just as an aside, I am not sure if my figure is visible, can you see it? I set it as the thumbnail, but I don't see it anywhere. In case it doesn't, it is here:

https://www.mlcrumbs.com/img/epistemology-of-representation.png

I think I need to change some labels, I realize now that I have been using 'x' ambiguously - sometimes as a model input, and sometimes to represent the bedrock physical system. But, to clarify, I'll use your vision example, but add temporality:

: position, orientation and velocity of some object at time t
$x_{t}$ : pixel values from a video camera
$ϕ_{t}$ : physical state (particles, forces) of the relevant slice of space at time t (includes the object and photons emanating from it which have hit the camera)

Your presentation would be clearer if you started with one or more examples of what you see as typical models, in which you argue that z isn't usefully seen as causing x.

Actually your example of a typical vision model is one example where I'd argue this, though I fear you might think this is a trivial splitting of hairs.

In any case, I'll first assume you agree that "causation" by any use of the term, must be temporal - causes must come before effects. So, to a modified question: why wouldn't $z_{t - δ}$ be usefully seen as causing $x_{t}$ ? In this case, let's assume then that delta is the time required for photons to travel from the object's surface to the camera.

What I'm more saying is that $z_{t}$ , or $z_{t - δ}$ are platonic, non-physical quantities. It is $ϕ_{t - δ}$ which is causing $ϕ_{t}$ , and $x_{t}$ is just a slice of $ϕ_{t}$ . Or, if you like, $x_{t}$ could be seen as a platonic abstraction of it.

I would also add though that at best, $z_{t - δ}$ could at best be interpreted as an aspect of something that caused the pixels $x$ . Of course, just position, orientation and velocity of an object aren't enough to determine colors of all pixels.

This vision example is one example in which the $z_{t}$ representations are very much rigid-body summaries, and so it seems useful to strongly identify them as "causes". But, I am trying to put all of ML and mental models on the same semantic footing here. There are plenty of models, like for instance diffusion models, where the z are just pure noise, where such an interpretation makes no sense at all, even in the sense that you mean.

Reply

Moderation Log

LESSWRONG
LW

LESSWRONG
LW

6

z is not the cause of x

6

6

Introduction

Two separate realms

Causality

The notion of Truth

The Ultimate MNIST classifier

Active Inference

Causal reasoning probably requires temporal knowledge

Implication for morality

Implications for interpretability of representations

Implications for Machine Learning

Conclusion and Reflection