The Fundamental Theorem for measurable factor spaces — LessWrong


I present the fundamental theorem for all finitely factored measurable spaces. The theorem states that two events are orthogonal if and only if they are independent in all product probability distributions. This tells us that the definition of orthogonality really captures the essence of structural independence, for two reasons:

Whenever things are structurally independent, they should be probabilistically independent, regardless of the specific chosen distribution.
Orthogonality should be the strongest notion that entails the previous point.
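
Both points can be checked numerically in the finite case. The following is a minimal sketch, not the construction from the post: it assumes three binary factors, and all helper names (`product_distribution`, `prob`, `independent`, `random_marginal`) are made up for illustration. Events `A` and `B` depend on different factors, so they come out independent under every sampled product distribution. Events `C` and `B` share factor 1: they happen to be independent under the uniform distribution but not under a skewed one, which is why the theorem quantifies over all product distributions.

```python
import itertools
import math
import random


def product_distribution(marginals):
    """Return {point: probability} for the product of the given marginals."""
    points = itertools.product(*[range(len(m)) for m in marginals])
    return {x: math.prod(m[i] for m, i in zip(marginals, x)) for x in points}


def prob(dist, event):
    """Probability of an event, given as a predicate on points."""
    return sum(p for x, p in dist.items() if event(x))


def independent(dist, a, b, tol=1e-12):
    """Check P(a and b) == P(a) * P(b) up to floating-point tolerance."""
    joint = prob(dist, lambda x: a(x) and b(x))
    return abs(joint - prob(dist, a) * prob(dist, b)) < tol


def random_marginal(n, rng):
    """A random distribution with full support on n outcomes."""
    weights = [rng.random() + 1e-3 for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


# Illustrative events on a space with three binary factors (an assumption).
A = lambda x: x[0] == 1      # depends on factor 0 only
B = lambda x: x[1] == 1      # depends on factor 1 only
C = lambda x: x[0] == x[1]   # depends on factors 0 and 1

rng = random.Random(0)
always_independent = all(
    independent(
        product_distribution([random_marginal(2, rng) for _ in range(3)]), A, B
    )
    for _ in range(200)
)

uniform = product_distribution([[0.5, 0.5]] * 3)
skewed = product_distribution([[0.1, 0.9], [0.5, 0.5], [0.5, 0.5]])

print(always_independent)          # A and B: independent in every sample
print(independent(uniform, C, B))  # C and B: independent under uniform
print(independent(skewed, C, B))   # C and B: dependent once factor 0 is skewed
```

A random search like this can only suggest independence in all product distributions; it is the theorem that turns the statement into an equivalence.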

This theorem was previously proved in Finite Factored Sets for the finite case. The general case is interesting because we can't use the finite structure: every argument has to work from the axioms of a measurable space alone. In particular, infinite things are sort of limits of finite things, so this result suggests that there should be nice approximation theorems for orthogonality. For example, as I get more and more data about the world, I should be able to refine my view of which things are structurally independent.

To understand the technical result, it is necessary to understand the definition of the history in this setting. All the maths is in this document. I will try to describe a bit of the intuition used to derive the theorem.

The core idea is to express mathematically that the history tells us which factors the conditional probability of an event depends on.

I show that the history can be characterized in exactly this way, and that this characterization can be used to deduce that structural independence (independence for all factored distributions) implies orthogonality.

My definition of history still uses a probability distribution to define the disintegration of the index function, so a priori the condition has to hold for all factorized probability distributions. It turns out that it suffices to show the condition for one such distribution.

Furthermore, Lemma 9 gives a more explicit form that the conditional probabilities must take to satisfy this criterion. I am positive that this can be leveraged to deduce a criterion that does not reference probabilities at all.

It is noteworthy that we don't even need to assume Polish spaces; the arguments work for any measure space modulo nullsets.

The easiest way to extend this to infinitely factored spaces is to allow only features with finite history. This is sort of like an infinite directed graph that has a start node from which all nodes must be reachable. But it does not allow for continuous time. The main obstacle for features with infinite history is that we can't take the almost sure intersection of an arbitrary family of sets, because different product probability distributions are mostly not equivalent in the infinite case. Therefore, we can't really restrict ourselves to one set of nullsets.
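
A standard example of this non-equivalence, added here as an illustration and not taken from the linked document, is the case of infinitely many coin flips. The symbols $P$, $Q$, $E_p$ and $\mathrm{Ber}$ are notation introduced only for this sketch.

```latex
Let $0 < p, q < 1$ with $p \neq q$, and on $\{0,1\}^{\mathbb{N}}$ set
\[
  P = \bigotimes_{i \in \mathbb{N}} \mathrm{Ber}(p), \qquad
  Q = \bigotimes_{i \in \mathbb{N}} \mathrm{Ber}(q), \qquad
  E_p = \Bigl\{ x : \lim_{n \to \infty} \tfrac{1}{n} \sum_{i=1}^{n} x_i = p \Bigr\}.
\]
By the strong law of large numbers,
\[
  P(E_p) = 1, \qquad Q(E_p) = 0,
\]
so $P$ and $Q$ are mutually singular. In contrast, for every finite $n$ the
products $\mathrm{Ber}(p)^{\otimes n}$ and $\mathrm{Ber}(q)^{\otimes n}$ on
$\{0,1\}^n$ have the same nullsets (only the empty set), so they are equivalent.
```

So in the finitely factored case all product distributions with full-support marginals share their nullsets, whereas here an arbitrarily small change in the marginals already yields a different collection of nullsets.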

I'm pretty sure that if we take the causal graph construction and extend it to causal graphs with measurable features, we get two results: d-separation is equivalent to conditional independence in all faithful probability distributions, and the unfaithful probability distributions are 'small'. As far as I know, this is not known for the general case.