The Fundamental Theorem for measurable factor spaces

4Johannes C. Mayer

3Alexander Gietelink Oldenziel

New Comment

Consider adding more context at the beginning of the post about why the reader should care. Something like:

"Finite factored sets have the fundamental theorem, which is X. This is an analogous theorem but for the more general case of using measurable spaces. It is formulated using the causal graph construction."

This is important because it allows us to solve "Concrete problem P" which we were not able to solve before. (Or whatever the reason is that this is progress.) An alternative to this would be to first introduce a problem that is clearly important and then show how finite factored sets can't handle this problem very well.

Of course, if there are writeups doing these things you may supplement links to them. Though even then it is probably good to give a very brief summary still.

I think making these changes would be almost no work, compared to doing all the math, and it would make it a lot more accessible. Right now this text seems to assume that you are intimately familiar with finite factored sets, to the extent that you would not even look up again what the fundamental theorem for finite factored sets is (there is no link to it at the appropriate point).

As I don't quite understand what you did, some of the details in my suggestions might be wrong, but I think the overall points I am trying to make should still hold up.

Great to see this is being written up Matthias. Very happy to see slow but steady progress is being made in factored sets.

I present the fundamental theorem for all finitely factored measurable spaces. The fundamental theorem is that two events are orthogonal if and only if they are independent in all product probability distributions. It tells us that the definition of orthogonality really captures the essence of structural independence by the following arguments:

This theorem was previously proved in Finite Factored Sets for the finite case. The general case is interesting, since we can't use the finite structure. All the possible arguments are limited to the axioms of a measurable space. In particular, infinite things are sort of limits of finite things, so we can expect, through this result, that there should be nice approximation theorems for orthogonality. Something like, if I get more and more data about the world, then I can refine my view of which things are structurally independent.

To understand the technical result, it is necessary to understand the definition of the history in this setting. All the maths is in this document. I will try to describe a bit of the intuition used to derive the theorem.

The core idea is to express mathematically that the history tells us, when the conditional probability of an event depends on which factor.

I show that the history can be expressed mathematically exactly as this and show that this representation can be used to deduce that structural independence (independence for all factored distributions) implies orthogonality.

My definition of history still uses a probability distribution to define the disintegration of the index function, i.e. we need that πJ⫫πJc for all factorized probability distributions. It turns out that it suffices to show the condition for one such distribution.

Furthermore, in Lemma 9, we can write a more explicit form that conditional probabilities need to take, to satisfy this criterion. I am positive that this can be leveraged to deduce a criterion that does not reference probabilities at all.

It is noteworthy, that we don't even need to assume polish spaces, the arguments work for any measure space modulo nullsets.

The easiest way to extend this to infinitely factored spaces is to simply only allow features with finite history. This is sort of like an infinite directed graph, that has a start node, from which all nodes must be reachable. But it does not allow for continuous time. The main obstacle for features with infinite history is that we can't take the almost sure intersection of an arbitrary familiy of sets, because different product probability distributions are mostly not equivalent in the infinite case. Therefore, we can't really restrict ourselves to one set of nullsets.

I'm pretty sure that if we take the causal graph construction and extend it to causal graphs with measurable features, we get a result that d-separation is equivalent to being conditionally independent in all faithful probability distributions, and that the probability distributions that are unfaithful are 'small', which is, as far as I know, not known for the general case.