Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Seeing as there is little secondary literature for the Finite Factored Set formalism, I thought I’d write up my experience of exploring it through some toy examples that are classic examples in the Pearlian paradigm. My goal was to see how these models that I understood very well in the Pearlian paradigm would work in Finite Factored Sets.

As a warning, this doesn’t make any use of the more exciting properties of Finite Factored Sets. It’s just an exploration of how this formalism handles the mundane stuff. This also means that I’m using the factored set directly, without the abstractions of the orthogonality database. Which I think is fine here, because these are tiny toy examples whose structure is fully known. (However, it’s possible that I’ve missed the entire point of Finite Factored Sets.)


The first example is the 3-variable collider that is very central to Pearl’s formalism. It is given by the following Causal Diagram:

, , and are all binary variables (0=false, 1=true).

The intended meaning of a Causal Diagram (or rather, function causal model) is that the value of a node is given by a deterministic function that takes as input the parents, , (indicated by the arrows) and an “error”, or “noise”, variable that is governed by a probability distribution that is independent from the other error/noise variables: . Thus, the value of  is given by  where  is noise or uncertainty that is not explicitly modeled, which we can visualize like this:

We could also split up and into a deterministic and a random part, but as they are root nodes, there is little point. It would just be .

The Pearlian formalism runs on graphs, but Finite Factored Sets run on the set  of all possible outcomes – the sample space. So, the goal is now to construct a sample space that is consistent to the above graph. After that, we’ll find a factorization of that sample space.

I think it should be clear that to cover the whole sample space , it is sufficient to consider all possible combinations of the outcomes of , and  (but not ), because if we know the value of these three, then we also know the value of , via .

So, we simply define as the Cartesian product of the sets of possible values of  and : , , and the possible values of : , which I’ll leave undefined for the moment (except to note that it must be a finite set): (We use the lowercase letters to represent elements of sets that make up the Cartesian product: , , and .)

Then – as is custom in the formalism of Finite Factored Sets – the variables , , are defined as partitions on . For example, is a partition consisting of two parts: 1) the set of elements of where and 2) those where : This captures exactly what the variable is supposed to represent: the question of whether the first element of the Cartesian product is 0 or 1. is defined analogously: And as well: with an as of yet undefined number of parts in the partition.

Now, given the way we constructed , the set of partitions is a factorization of : , because any element of can be uniquely identified by knowing in which part of , , and it is, because is just the Cartesian product of , , and .

Now that we have the set, let’s look at the probability distribution over . Let be shorthand for , i.e., the probability that the outcome lands in the subset of the sample space , where the first element of the Cartesian product is equal to . We define and analogously. Finally, let refer to the probability of the individual outcome .

The fact that we want our finite factored set model to be consistent with the Causal Diagram above, implies some things about . In particular, the diagram implies that , , and should be independent, which means that the joint probability should factorize like this: But this is exactly how our (finite) set  factorizes! Thus, factorizes the same way that  does.

This concludes the translation of the graphical model into a semantically-equivalent finite factored set. To recap what we have accomplished: we turned the original model with the variables , , and into one with three independent variables: , , and . And defined as a deterministic function of these. We then constructed a finite factored set with the factors , , and .

Now, let’s define on as well. We again define it as a partition of : We simply partition depending on the output of the function . This also allows us to define as .


In the Pearlian formalism, we can read off the fact that and are independent from the graph structure. With Finite Factored sets, we have to look at histories. The history of a partition (aka a variable) is roughly speaking the minimal set of factors in the factorization that is sufficient to fully specify the partition (aka variable). and are factors in their own right, so their history consists just of themselves: and , because surely knowing is enough to know . As , we can conclude that and are orthogonal, but this is just because we defined the factorization that way, so this is no new information. Still, it’s a good consistency check.

The history of is more interesting, but still pretty trivial to determine. As long as makes use of all its arguments and is generally non-pathological, all the factors are needed to determine . So: . This implies and which implies and are both strictly before in the time defined by our factorization. This is a non-trivial result which matches what we would have expected from the graphical model that I drew in the beginning. So, that’s good.

But what if we condition on ? Colliders are special because they have but . Can we recover this results from our finite factored set model?

To compute conditional orthogonality, we have to compute conditional histories. In order to condition on , we need the histories conditioned on the subsets of where and where . We’ll call these subsets  and : Let’s start with the latter: let’s determine – the conditional history of given . However, the definition of conditional history is not as straightforward as the definition of history. I’m reproducing it here:

Let be a finite factored set, let , , , and let . We use to say that two elements and are in the same part in . The conditional history of given , written , is the smallest set of factors satisfying the following two conditions:

  1. For all , if for all , then .
  2. For all and , if for all and for all , then .

The first condition is easy. It just says that the factors in should be sufficient to pin down in . We can safely assume that itself will be enough. So, is just again? Well there is that second condition. To show its effect, let’s assume that indeed . The condition then specifies something that has to be true for all and . However, given the assumption we just made, I can construct a counterexample to the stated condition. (Thus showing by contradiction that is not just .)

To make things concrete, I’ll give a concrete definition of how is computed. First, let’s say that is the result of a 6-sided die; so: . We’ll then say that is XOR of and , except when rolls a 6, in which case is just 0: The counterexample is then: Let’s first confirm that and are indeed in . To be in , we need . This is true for both, because and and in neither case is .

Now, in the if-statement in the second condition, we first ask whether and can be distinguished by the factors in . We’re assuming for now that consists only of , so the question becomes whether and can be distinguished by looking at . The answer is no, because in both cases, the first entry is 1 (so, ). Thus, as far as the factors in are concerned and are the same.

Next we must investigate and in light of the the factors that are not in . In our case, that’s and . As we can see, and are indeed indistinguishable under and because and only differ in the first entry of the tuple (which corresponds to ). The if-statement is thus true, and so according to the condition, we should have . However, this is not the case, because , and so is in . In other words, we have a contradiction and can’t consist of only .

So, what does the second condition in the definition of conditional history look out for? It seems to want to prevent that both and its complement are incomplete when it comes to being able to answer the question of whether an element is in or not. That is, if both the factors within and the factors without seem to indicate that an element is in , then – the definition says – it should really be in . The factors should not be split in such a way that neither nor its complement ( where is the set of all factors) can reconstruct ; otherwise, the border itself leaks information about the likely values of factors.

The problem can be easily fixed by adding and to as well, so that has all the factors needed to fully pin down the border of : . Via symmetry, we also get and also the same for . We thus definitely don’t have anymore. And so, and are not orthogonal given .

This means we have recovered the two most important independence statements about the collider: and , as well as and (just from the fact that and are strictly before in time). What remains to be confirmed are and . I’ll leave that as an exercise for the reader.


As our next example, we’ll have a look at the 3-variable chain:

Separating out the noise/uncertainty:

I won’t repeat all the steps now and just skip to constructing the sample space as the Cartesian product of the possible values of , , and . The elements of are then tuples like this one: We define the partitions , , and on this in the obvious way, and so they are our factorization of . The variables and are defined as partitions of according to some deterministic function of and respectively. For the histories, it follows then that , , and , which implies , as one would expect.

(It might seem remarkable here that we can reconstruct the exact order of , and , but that is only because we defined  that way. Nothing interesting has happened yet. This is just a self-consistency check.)

I tried visualizing these histories but had limited success:

visualization of history

visualization of history

The interesting independence statement in this model is . So, to investigate this, let’s look at – the conditional history of on the subset of where . The first step is to clarify that is computed via : That is, and are only used via . But if we already know the output of (because we’re in the subset where ), then we only need to compute the value of . Thus, it would be consistent with condition 1 of the conditional histories definition if we had . But is it also consistent with condition 2?

In condition 2, we have first this part: “if ...” However, has no bearing on whether is in or not, because doesn’t take as an input. So, we can ignore that part. Without that part we are left with: if for all , then . Is this true for all and ? Yes, because what it is saying is, if seems to be in according to and , then it really is in . And that is true, because and already fully determine , so if they say you are in , then you are.

So, we avoided the trap we had above, where neither nor its complement could reconstruct the boundary of the set we were conditioning on. Here, and (which are both not in ) are able to reconstruct the boundary of perfectly, so condition 2 is fulfilled. Thus, (and analogously ), which implies that . (I didn’t check that doesn’t contain , but a quick argument shows that it won’t: is neither needed to pin down nor to pin down the border of .) So, as expected.


There is one interesting 3-variable model left – the common cause:

The reader may want to try this out for themself before reading on.

Splitting off the randomness, we get:

As before, we can construct a sample space that is factorized by , giving us the finite factored set . The histories for , , and are then again obvious: , , and . We can see that and are not independent, because their histories both include . But they also don’t have a definite temporal order; we can neither say nor .

From Pearl’s theory, we expect that and become independent when conditioned on . So let’s look at those conditional histories. As we know all the tricks by now, it will be a very high-level analysis.

We recall the first rule of conditional histories: the history should contain those factors that are needed to pin down the variable given that we already know the value of the conditioned variable. If we know the value of , then it suffices to know in order to know . So, the conditional history, , of given that we know the value of , contains at the least.

The second rule of conditional histories demands that either or its complement, , (or both) is on its own sufficient to determine the value of the conditioned variable ( in our case). Assuming , the complement of , , contains itself, and so is definitely sufficient to determine . Thus, when conditioning on values for , is a permitted history by the second rule.

By symmetry, we get for the conditional history for . This all then implies as expected.

Conclusions

I hope this was useful to someone. And I hope I didn’t completely mess up the intended use of this formalism.

One thing I appreciate about this formalism is that I find it easy to drop to the base level (the sample space with the factorization) to explicitly check my higher-level thoughts when I get confused. It’s nice to have that solid ground level available whenever I need it.

The rule about conditional histories is not exactly easy, but it feels closer to a fundamental law than the d-separation rules of the Pearlian paradigm, which always felt a bit arbitrary.

Finally, I still kind of think that a DAG is a really nice way of visualizing dependencies and independencies of random variables. I wonder if there is a visualization that feels more native to Finite Factored Sets while still looking nice.

New to LessWrong?

New Comment
1 comment, sorted by Click to highlight new comments since: Today at 4:06 PM

This is excellent, thank you!