This post assumes knowledge of category theory, finite factored sets, and Bayes nets.

The Setup

I've already talked about DAGs and factor overlap Venn diagrams in a previous post, where I studied them within a category-theoretic framework. Here I'll also perform an explicit construction of them using set theory.

DAGs

I have already discussed the set of directed acyclic graphs over $n$ elements. We will denote the set of all DAGs of $n$ elements as $\mathrm{DAG}_n$. Each Bayes net can be thought of as a set of ordered pairs of elements, with a pair $(i, j)$ for each arrow $i \to j$.

This set can be converted into the category of Bayes nets over $n$ elements by the addition of morphisms corresponding to "bookkeeping"-type relationships, which we will denote $\mathcal{BN}_n$. From this category, we can form a category whose elements are sets of the elements of $\mathcal{BN}_n$, subject to the following condition on a set $S$:

$$B \in S \ \text{and}\ B \to B' \ \text{in}\ \mathcal{BN}_n \implies B' \in S$$

(I'll slightly go against standard notation by using calligraphic acronyms for my category names, e.g. $\mathcal{CSBN}_n$. I don't feel like any of my categories are definitely natural or useful enough to earn a "proper" bold name like $\mathbf{Set}$.)

This means that, if we can reach a given Bayes net $B'$ from any element $B$ of our set $S$, that Bayes net $B'$ must also be in $S$. I refer to these as compatible sets and denote by $\mathcal{CSBN}_n$ (standing for compatible sets of Bayes nets) the category whose elements are compatible sets over $n$ elements, with morphisms given by inclusion. To write it out fully:

$$\mathrm{Hom}(S, T) = \begin{cases} \{*\} & \text{if } S \supseteq T \\ \emptyset & \text{otherwise} \end{cases}$$

This should be read as "there is a unique morphism from $S$ to $T$ if and only if $S$ is a weak superset of $T$, otherwise there is no morphism". Orderings of this form always follow the rules required to create a category.
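To make the closure condition concrete, here's a minimal Python sketch. Two assumptions of mine: a bookkeeping move is modelled as adding a single edge (the real bookkeeping rules are richer), and a DAG is represented as a frozenset of directed edges over nodes $0, \dots, n-1$. The helper names `is_acyclic`, `successors`, and `is_compatible` are my own.

```python
def is_acyclic(n, edges):
    """Check that a set of directed edges over nodes 0..n-1 has no cycle."""
    adj = {i: [] for i in range(n)}
    for a, b in edges:
        adj[a].append(b)
    state = {i: 0 for i in range(n)}  # 0 = unvisited, 1 = on stack, 2 = done
    def dfs(u):
        state[u] = 1
        for v in adj[u]:
            if state[v] == 1 or (state[v] == 0 and not dfs(v)):
                return False  # found a back-edge, i.e. a cycle
        state[u] = 2
        return True
    return all(state[i] != 0 or dfs(i) for i in range(n))

def successors(n, dag):
    """DAGs reachable by one (simplified) bookkeeping move: adding an edge."""
    out = []
    for a in range(n):
        for b in range(n):
            if a != b and (a, b) not in dag:
                new = set(dag) | {(a, b)}
                if is_acyclic(n, new):
                    out.append(frozenset(new))
    return out

def is_compatible(n, S):
    """A set S of DAGs is compatible if it is closed under bookkeeping moves."""
    return all(t in S for dag in S for t in successors(n, dag))
```

For $n = 2$ the set containing the empty DAG and both single-edge DAGs is compatible, while the set containing only the empty DAG is not, since a bookkeeping move escapes it.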

(Aside: sometimes we think about equivalence classes of Bayes nets. If we choose, we can first convert our Bayes nets to equivalence classes, then convert them to compatible sets, but this is not needed here)

This category has an initial object: the compatible set generated by the discrete Bayes net (with no arrows at all), which therefore contains all Bayes nets. Less trivially, it has a terminal object: the set containing exactly the Bayes nets which have an arrow between every pair of nodes.

Venn diagrams

Presence/absence Venn diagrams over $n$ elements can be thought of as elements of the power-set of the power-set of the set $N = \{1, \dots, n\}$. We can write this as $\mathcal{P}(\mathcal{P}(N))$, but I will abbreviate this by writing $\mathcal{P}^2(N)$ in future. As before, we can form a category by ordering the sets, but in this case we will reverse the order of the morphisms, to create the category $\mathcal{FVD}_n$ (standing for factor Venn diagrams).

This category has an initial object, the empty diagram $\emptyset$, and a terminal object, the full power set $\mathcal{P}(N)$.
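The double power set is easy to enumerate for tiny $n$, which makes the $2^{2^n}$ growth visible. A quick sketch (the `powerset` helper is mine):

```python
from itertools import combinations

def powerset(xs):
    """All subsets of xs, as frozensets."""
    xs = list(xs)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

N = [1, 2]
PN = powerset(N)     # the 2^2 = 4 subsets of N
PPN = powerset(PN)   # the 2^4 = 16 possible presence/absence Venn diagrams
```

Already at $n = 3$ there are $2^{2^3} = 256$ diagrams, and at $n = 5$ there are $2^{32}$.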

Our categories look (something) like this:

The Payoff

Functors $F_n$ and $G_n$

There exist functors $F_n : \mathcal{FVD}_n \to \mathcal{CSBN}_n$ and $G_n : \mathcal{CSBN}_n \to \mathcal{FVD}_n$ which are naturally-defined with respect to the properties of joint probability distributions. There exist subcategories of $\mathcal{FVD}_n$ and $\mathcal{CSBN}_n$ on which the functors $F_n$ and $G_n$ are totally inverse. The resulting category structure - I hope - will be useful for causal inference.

We will define compatibility $\mathrm{compat}(S, B)$ between a set of elements $S \subseteq N$ and a Bayes net $B$ as follows: there must exist some node $v \in S \hookrightarrow B$ (the hook arrow $\hookrightarrow$ denotes the "inclusion" of the elements of $N$ into $B$, so it refers to the nodes of $B$ which are elements of $S$) such that all other nodes of $S$ are reachable from $v$ via directed paths passing only through elements of $S$.
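A hedged sketch of this predicate in Python, under my earlier edge-set representation of DAGs. One convention I'm assuming (not stated above): the empty set is trivially compatible with every net.

```python
def compat(S, dag):
    """compat(S, B): some node v of S reaches every other node of S
    along directed paths that stay inside S.
    Assumed convention: the empty set is trivially compatible."""
    S = set(S)
    if not S:
        return True
    adj = {}
    for a, b in dag:
        adj.setdefault(a, []).append(b)
    def reachable_within(v):
        # nodes of S reachable from v without leaving S
        seen, stack = {v}, [v]
        while stack:
            u = stack.pop()
            for w in adj.get(u, []):
                if w in S and w not in seen:
                    seen.add(w)
                    stack.append(w)
        return seen
    return any(reachable_within(v) >= S for v in S)
```

For the chain $1 \to 2 \to 3$: the set $\{1, 2, 3\}$ is compatible (node 1 reaches everything), but $\{1, 3\}$ is not, because the only path from 1 to 3 passes through 2, which is outside the set.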

$F_n$ maps an object $V \in \mathcal{FVD}_n$ to an object of $\mathcal{CSBN}_n$ according to the following rule: a Bayes net $B$ is present in $F_n(V)$ if and only if $\mathrm{compat}(s, B)$ holds for every set $s \in V$.

We will quickly verify that this respects morphisms. Any $V'$ such that a morphism $V \to V'$ exists must contain at least all of the sets present in $V$ (since $V \subseteq V'$) and possibly more. Therefore, any $B \in F_n(V')$ must also follow all of the constraints imposed by $V$ and possibly more, so $F_n(V) \supseteq F_n(V')$ and there is a morphism $F_n(V) \to F_n(V')$.

Next, we wish to define $G_n$. This requires a little finesse but isn't too difficult. We shall say that $G_n$ maps an object $D \in \mathcal{CSBN}_n$ to an object $G_n(D) \in \mathcal{FVD}_n$ according to the following rule: a set $s$ is present in $G_n(D)$ if and only if $\mathrm{compat}(s, B)$ holds for every Bayes net $B \in D$.

You may notice that this is almost the reverse of $F_n$. This is by design. It's worth pausing for a moment to consider how they differ: $F_n$ maps a factor overlap Venn diagram, constructed as a set of sets of items from $N$, to a set of DAGs. We can think of it as starting with the complete set of DAGs, then going through our Venn diagram $V$, and for each $s \in V$ throwing out the DAGs which don't conform.

Conversely, $G_n$ maps a set of DAGs to a set of factor overlaps. We can think of it as starting with the completely full factor overlap Venn diagram, and for each DAG $B \in D$, throwing out the factors which don't apply.
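Both filtering descriptions can be sketched directly, at least by brute force. This is a self-contained toy under my earlier assumptions (edge-set DAGs, empty set trivially compatible); `all_dags`, `F`, and `G` are my own names for the constructions.

```python
from itertools import combinations, product

def is_acyclic(n, edges):
    """No directed cycle among nodes 0..n-1 (DFS back-edge check)."""
    adj = {i: [] for i in range(n)}
    for a, b in edges:
        adj[a].append(b)
    state = {i: 0 for i in range(n)}  # 0 = new, 1 = on stack, 2 = done
    def dfs(u):
        state[u] = 1
        for v in adj[u]:
            if state[v] == 1 or (state[v] == 0 and not dfs(v)):
                return False
        state[u] = 2
        return True
    return all(state[i] != 0 or dfs(i) for i in range(n))

def all_dags(n):
    """Every acyclic subset of the n*(n-1) possible directed edges."""
    pairs = [(a, b) for a in range(n) for b in range(n) if a != b]
    dags = []
    for bits in product([0, 1], repeat=len(pairs)):
        edges = frozenset(p for p, bit in zip(pairs, bits) if bit)
        if is_acyclic(n, edges):
            dags.append(edges)
    return dags

def compat(S, dag):
    """Some node of S reaches all of S along paths inside S (empty S: true)."""
    S = set(S)
    if not S:
        return True
    adj = {}
    for a, b in dag:
        adj.setdefault(a, []).append(b)
    def reach(v):
        seen, stack = {v}, [v]
        while stack:
            u = stack.pop()
            for w in adj.get(u, []):
                if w in S and w not in seen:
                    seen.add(w)
                    stack.append(w)
        return seen
    return any(reach(v) >= S for v in S)

def powerset(xs):
    xs = list(xs)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

def F(n, V):
    """Keep only the DAGs that conform to every factor overlap in V."""
    return {d for d in all_dags(n) if all(compat(s, d) for s in V)}

def G(n, D):
    """Keep only the factor overlaps compatible with every DAG in D."""
    return {s for s in powerset(range(n)) if all(compat(s, d) for d in D)}
```

For $n = 2$, the diagram $\{\{0, 1\}\}$ maps under `F` to the two single-edge DAGs (the terminal object), and applying `F(G(\cdot))` to that set returns it unchanged, as the projection argument below predicts.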

Quick Examples

Our functors are not quite inverses: for an example consider the Venn diagram in $\mathcal{FVD}_n$ consisting of all two-element subsets of $N$. This is mapped by $F_n$ to exactly the set of totally connected DAGs, which is the terminal object in $\mathcal{CSBN}_n$. The terminal object in $\mathcal{CSBN}_n$ is mapped by $G_n$ to the terminal object $\mathcal{P}(N)$ in $\mathcal{FVD}_n$. This corresponds to what I called "agnosticism" in our set of Venn diagrams in the last post.

Note that all objects $V$ in the image of $G_n$ will contain the empty set $\emptyset$ and every singleton $\{i\}$, since these are compatible with every Bayes net. We will often leave these trivial elements implicit at the start of our set to save time.

For an example in the opposite direction, consider the set in $\mathcal{CSBN}_n$ which consists of a chosen pair of DAGs together with the minimal set of other DAGs needed for this to be a valid element of $\mathcal{CSBN}_n$. This is mapped by $G_n$ to a set of factor overlaps, which is correspondingly mapped by $F_n$ to a generally larger compatible set: the set of DAGs "downstream" of a weaker pair of constraints. I left sets like this out of our category in the previous post.

As an example where our functors are inverse to one another, consider the initial object of $\mathcal{FVD}_n$, which is the empty set $\emptyset$. This is mapped by $F_n$ to the initial object of $\mathcal{CSBN}_n$, which is the set containing all elements of $\mathcal{BN}_n$. This is compatible with no shared factors (as any shared factors impose constraints on Bayes nets), so is mapped by $G_n$ right back to the empty set. This is true for any $n$. It also applies to the terminal objects of any $\mathcal{FVD}_n$ and $\mathcal{CSBN}_n$, i.e. the total power set $\mathcal{P}(N)$ and the set containing only fully-connected Bayes nets.

Composing $F_n$ and $G_n$

I am going to be lazy here and omit the subscript $n$s; just imagine I've put them in while I'm talking about general properties of all $F_n$ and $G_n$. Consider the following sequence starting with an element $D \in \mathcal{CSBN}$:

$$D \;\xrightarrow{\;G\;}\; G(D) \;\xrightarrow{\;F\;}\; F(G(D))$$

We have that every $B \in D$ satisfies $\mathrm{compat}(s, B)$ for every $s \in G(D)$, by the definition of $G$. But $F(G(D))$ is defined as the set of all elements $B$ (in the relevant set $\mathcal{BN}_n$ for our given $n$) such that $\mathrm{compat}(s, B)$ holds for every $s \in G(D)$. This means that all of our elements of $D$ are also elements of $F(G(D))$, in other words $D \subseteq F(G(D))$, and by the definition of our functor there is a morphism $F(G(D)) \to D$.

But consider that we can reverse the logic: the same argument starting from an element $V \in \mathcal{FVD}$ gives $V \subseteq G(F(V))$, and taking $V = G(D)$ gives $G(D) \subseteq G(F(G(D)))$. Meanwhile, applying $G$ to $D \subseteq F(G(D))$ reverses the inclusion, so $G(F(G(D))) \subseteq G(D)$, which alongside the previous inclusion gives us $G(F(G(D))) = G(D)$.

This means that $G(F(G(D))) = G(D)$ for all $D$. We can skip writing $D$ and write this as $G \circ F \circ G = G$, which is also the same as saying that $G \circ F = \mathrm{id}$ (the identity morphism) on the image of $\mathcal{CSBN}$ under $G$. Conversely, $F \circ G = \mathrm{id}$ on the image of $\mathcal{FVD}$ under $F$.

This also means that $(F \circ G) \circ (F \circ G) = F \circ G$, so $F \circ G$ is a "projection" operator. It "projects" the elements of $\mathcal{CSBN}$ onto a subcategory of $\mathcal{CSBN}$, but then doesn't do anything further to elements that are already there. Likewise for $G \circ F$.

On our earlier diagram this all looks like this:

$F$ maps elements of $\mathcal{FVD}$ to elements of $\mathcal{CSBN}$. We can label all of the elements that get "hit" as the image of $F$, written $\mathrm{im}(F)$. Likewise $G$ maps elements of $\mathcal{CSBN}$ to elements of $\mathcal{FVD}$, giving its image $\mathrm{im}(G)$. More importantly, $F$ and $G$ restrict to mutually inverse functors between $\mathrm{im}(G)$ and $\mathrm{im}(F)$, with $F \circ G = \mathrm{id}$ and $G \circ F = \mathrm{id}$ on these subcategories.

Properties of the Resulting Subcategories

In category theory, we tend not to care about what the elements of a category are, only about the structure present in the morphisms. Therefore, rather than consider the rather clunky concept of two equivalent subcategories which are formed as the images of functors, we really just want to look at the resulting structure. Since I have no idea what to call it, I will call these categories $\mathcal{A}_n$ for now. Since it makes no sense to perform inference on zero variables, we'll impose the limit $n \geq 1$.

$\mathcal{A}_1$ is the trivial category, with one element and one morphism. $\mathcal{A}_2$ is a two-element category with a morphism from one of its elements to the other. $\mathcal{A}_3$ is a fifteen-element category with the structure discussed in the previous post.

$\mathcal{A}_4$ has 1218 elements. This is the highest I've been able to enumerate using inefficient Python code and a mediocre PC. I could perhaps try to get something to work on $\mathcal{A}_5$, but no amount of efficiency can fight doubly-exponential growth for long. OEIS has nothing for the sequence 1, 2, 15, 1218 so I'm stumped for better representations.
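In the spirit of that brute-force enumeration, here's a sketch of the easier sub-problem: counting all labeled DAGs on $n$ nodes (a different, known sequence: 1, 1, 3, 25, 543, ..., not the 1, 2, 15, 1218 above). It makes the combinatorial wall obvious: at $n$ nodes there are $2^{n(n-1)}$ edge sets to test.

```python
from itertools import product

def is_acyclic(n, edges):
    """Kahn-style check: repeatedly strip nodes with no incoming edge."""
    nodes = set(range(n))
    edges = set(edges)
    while nodes:
        no_incoming = {v for v in nodes if not any(b == v for (_, b) in edges)}
        if not no_incoming:
            return False  # every remaining node has an incoming edge: a cycle
        nodes -= no_incoming
        edges = {(a, b) for (a, b) in edges if a in nodes and b in nodes}
    return True

def count_dags(n):
    """Brute force over all subsets of the n*(n-1) possible edges."""
    pairs = [(a, b) for a in range(n) for b in range(n) if a != b]
    return sum(is_acyclic(n, [p for p, bit in zip(pairs, bits) if bit])
               for bits in product([0, 1], repeat=len(pairs)))
```

Already at $n = 4$ this loops over $2^{12} = 4096$ edge sets, and at $n = 6$ over $2^{30}$; the compatible-set enumeration sits on top of this and is far worse.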

Combining Nodes in Bayes Nets

One bookkeeping rule we have not used yet is the node-combining rule. This lets us map an object of $\mathcal{BN}_n$ to an object of $\mathcal{BN}_{n-1}$. We do this by combining two numbers $i < j$: we replace every $j$ in an edge with $i$, and then relabel our numbers to be $1, \dots, n-1$. We can also do the same for $\mathcal{FVD}_n$, mapping its elements to elements of $\mathcal{FVD}_{n-1}$. This also works by combining two numbers as above.
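A small sketch of that relabelling, again with my edge-set representation (and one assumption flagged in the comment: I leave it to the caller to check the merged graph is still acyclic).

```python
def merge_nodes(edges, i, j):
    """Sketch of the node-combining rule: fold node j into node i,
    drop any self-loops created, then shift labels above j down by one.
    Assumption: the caller checks the result is still acyclic."""
    assert i < j
    folded = set()
    for a, b in edges:
        a, b = (i if a == j else a), (i if b == j else b)
        if a != b:  # discard self-loops from merged endpoints
            folded.add((a, b))
    shift = lambda x: x - 1 if x > j else x
    return frozenset((shift(a), shift(b)) for a, b in folded)
```

For example, merging nodes 1 and 2 of the fork $0 \to 1$, $0 \to 2$ collapses the two arrows into the single edge $0 \to 1$.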

For a given set $W$ of possible "worlds", we might want to split it up into various partitions (unlike in FFS, we do not impose any restrictions on these partitions). So if we have $W = \{1, 2, 3, 4\}$ then the partition $\{\{1, 2\}, \{3, 4\}\}$ is a totally valid variable, as is $\{\{1, 3\}, \{2, 4\}\}$. These "naturally" give a new variable $\{\{1\}, \{2\}, \{3\}, \{4\}\}$ when combined.
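One way to read "combined" here - an assumption on my part, matching the FFS-style join - is the common refinement of the two partitions: the variable you get by observing both at once.

```python
def combine(p1, p2):
    """Common refinement of two partitions of the same world-set W,
    each given as a set of frozenset cells."""
    cells = {a & b for a in p1 for b in p2}
    cells.discard(frozenset())  # drop empty intersections
    return cells
```

For example, combining $\{\{1, 2\}, \{3, 4\}\}$ with $\{\{1, 3\}, \{2, 4\}\}$ over $W = \{1, 2, 3, 4\}$ yields the partition into singletons.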

So, given a copy of $\mathcal{BN}_n$, we can "join" two variables together to map each element to an element of a copy of $\mathcal{BN}_{n-1}$. We can do this in $\binom{n}{2}$ different ways. Up to now, we haven't really cared about what each node of a Bayes net actually was, but if we want to combine nodes we'll need to think about this.

For a given set $W$ and a set of disjoint partitions $P = \{X_1, \dots, X_n\}$, we can define the category of Bayes nets over those partitions, $\mathcal{BN}_P$. We can then "glue together" all categories like this, one for each choice of $P$, using the node-combining maps. We can likewise associate factor overlap diagrams with these, and create $\mathcal{FVD}_P$ for each set of disjoint partitions. By gluing together the relevant functors $F_P$ and $G_P$ for each pair of categories, we can make an enormous category corresponding to all sets of Bayes nets that a probability distribution over $W$ could obey.

Three Further Thoughts

Thought 1

What are the properties of the sets in $\mathrm{im}(G \circ F)$, i.e. the ones post-projection? Given a set of $k$ factor overlaps (which includes the trivial elements), can we determine in some reasonable time (polynomial in $n$ and $k$ perhaps?) whether it is changed by $G \circ F$? We have no good closed-form representation of the elements of $\mathcal{A}_n$ yet.

Thought 2

Bayes nets let us combine nodes (i.e. partitions) to get a new node and shrink the Bayes net, but is there some way to think about "blending" nodes continuously to move between Bayes nets with the same number of nodes?

So if we have our world $W = \{1, 2, 3, 4\}$, then we can split this into two nodes in a two-element Bayes net: $\{\{1, 2\}, \{3, 4\}\}$ and $\{\{1, 3\}, \{2, 4\}\}$. Or we could split it into two nodes like this: $\{\{1, 2\}, \{3, 4\}\}$ and $\{\{1, 4\}, \{2, 3\}\}$. Is there a way to embed these into a continuous space? Can we rotate between them? Can we describe probability distributions as elements of some space like $\mathbb{R}^{|W|}$ or something, then Bayes nets as regions in this space?

Thought 3

The glued-together category is unfathomably large even for small sets: we can have up to $B_{|W|}$ different partitions (and the Bell number $B_{|W|}$ grows faster than exponentially), so we may have an enormous number of different Bayes nets, with the largest being the ones built over the largest sets of disjoint partitions.
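To get a feel for the partition count, the Bell numbers can be computed with the standard Bell triangle:

```python
def bell(n):
    """n-th Bell number (count of partitions of an n-element set),
    computed via the Bell triangle."""
    row = [1]
    for _ in range(n):
        nxt = [row[-1]]          # new row starts with the previous row's last entry
        for x in row:
            nxt.append(nxt[-1] + x)
        row = nxt
    return row[0]
```

The sequence runs 1, 1, 2, 5, 15, 52, 203, ...; already $B_{10} = 115975$, so even a ten-world set admits over a hundred thousand candidate variables.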
