Ω 17

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

This is a longer, more mathematically dense introduction to "Finite Factored Sets." For a shorter introduction, see my Topos talk/transcript.

Abstract: We propose a new approach to temporal inference, inspired by the Pearlian causal inference paradigm—though quite different from Pearl's approach formally. Rather than using directed acyclic graphs, we make use of factored sets, which are sets expressed as Cartesian products. We show that finite factored sets are powerful tools for inferring temporal relations. We introduce an analog of d-separation for factored sets, conditional orthogonality, and we demonstrate that this notion is equivalent to conditional independence in all probability distributions on a finite factored set.

1. Introduction

1.1. Pearlian Causal Inference

Judea Pearl's theory of inferred causation (e.g., as presented in chapter 2 of Causality: Models, Reasoning, and Inference) was a deep advance in our understanding of the nature of time. The Pearlian paradigm allows us to infer causal relationships between variables using statistical data, and thereby infer temporal sequence—in defiance of the old adage that correlation does not imply causation.

In particular, given a collection of variables and a joint probability distribution over those variables, the Pearlian paradigm can often infer temporal relationships between the variables.

The joint probability distribution is usually what gets emphasized in discussions of Pearl's approach. Quite a bit of work is being done, however, by the assumption that we are handed "a collection of variables" to reason about. The Pearlian paradigm is not inferring temporal relationships from purely statistical data, but rather inferring temporal relationships from statistical data together with data about how to factorize the world into variables.[1]

A doctor who misdiagnoses their patient or misidentifies a symptom may base their subsequent reasoning on a wrong factorization of the situation into causally relevant variables. We would ideally like to build fewer assumptions like this into our model of inference, and instead allow the reasoner to figure such facts out, consider the merits of different factorizations into variables, etc.

Instead of beginning with a collection of variables and a joint probability distribution over those variables, one could imagine starting with just a finite sample space and a probability distribution on that sample space. In this way, we might hope to do temporal inference purely using statistical data, without relying on a priori knowledge of a canonical way of factoring the situation into variables.

How might one do temporal inference without an existing factorization? One way might be to just consider all possible variables that can be defined on the sample space. This gives us one variable for each partition of the set.

However, when one tries to apply Pearl's methods to this collection of variables, one quickly runs into a problem: many of the variables definable on a fixed set are deterministic functions of each other. The Pearlian paradigm, as presented in the early chapters of Causality, lacks tools for performing temporal inference on variables that are highly deterministically related.[2]

We will introduce a new approach to temporal inference instead—one which is heavily inspired by the Pearlian paradigm, but approaches the problem with a very different formal apparatus, and does not make use of graphical models.

1.2. Overview

We'll begin by introducing the concept of a finite factored set, in Section 2. This will be our analogue of the directed acyclic graphs in Pearl's framework.

In Section 3, we will introduce the concepts of time and orthogonality, which can be read off of a finite factored set. In Pearl's framework, "time" corresponds to directed paths between nodes, and "orthogonality" corresponds to nodes that have no common ancestor.

In Section 4, we will introduce conditional orthogonality, which is our analogue of d-separation. We show that conditional orthogonality satisfies (a modified version of) the compositional semigraphoid axioms. We then (in Section 5) prove the fundamental theorem of finite factored sets, which states that conditional orthogonality is equivalent to conditional independence in all probability distributions on the finite factored set.

In Section 6, we discuss how to do temporal inference using finite factored sets, and give two examples. Finally, in Section 7 we discuss applications and future work, with an emphasis on temporal and conceptual inference, generalizing finite factored sets to the infinite case, and applications to embedded agency.

And here, we take our leave of Pearl. We've highlighted this approach's relationship to the Pearlian paradigm in order to motivate finite factored sets and explain how we'll be using them in this sequence. Formally, however, our approach is quite unlike Pearl's, and the rest of the sequence will stand alone.

2. Factorizations

Before giving a definition of finite factored sets, we will recall the definition of a partition, and give some basic notation related to partitions.

We do this for two reasons. First, we will use partitions in the definition of a factored set; and second, we want to draw attention to a duality between the notion of a partition, and the notion of a factorization.

2.1. Partitions

We begin with a definition of disjoint union.

Definition 1 (disjoint union). Given a set  of sets, let  denote the set of all ordered pairs , where  and .[3]

Definition 2 (partition). A partition of a set  is a set  of nonempty subsets of  such that the function  given by  is a bijection.[4]

Let  denote the set of all partitions of . The elements of a partition are called parts.

An equivalent definition of partition is often given: a partition is a set  of nonempty subsets of  that are pairwise disjoint and union to . We choose the above definition because it will make the symmetry between partitions and factorizations more obvious.

Definition 3 (trivial partition). A partition  of a set  is called trivial if .

Definition 4. Given a partition  of a set , and an element , let  denote the unique  such that .

Definition 5. Given a partition  of a set , and elements , we say  if .

Proposition 1. Given a partition  of a set  is an equivalence relation on .

Proof. Trivial.

Definition 6 (finer and coarser). We say that a partition  of  is finer than another partition  of , if for all , if , then .

If  is finer than , we also say  is coarser than , and we write  and .

Definition 7 (discrete and indiscrete partitions). Given a set , let .

If  is empty, let , and if  is nonempty, let .

is called the discrete partition, and  is called the indiscrete partition.

Proposition 2. For any set  is a partial order on . Further, for all  and .

Proof. Trivial.

While both notations are sometimes used, it is more standard to draw the symbol in the opposite direction and have  when  is finer than . We choose to go against that standard because we want to think of partitions in part as the ability to distinguish between elements, and finer partitions correspond to greater ability to distinguish.[5]

Definition 8 (common refinement). Given a set  of partitions of a fixed set , let  denote the partition  satisfying  if and only if  for all . Given , we let .

2.2. Factorizations

We start with a definition of Cartesian product.

Definition 9 (Cartesian product). Given a set  of sets, let  denote the set of all functions  such that for all  is of the form , for some .

We can now give the definition of a factorization of a set.

Definition 10 (factorization). A factorization of a set  is a set  of nontrivial partitions of  such that the function , given by , is a bijection.

Let  denote the set of all factorizations of . The elements of a factorization are called factors.

In other words, a set of nontrivial partitions is a factorization of  if for each way of choosing one part from each factor, there exists a unique element of  in the intersection of those parts.

Notice the duality between the definitions of partition and factorization. We replace subsets with partitions, nonempty with nontrivial, and disjoint union with Cartesian product, and we reverse the direction of the function. We can think of a factorization of  as a way to view  as a product, in the same way that a partition was a way to view  as a disjoint union.

A factored set is just a set together with a factorization of that set.

Definition 11 (factored set). A factored set  is an ordered pair , such that  is a factorization of .

If  is a factored set, we let , and let .

An important fact about factored sets is that the factors are enough to distinguish distinct elements.

Proposition 3. Given a factored set , and elements , if  for all , then .

Proof. Let  be a finite factored set, and let  satisfy  for all .

Let  be given by , as in the definition of factorization. Then . Since  is bijective, this means

2.3. Chimera Functions

The following theorem can be viewed as an alternate characterization of factorization. We will use this alternate characterization to define chimera functions, which will be useful tools for manipulating elements of factored sets.

Theorem 1. Given a set , a set  of nontrivial partitions of  is a factorization of  if and only if for every function , there exists a unique  such that for all .

Proof. First, we let  be a factorization of , and let  be any function. We want to show that there exists a unique  such that for all . Let  be given by , as in the definition of factorization. Note that  is bijective, and thus has an inverse.

Let . Observe that this is well-defined, because  is in fact in . We will show that  for all , and the uniqueness of this  will then follow directly from Proposition 3.

We have  by the definition of . However, we also have  by the definition of . Thus,  and  are the same function, so  for all , so  for all .

Conversely, let  be any set, and let  be any set of nontrivial partitions of . Assume that for all , there exists a unique  satisfying  for . Again, let  be given by , as in the definition of factorization. We want to show that  is invertible.

First, we show that  is injective. Take an arbitrary , and let  be the constant function satisfying  for all . Given another , if , then , so  for all , so  for all . Since there is a unique  satisfying  for all , this means . Thus  is injective.

To see that  is surjective, consider some arbitrary . We want to show that there exists an  with .

For all , let  be given by , which is well-defined since . Note that  is a nonempty subset of , so there exists a function  with  for all . Fix any such , and let  satisfy  for all

We thus have that for all , so . Thus  is surjective.

Since  is bijective, we have that  is a factorization of

This also gives us that factors are disjoint from each other.

Corollary 1. Given a factored set  and distinct factors .

Proof. Assume by way of contradiction that . Since  is nontrivial, there must be some other  with . Let  be any function such that  and . Then there can be no  such that  and , since then  would be in both  and . This contradicts Theorem 1.

We are now ready to define the chimera function of a factored set.

Definition 12 (chimera function). Given a factored set , the chimera function (of ) is the function  defined by  for all  and .

The name "chimera function" comes from the fact that  can be viewed as building an element of  by fusing together the properties of various different elements. Since we will often apply the chimera function to functions  that only take on two values, we will give notation for this special case.

Definition 13. Given a factored set , and a subset , let  be given by , where  is given by  if , and  otherwise.

For , we will write  for .

The following is a list of properties of , which will be useful in later proofs. All of these properties follow directly from the definition of .

Proposition 4. Fix , a factored set, , and .

1.  for all .
2.  for all .
3. .
4. .
5. .
6. .
7. .
8. .
9. .
10. .
11. .

Proof. Trivial.

2.4. Trivial Factorizations

We now define a notion of a trivial factorization of a set, and show that every set has a unique trivial factorization.

Definition 14 (trivial factorization). A factorization  of a set  is called trivial if . A factored set  is called trivial if  is trivial.

Proposition 5. For every set , there exists a unique trivial factorization  of . If , this trivial factorization is given by , and if , it is given by .

Proof. We start with the case where . The only partition of  is , so we only need to consider the sets of partitions  and  as potential factorizations.  is vacuously a factorization of  by Theorem 1, since there are no functions from  to  is not a factorization by Theorem 1, since there is a function from  to , but there is no element of . Thus, when  is the unique trivial factorization of .

Next, consider the case where . First, observe that the unique  vacuously satisfies  for all  and , since there is no . Thus, by Theorem 1,  is a factorization of . Further,  is the only factorization of , since there are no nontrivial partitions of . Thus, when  is the unique trivial factorization of .

Next, we consider the case where . Observe that  is a nontrivial partition of . Let . We want to show that  is a factorization of . By Theorem 1, it suffices to show that for all , there exists a unique  with . We can take , which clearly satisfies . This  is unique, since if , then , so . Thus  is a factorization of .

On the other hand, if  is not a factorization of , since if it were, Proposition 3 would imply that all elements of  are equal. Further, for any partition  of , with , there must exist , with , but . Thus  cannot be a factorization of  by Proposition 3. Thus when  is the unique trivial factorization of

2.5. Finite Factored Sets

This sequence will primarily be about finite factored sets.

Definition 15. If  is a factored set, the size of , written , is the cardinality of . The dimension of , written , is the cardinality of  is called finite if its size is finite, and finite-dimensional if its dimension is finite.

We suspect that the theory of infinite factored sets is both interesting and important. However, it is outside of the scope of this sequence, which will require finiteness for many of its key results.

Some of the definitions and results in this sequence will be given for finite factored sets, in spite of the fact that they could easily be extended to finite-dimensional or arbitrary factored sets. This is because they can often be extended in more than one way, and determining which extension is most natural requires further developing the theory of arbitrary factored sets.

Proposition 6. Every finite factored set is also finite-dimensional.

Proof. If  is a factored set,  is a set of sets of subsets of . Thus,

This bound is horrible and will be improved in Proposition 9. First, however, we will take a look at the number of factorizations of a fixed finite set.

Proposition 7. Let  be a finite factored set. Then .

Proof. Trivial.

Proposition 8. If  is equal to , or a prime, the trivial factorization of  is the only factorization of .

Proof. If  or , then , so  can have cardinality at most 1.

If , a prime, then by Proposition 7,  must divide  for all . Since factorizations cannot contain trivial partitions, this means  for all . However,  is the only element of  of cardinality , so

On the other hand, in the case where  is finite and composite, the number of factorizations of  grows very quickly, as seen in Table 1. Table 1 shows the number of factorizations of a set  with cardinality up to 25:

Given the naturalness of the notion of factorization, we were surprised to discover that this sequence did not exist on the On-Line Encyclopedia of Integer Sequences (OEIS). We added the sequence, A338681, on April 30, 2021.

To give one concrete example, the four factorizations of  the set  are:

• ,
• ,
• , and
• .

Proposition 9. Let  be a finite factored set.

• If , then .
• If , then .
• If  is prime, then .
• If  is a product of  primes, then .

Proof. The first three parts follow directly from Proposition 5 and Proposition 8. For the fourth part, let , and let  be a product of  primes.

By Proposition 7, . Consider an arbitrary . Since  is a nontrivial partition of a finite set  is finite and . If  were , then  would be . Thus  is a natural number greater than or equal to  cannot be empty, since . If  were greater than , then we would be able to express  as a product of more than  natural numbers greater than or equal to , which is clearly not possible since  is a product of  primes. Thus

In the next post, we will introduce the notions of the history of a partition, orthogonality between partitions, and time.

Acknowledgments: My thanks to Alex Appel, Ramana Kumar, Xiaoyu He, Tsvi Benson-Tilsen, Andrew Critch, Sam Eisenstat, Rob Bensinger, and Claire Wang for discussion and feedback on this sequence.

Footnotes

[1] Although I say "factorize" here, note that this will not be the kind of factorization that shows up in finite factored sets, because (as we will see) disjoint factors must be independent in a finite factored set. I appeal to the same concept in both contexts because factorization is just a very general and useful concept, rather than to indicate a direct connection.

[2] At least, it lacks such causal inference tools unless we assume access to interventional data.

[3] Note that this definition and Definition 9 could have been made more general by taking  to be a multiset.

[4]  denotes the power set of .

[5] In our view, "" is also a more natural way to visually represent a mapping between a three-part partition  that is finer than a two-part partition .

Ω 17

New Comment
2 comments, sorted by Click to highlight new comments since:

The part about Chimera functions was surprising, and I look forward to seeing where that will go, and to more of this in general.

In section 2.1 , Proposition 2 should presumably say that  is a partial order on  rather than on  .

Fixed, Thanks.