Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Finite Factored Sets: Introduction and Factorizations

3drocta

2Scott Garrabrant

This is a longer, more mathematically dense introduction to "Finite Factored Sets." For a shorter introduction, seemy Topos talk/transcript.Abstract:We propose a new approach to temporal inference, inspired by the Pearlian causal inference paradigm—though quite different from Pearl's approach formally. Rather than using directed acyclic graphs, we make use offactored sets, which are sets expressed as Cartesian products. We show that finite factored sets are powerful tools for inferring temporal relations. We introduce an analog ofd-separation for factored sets,conditional orthogonality, and we demonstrate that this notion is equivalent to conditional independence in all probability distributions on a finite factored set.## 1. Introduction

## 1.1. Pearlian Causal Inference

Judea Pearl's theory of inferred causation (e.g., as presented in chapter 2 of

Causality: Models, Reasoning, and Inference) was a deep advance in our understanding of the nature of time. The Pearlian paradigm allows us to infer causal relationships between variables using statistical data, and thereby infer temporal sequence—in defiance of the old adage that correlation does not imply causation.In particular, given

a collection of variablesanda joint probability distribution over those variables, the Pearlian paradigm can often infer temporal relationships between the variables.The joint probability distribution is usually what gets emphasized in discussions of Pearl's approach. Quite a bit of work is being done, however, by the assumption that we are handed "a collection of variables" to reason about. The Pearlian paradigm is not inferring temporal relationships from purely statistical data, but rather inferring temporal relationships from statistical data together with data about how to factorize the world into variables.[1]

A doctor who misdiagnoses their patient or misidentifies a symptom may base their subsequent reasoning on a wrong factorization of the situation into causally relevant variables. We would ideally like to build fewer assumptions like this into our model of inference, and instead allow the reasoner to figure such facts out, consider the merits of different factorizations into variables, etc.

Instead of beginning with a collection of variables and a joint probability distribution over those variables, one could imagine starting with just a finite sample space and a probability distribution on that sample space. In this way, we might hope to do temporal inference purely using statistical data, without relying on

a prioriknowledge of a canonical way of factoring the situation into variables.How might one do temporal inference without an existing factorization? One way might be to just consider all possible variables that can be defined on the sample space. This gives us one variable for each partition of the set.

However, when one tries to apply Pearl's methods to this collection of variables, one quickly runs into a problem: many of the variables definable on a fixed set are deterministic functions of each other. The Pearlian paradigm, as presented in the early chapters of

Causality, lacks tools for performing temporal inference on variables that are highly deterministically related.[2]We will introduce a new approach to temporal inference instead—one which is heavily inspired by the Pearlian paradigm, but approaches the problem with a very different formal apparatus, and does not make use of graphical models.

## 1.2. Overview

We'll begin by introducing the concept of a finite factored set, in Section 2. This will be our analogue of the directed acyclic graphs in Pearl's framework.

In Section 3, we will introduce the concepts of time and orthogonality, which can be read off of a finite factored set. In Pearl's framework, "time" corresponds to directed paths between nodes, and "orthogonality" corresponds to nodes that have no common ancestor.

In Section 4, we will introduce conditional orthogonality, which is our analogue of

d-separation. We show that conditional orthogonality satisfies (a modified version of) the compositional semigraphoid axioms. We then (in Section 5) prove the fundamental theorem of finite factored sets, which states that conditional orthogonality is equivalent to conditional independence in all probability distributions on the finite factored set.In Section 6, we discuss how to do temporal inference using finite factored sets, and give two examples. Finally, in Section 7 we discuss applications and future work, with an emphasis on temporal and conceptual inference, generalizing finite factored sets to the infinite case, and applications to embedded agency.

And here, we take our leave of Pearl. We've highlighted this approach's relationship to the Pearlian paradigm in order to motivate finite factored sets and explain how we'll be using them in this sequence. Formally, however, our approach is quite unlike Pearl's, and the rest of the sequence will stand alone.

## 2. Factorizations

Before giving a definition of finite factored sets, we will recall the definition of a partition, and give some basic notation related to partitions.

We do this for two reasons. First, we will use partitions in the definition of a factored set; and second, we want to draw attention to a duality between the notion of a partition, and the notion of a factorization.

## 2.1. Partitions

We begin with a definition of disjoint union.

Definition 1(disjoint union).Given a setSof sets, let⊔(S)denote the set of all ordered pairs(T,t), whereT∈Sandt∈T.[3]Definition 2(partition).A partition of a setSis a setX⊆P(S)of nonempty subsets ofSsuch that the functionι:⊔(X)→Sgiven byι(x,s)=sis a bijection.[4]LetPart(S)denote the set of all partitions ofS. The elements of a partition are called parts.An equivalent definition of partition is often given: a partition is a set A of nonempty subsets of S that are pairwise disjoint and union to S. We choose the above definition because it will make the symmetry between partitions and factorizations more obvious.

Definition 3(trivial partition).A partitionXof a setSis called trivial if|X|=1.Definition 4.Given a partitionXof a setS, and an elements∈S, let[s]Xdenote the uniquex∈Xsuch thats∈x.Definition 5.Given a partitionXof a setS, and elementss0,s1∈S, we says0∼Xs1if[s0]X=[s1]X.Proposition 1.Given a partitionXof a setS,∼Xis an equivalence relation onS.Proof.Trivial. □Definition 6(finer and coarser).We say that a partitionXofSis finer than another partitionYofS, if for alls0,s1∈S, ifs0∼Xs1, thens0∼Ys1.IfXis finer thanY, we also sayYis coarser thanX, and we writeX≥SYandY≤SX.Definition 7(discrete and indiscrete partitions).Given a setS, letDisS={{s}∣s∈S}.IfSis empty, letIndS={}, and ifSis nonempty, letIndS={S}.DisS

is called the discrete partition, andIndSis called the indiscrete partition.Proposition 2.For any setS,≥Sis a partial order onPart(S). Further, for allX∈Part(S),DisS≥SXandX≥SIndS.Proof.Trivial. □While both notations are sometimes used, it is more standard to draw the symbol in the opposite direction and have X≤Y when X is finer than Y. We choose to go against that standard because we want to think of partitions in part as the ability to distinguish between elements, and finer partitions correspond to greater ability to distinguish.[5]

Definition 8(common refinement).Given a setCof partitions of a fixed setS, let⋁S(C)denote the partitionX∈Part(S)satisfyings0∼Xs1if and only ifs0∼cs1for allc∈C. GivenX,Y∈Part(S), we letX∨SY=⋁S({X,Y}).## 2.2. Factorizations

We start with a definition of Cartesian product.

Definition 9(Cartesian product).Given a setSof sets, let⊓(S)denote the set of all functionsf:S→⊔(S)such that for allT∈S,f(T)is of the form(T,t), for somet∈T.We can now give the definition of a factorization of a set.

Definition 10(factorization).A factorization of a setSis a setB⊆Part(S)of nontrivial partitions ofSsuch that the functionπ:S→⊓(B), given byπ(s)=(b↦(b,[s]b)), is a bijection.LetFact(S)denote the set of all factorizations ofS. The elements of a factorization are called factors.In other words, a set of nontrivial partitions is a factorization of S if for each way of choosing one part from each factor, there exists a unique element of S in the intersection of those parts.

Notice the duality between the definitions of partition and factorization. We replace subsets with partitions, nonempty with nontrivial, and disjoint union with Cartesian product, and we reverse the direction of the function. We can think of a factorization of S as a way to view S as a product, in the same way that a partition was a way to view S as a disjoint union.

A factored set is just a set together with a factorization of that set.

Definition 11(factored set).A factored setFis an ordered pair(S,B), such thatBis a factorization ofS.IfF=(S,B)is a factored set, we letset(F)=S, and letbasis(F)=B.An important fact about factored sets is that the factors are enough to distinguish distinct elements.

Proposition 3.Given a factored setF=(S,B), and elementss0,s1∈S, ifs0∼bs1for allb∈B, thens0=s1.Proof.Let F=(S,B) be a finite factored set, and let s0,s1∈S satisfy s0∼bs1 for all b∈B.Let π:S→⊓(B) be given by π(s)=(b↦(b,[s]b)), as in the definition of factorization. Then π(s0)=(b↦(b,[s0]b))=(b↦(b,[s1]b))=π(s1). Since π is bijective, this means s0=s1. □

## 2.3. Chimera Functions

The following theorem can be viewed as an alternate characterization of factorization. We will use this alternate characterization to define chimera functions, which will be useful tools for manipulating elements of factored sets.

Theorem 1.Given a setS, a setBof nontrivial partitions ofSis a factorization ofSif and only if for every functiong:B→S, there exists a uniques∈Ssuch that for allb∈B,s∼bg(b).Proof.First, we let B be a factorization of S, and let g:B→S be any function. We want to show that there exists a unique s∈S such that for all b∈B, s∼bg(b). Let π:S→⊓(B) be given by π(s)=(b↦(b,[s]b)), as in the definition of factorization. Note that π is bijective, and thus has an inverse.Let s=π−1(b↦(b,[g(b)]b)). Observe that this is well-defined, because (b↦(b,[g(b)]b)) is in fact in ⊓(B). We will show that s∼bg(b) for all b∈B, and the uniqueness of this s will then follow directly from Proposition 3.

We have π(s)=(b↦[s]b) by the definition of π. However, we also have π(s)=(b↦[g(b)]b) by the definition of s. Thus, b↦[s]b and b↦[g(b)]b are the same function, so [s]b=[g(b)]b for all b∈B, so s∼bg(b) for all b∈B.

Conversely, let S be any set, and let B be any set of nontrivial partitions of S. Assume that for all g:B→S, there exists a unique s∈S satisfying s∼bg(b) for b∈B. Again, let π:S→⊓(B) be given by π(s)=(b↦(b,[s]b)), as in the definition of factorization. We want to show that π is invertible.

First, we show that π is injective. Take an arbitrary s0∈S, and let g:B→S be the constant function satisfying g(b)=s0 for all b∈B. Given another s1∈S, if π(s0)=π(s1), then (b↦[s0]b)=(b↦[s1]b), so [s1]b=[s0]b=[g(b)]b for all b∈B, so s0∼bs1∼bg(b) for all b∈B. Since there is a unique s∈S satisfying s∼bg(b) for all b∈B, this means s0=s1. Thus π is injective.

To see that π is surjective, consider some arbitrary h∈⊓(B). We want to show that there exists an s∈S with h=π(s).

For all b∈B, let Hb∈b be given by h(b)=(b,Hb), which is well-defined since h∈⊓(B). Note that Hb is a nonempty subset of S, so there exists a function g:B→S with g(b)∈Hb for all b∈B. Fix any such g, and let s satisfy s∼bg(b) for all b∈B.

We thus have that for all b∈B, h(b)=(b,Hb)=(b,[g(b)]b)=(b,[s]b)=π(s)(b), so h=π(s). Thus π is surjective.

Since π is bijective, we have that B is a factorization of S. □

This also gives us that factors are disjoint from each other.

Corollary 1.Given a factored setF=(S,B)and distinct factorsb0,b1∈B,b0∩b1={}.Proof.Assume by way of contradiction that T∈b0∩b1. Since b0 is nontrivial, there must be some other T′∈b0 with T∩T′={}. Let g:B→S be any function such that g(b0)∈T′ and g(b1)∈T. Then there can be no s such that s∼b0g(b0) and s∼b1g(b1), since then s would be in both T and T′. This contradicts Theorem 1. □We are now ready to define the chimera function of a factored set.

Definition 12(chimera function).Given a factored setF=(S,B), the chimera function (ofF) is the functionχF:(B→S)→Sdefined byχF(g)∼bg(b)for allg:B→Sandb∈B.The name "chimera function" comes from the fact that χF can be viewed as building an element of S by fusing together the properties of various different elements. Since we will often apply the chimera function to functions g that only take on two values, we will give notation for this special case.

Definition 13.Given a factored setF=(S,B), and a subsetC⊆B, letχFC:S×S→Sbe given byχFC(s,t)=χF(g), whereg:B→Sis given byg(b)=sifb∈C, andg(b)=totherwise.ForT,R⊆S, we will writeχFC(T,R)for{χFC(t,r)∣t∈T, r∈R}.The following is a list of properties of χFC, which will be useful in later proofs. All of these properties follow directly from the definition of χFC.

Proposition 4.FixF=(S,B), a factored set,C,D⊆B, ands,t,r∈S.for allc∈C.for allb∈B∖C.Proof.Trivial. □## 2.4. Trivial Factorizations

We now define a notion of a trivial factorization of a set, and show that every set has a unique trivial factorization.

Definition 14(trivial factorization).A factorizationBof a setSis called trivial if|B|≤1. A factored set(S,B)is called trivial ifBis trivial.Proposition 5.For every setS, there exists a unique trivial factorizationBofS. If|S|≠1, this trivial factorization is given byB={DisS}, and if|S|=1, it is given byB={}.Proof.We start with the case where |S|=0. The only partition of S is {}, so we only need to consider the sets of partitions {{}} and {} as potential factorizations. {{}} is vacuously a factorization of S by Theorem 1, since there are no functions from {{}} to S. {} is not a factorization by Theorem 1, since there is a function from {} to S, but there is no element of S. Thus, when |S|=0, {{}}={DisS} is the unique trivial factorization of S.Next, consider the case where |S|=1. First, observe that the unique s∈S vacuously satisfies s∼bg(b) for all g:{}→S and b∈{}, since there is no b∈{}. Thus, by Theorem 1, {} is a factorization of S. Further, {} is the only factorization of S, since there are no nontrivial partitions of S. Thus, when |S|=1, {} is the unique trivial factorization of S.

Next, we consider the case where |S|≥2. Observe that DisS is a nontrivial partition of S. Let B={DisS}. We want to show that B is a factorization of S. By Theorem 1, it suffices to show that for all g:B→S, there exists a unique s∈S with s∼DisSg(DisS). We can take s=g(DisS), which clearly satisfies s∼DisSg(DisS). This s is unique, since if s′∼DisSg(DisS), then s′∈[g(DisS)]DisS=[s]b={s}, so s′=s. Thus B is a factorization of S.

On the other hand, if |S|≥2, {} is not a factorization of S, since if it were, Proposition 3 would imply that all elements of S are equal. Further, for any partition b of S, with b≠{DisS}, there must exist s0,s1∈S, with s0∼bs1, but s0≠s1. Thus {b} cannot be a factorization of S by Proposition 3. Thus when |S|≥2, DisS is the unique trivial factorization of S. □

## 2.5. Finite Factored Sets

This sequence will primarily be about finite factored sets.

Definition 15.IfF=(S,B)is a factored set, the size ofF, writtensize(F), is the cardinality ofS. The dimension ofF, writtendim(F), is the cardinality ofB.Fis called finite if its size is finite, and finite-dimensional if its dimension is finite.We suspect that the theory of infinite factored sets is both interesting and important. However, it is outside of the scope of this sequence, which will require finiteness for many of its key results.

Some of the definitions and results in this sequence will be given for finite factored sets, in spite of the fact that they could easily be extended to finite-dimensional or arbitrary factored sets. This is because they can often be extended in more than one way, and determining which extension is most natural requires further developing the theory of arbitrary factored sets.

Proposition 6.Every finite factored set is also finite-dimensional.Proof.If F=(S,B) is a factored set, B is a set of sets of subsets of S. Thus, |B|≤22|S|. □This bound is horrible and will be improved in Proposition 9. First, however, we will take a look at the number of factorizations of a fixed finite set.

Proposition 7.LetF=(S,B)be a finite factored set. Then|S|=∏b∈B|b|.Proof.Trivial. □Proposition 8.If|S|is equal to0,1, or a prime, the trivial factorization ofSis the only factorization ofS.Proof. If |S|=0 or |S|=1, then |Part(S)|=1, so B⊆Part(S) can have cardinality at most 1.If |S|=p, a prime, then by Proposition 7, |b| must divide p for all b∈B. Since factorizations cannot contain trivial partitions, this means |b|=p for all b∈B. However, {{s}∣s∈S} is the only element of Part(S) of cardinality p, so |B|≤1. □

On the other hand, in the case where |S| is finite and composite, the number of factorizations of S grows very quickly, as seen in Table 1. Table 1 shows the number of factorizations of a set S with cardinality up to 25:

Given the naturalness of the notion of factorization, we were surprised to discover that this sequence did not exist on the On-Line Encyclopedia of Integer Sequences (OEIS). We added the sequence, A338681, on April 30, 2021.

To give one concrete example, the four factorizations of the set {0,1,2,3} are:

Proposition 9.LetFbe a finite factored set.Ifsize(F)=0, thendim(F)=1.Ifsize(F)=1, thendim(F)=0.Ifsize(F)=pis prime, thendim(F)=1.Ifsize(F)=p0…pk−1is a product ofk≥2primes, then1≤dim(F)≤k.Proof.The first three parts follow directly from Proposition 5 and Proposition 8. For the fourth part, let F=(S,B), and let |S|=p0…pk−1 be a product of k≥2 primes.By Proposition 7, |S|=∏b∈B|b|. Consider an arbitrary b∈B. Since b is a nontrivial partition of a finite set S, |b| is finite and |b|≠1. If |b| were 0, then |S| would be 0. Thus |b| is a natural number greater than or equal to 2. B cannot be empty, since |S|≠1. If |B| were greater than k, then we would be able to express |S| as a product of more than k natural numbers greater than or equal to 2, which is clearly not possible since |S| is a product of k primes. Thus 1≤dim(F)≤k. □

In the next post, we will introduce the notions of the history of a partition, orthogonality between partitions, and time.

Acknowledgments:My thanks to Alex Appel, Ramana Kumar, Xiaoyu He, Tsvi Benson-Tilsen, Andrew Critch, Sam Eisenstat, Rob Bensinger, and Claire Wang for discussion and feedback on this sequence.## Footnotes

[1] Although I say "factorize" here, note that this will not be the kind of factorization that shows up in finite factored sets, because (as we will see) disjoint factors must be independent in a finite factored set. I appeal to the same concept in both contexts because factorization is just a very general and useful concept, rather than to indicate a direct connection.

[2] At least, it lacks such causal inference tools unless we assume access to interventional data.

[3] Note that this definition and Definition 9 could have been made more general by taking S to be a multiset.

[4] P(S) denotes the power set of S.

[5] In our view, "Y≥X" is also a more natural way to visually represent a mapping between a three-part partition Y that is finer than a two-part partition X.