Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

A followup to Scott's "Finite Factored Sets", specifically the Applications part at the end where he talked about the infinite case.

As it turns out, there's a natural-seeming (at least to my mathematical tastes) way to generalize all the finite factored set stuff to the infinite case.

To be perfectly clear about what is being claimed:

1: It is possible to deal with arbitrary compact metric spaces, instead of just finite products of finite sets, with a minimum of fuss. In particular this means that we can have countably many axes/basis factors/coordinates now!

2: Orthogonality, time, history, and all that remain perfectly well-defined in the infinite-coordinate case, no meaningful differences crop up in comparision to how the finite case works.

4: Of the three ways to implement the extension to the infinite case which were suggested in Scott's post, it is closest in spirit to the second approach, with having the history of X be the intersection of all the sets C⊆B s.t. C⊢X. The proffered counterexample from Scott's post is forbidden by restricting the partitions to "sufficiently nice" ones, which makes everything Just Work.

5: Yes, the semigraphoid axioms, and everything not specifically mentioned, work out, with absolutely no special tricks needed besides the single starting restriction on only using "sufficiently nice" partitions and sets.

6: I was only able to get one direction of the fundamental theorem to work out (the conditional orthogonality to conditional probabilistic independence direction), since the nice attributes of ordinary vanilla polynomials don't last when you start mucking about with uncountable sums of countable products of variables. It's fairly plausible that the missing direction of the fundamental theorem does work out and I'm just not a good enough mathematician to show it, feel free to try it yourself. I suspect it'll require a different proof path than the original, though.

7: Daniel Filan has, completely independently of me, generalized all results in "Finite Factored Sets" to countable sets (note that in that case, there can still only be finitely many factors, while this version can have countably many factors. Daniel's version of things is unconstrained by the compactness restriction that I have, though.) Apparently everything works out perfectly, including the fundamental theorem, although the direction of it which I failed to show was also tricky for them, and TurnTrout contributed several essential pieces of the proof for that.

Let's get started.

Why Nice Partitions? What is "Nice" Anyways?

So, first off: where someone working in combinatorics sees a partition of a set, a topologist sees a quotient space.

If you've got a space S, then a partition X induces a function ∼X:S→X, which maps a point to the unique equivalence class it's in. Then, we just need to stick a topology on the set X, which ends up being "put as many open sets in X as you can while keeping the function ∼X continuous".

Basically, taking a quotient is the process of taking a space and using an equivalence relation to go "let's make a new space where all things in the same equivalence class is treated as the same point". Quotient spaces are generally written as something like S/∼.

So, if you're trying to generalize finite factored sets to the infinite case, and are working with various sorts of partitions, looking at the quotient spaces of the partitions is a very natural thing to do.

The teensy little problem is that taking quotients is just not a very well-behaved operation topologically. Operations like disjoint sum and products are very good at preserving toplogical properties. You take two "nice" spaces, for many definitions of "nice", take a product, and it'll probably be "nice" as well, for many definitions of "nice". Quotients... not so much. If you're dealing with arbitrary quotients, you can whip up some pretty hideous-looking spaces. The natural next question is whether there's some sort of topological property that is preserved by "sufficiently nice" quotients.

To cash out what "sufficiently nice" means, Hausdorffness is a very basic property on topological spaces, that's held by most spaces encountered in typical mathematical practice (type theory excepted), and topology gets a lot stranger if it isn't present. It's "given two distinct points, there are disjoint open neighborhoods around the two points". Basically, for any two distinct points x and y, it should be possible to come up with some notion of "close to x" and "close to y" that are mutually exclusive. So, we could demand that our quotient spaces at least be this nice.

Much nicer than that are compact metrizable spaces. Approximately, any space with a notion of distance, and for all ϵ, you can cover the space with finitely many patches of size ϵ. Well, actually, this isn't true, compactness is stronger than that, but I think it gets the spirit across. Examples of such spaces are finite batches of points, the space of all finite and infinite bitstrings, the space of probability distributions over a 256-dimensional hypersphere, the Mandelbrot set, and the space of all closed subsets of the interval [0,1]. They're extraordinarily nicely-behaved topologically while still managing to cover a healthy chunk of spaces that might be encountered in practice.

And, as it turns out, there is a lovely theorem that, for compact metrizable spaces S, if the quotient space S/∼ happens to be Hausdorff (ie, not terrible), the quotient space will also be compact metrizable (about as nice as possible). Or at least, that's what Math StackExchange says.

The rough idea behind the proof is that quotients of compact spaces are always compact, so that leaves metrizability. Using compactness of S and Hausdorffness of S/∼, it's possible to show the intermediate result that, for every set K⊆S that's a union of equivalence classes and closed, you can take an open neighborhood of K and it will contain a smaller open neighborhood which remains open when shoved through ∼. Using this ability to craft open neighborhoods that remain open even after applying ∼, you can push enough open sets forward into the quotient space S/∼ to show that it's second-countable and regular (the regularity argument was skipped in the linked post but it's not hard to fill in). And now, you can invoke the Urysohn Metrization Theorem to show the quotient space is metrizable.

A nifty related theorem that we won't use anywhere is that every compact metric space arises as a quotient of the space of infinite bitstrings. {0,1}ω is all you need.

So... what happens if we try to make all the stuff from "Finite Factored Sets" work with compact metrizable spaces, and we just restrict the sorts of partitions we're dealing with to the nice ones? (those where the resulting quotient space S/∼X is Hausdorff.) Well then, everything works out exactly as you'd expect and you have to change no important definitions, except one direction of the fundamental theorem gets really dang hard and I gave up on it. The nice behavior of the quotient spaces works miracles to tame the infinite case.

Let's start running through the Finite Factored Sets posts, flagging everywhere which requires special care. For all results not specifically mentioned, assume it works out perfectly with very little effort involved in cleaning things up for the infinite case. As usual, feel free to gloss over proofs if you want, but at least read the discussion and theorem statements.

Countably Factored Spaces: Introduction and Factorization

Not much really changes here, except now we aren't dealing with just a batch of sets, we've got some topological structure on them too. And the permissible quotients have to respect the topology appropriately.

The first differences start showing up around definition 8. Since we aren't permitting just any old partition anymore, we need to put in some work to check that the intersection of nice partitions is a nice partition, especially since there can be uncountably many partitions!

Proposition 1:Given a compact metrizable space S, and a set of partitions C s.t. for all c∈C, S/∼c is Hausdorff, the partition ⋁C, defined by s0∼⋁Cs1↔∀c∈C:s0∼cs1, has the property that S/∼⋁C is Hausdorff.

Proof: Fix two distinct points in S/∼⋁C. These correspond to equivalence classes in S, so we'll write them as [s0]⋁C and [s1]⋁C for some distinguished s0 and s1. From this point on, a superscript of p on an equivalence class like this means we'll be treating it as a point in a quotient space, a superscript of e means we'll be treating it as an subset of S.

Our task is to find open neighborhoods for those two points in S/∼⋁C which don't overlap. Since they're distinct points in the quotient, the corresponding equivalence classes in S are distinct, s0 and s1 are in different equivalence classes. There has to be somec∗∈C s.t. [s0]ec∗≠[s1]ec∗ (because otherwise, s0 and s1 would be in the same equivalence class according to ∼⋁C, which is impossible).

What now? Well, now that we've got a c∗ that thinks that s0 and s1 are noticeably different, make a function ϕ:S/∼⋁C→S/∼c∗ defined as ϕ([s]p⋁C):=[s]pc∗. Basically, you can think of a point in S/∼⋁C aka an equivalence class under ⋁C, as being associated with one equivalence class for each c∈C, by how the intersection equivalence class is defined. And that tells you what point to map it to in S/∼c∗.

Next up, we'll show that ϕ∘∼⋁C=∼c∗. This is easy, we just take some s∈S, and go:

ϕ(∼⋁C(s))=ϕ([s]p⋁C)=[s]pc∗=∼c∗(s)

Done. Alright, now that we've got enough setup out of the way, we can start building our disjoint open neighborhoods. The two points [s0]pc∗ and [s1]pc∗ are distinct in S/∼c∗, because they correspond to the equivalence classes [s0]ec∗ and [s1]ec∗ in S which are distinct. So, they've got some disjoint open neighborhoods O0 and O1, in S/∼c∗, since it's Hausdorff.

We will now let our disjoint open neighborhoods of [s0]p⋁C and [s1]p⋁C be ϕ−1(O0) and ϕ−1(O1). They're disjoint because the preimage of any two disjoint sets is disjoint. They contain the requisite points because ϕ([s0]p⋁C)=[s0]pc∗∈O0, and the same for the other one. And they're open because their preimage under the quotient function ∼⋁C is open. To demonstrate this, we have

∼−1⋁C(ϕ−1(O0))=(ϕ∘∼⋁C)−1(O0)=∼−1c∗(O0)

Because we proved that the functions compose like that, and also, since ∼c∗ is a continuous function S→S/∼c∗, the preimage of an open is an open. And we're done! We got disjoint open neighborhoods for any two distinct points. □

So, this is very nice. You can intersect nicely-behaved partitions however you wish, and it'll still stay as a nicely-behaved partition. There's very strong closure properties.

The next time we hit something nontrivial is around definition 10. A batch of partitions was defined to factorize a set iff the function mapping a point to the tuple of equivalence classes it's in is a bijection. But wait, we have topological structure now! As it turns out, amazingly enough, if that function π:S→∏b∈BS/∼b defined as π(s)=λb.∼b(s) is a bijection, it's actually going to be a homeomorphism! That's the topology version of an isomorphism, it's a continuous function with a continuous inverse, like the mapping back and forth between a donut and a coffee cup. The spaces have identical topological structure and we can consider them as basically the same.

This is quite nifty. Leaving S with whatever topological structure it had originally, and equipping it with the product topology you get from the product of the various quotient spaces you made, are the same thing. So, our restriction on partitions implies the only factorizations available are those of the form "look at your space S and realize it already had the topological structure of a cartesian product from the start, and the quotient functions are just projecting down on the various coordinates."

Since the definition of a finite factored set was essentially a pair of a set, and the batch of quotients/partitions which let you view the set as a product of other sets, this lets us analogously define a countably factored space as a pair of a nice-enough topological space, and the batch of quotients/partitions which let you view the space as a product of other topological spaces.

Definition 1: Countably Factored Space

A countably factored space is a (S,B) pair, where S is a compact metrizable space, and B is a set of partitions ∼b s.t. S/∼b is a compact metrizable space for all b, and the function S→∏b∈BS/∼b given by λs.(λb.∼b(s)) is a bijection (and actually, a homeomorphism, the topological structure is preserved, though this is nonobvious)

Proposition 2: If the function π:λs.(λb.∼b(s)) is a bijection between S (a compact metrizable space) and ∏b∈BS/∼b, with all the S/∼b being compact metrizable, then it's a homeomorphism.

We'll do this by showing that π is continuous, and then using the fact that π is a continuous bijection to show that π−1 is continuous. We'll use x for a point in the product space, and xb for the b'th coordinate.

First up, let's get a form for π−1. The form of the inverse is π−1(x)=⋂b∈B∼−1b(xb). Map a batch of equivalence classes to their intersection. If there were multiple points in that intersection, then applying π would map them to the same point (they're in the same list of equivalence classes), contradicting injectivity of π. And if there were no points in that intersection, then there'd be no point in S with that particular combination of equivalence classes, contradicting surjectivity of π.

To show that π is continuous, take some base open set O for ∏b∈BS/∼b. By how the product topology works, there are finitely many bn and open sets On⊆S/∼bn, s.t. O=O1×O2..×On×∏b∈B/{b1...bn}S/∼b.

By how π−1 is defined, and the fact that the set O factorizes into the product of a bunch of other sets Ob for each of the coordinates, we have

π−1(O)=⋂b∈B∼−1b(Ob)

And then, unpack what the form of the various Ob are.

And bam, we've written the preimage of our base open, through the function π, as a finite intersection of preimages of open sets through continuous functions ∼b. So it's a finite intersection of opens, and so is open. The preimage of all base opens is open.

This shows that π is continuous, because you can write any open in the product space as a union of base opens, and the preimage of a union is the union of the preimages, so it'd be the union of a bunch of open sets, ie, open.

Now that we know π is continuous, we'll show that π−1 is. To do this, we'll show that the preimage of any closed set through π−1 is closed (as that's equivalent to the preimage of any open set being open, ie, continuity.) The inverse of π−1 is π, so we need to show that for any closed set in S, applying π produces a closed set in the product space, and we'll have a continuous inverse.

This holds because, for any closed set C⊆S, since S is a compact metrizable space, C is compact. Also, π(C) is compact, because applying a continuous function to a compact set makes a compact set. And, arbitrary products of Hausdorff spaces (our quotient spaces) are Hausdorff, and in any Hausdorff space, compact sets are closed, so π(C) is closed.

Bam, π−1 is continuous, and since we showed that π is, it's a homeomorphism. □

There's another result that I should mention now, because it's implicitly used in a lot of upcoming arguments, and it justifies the use of the term "countably factored space".

Proposition 3: Any compact metrizable space S can only have countably many nontrivial factors.

Assume this is false, and there are uncountably many nontrivial factors. We can identify our compact metrizable space S with ∏b∈BS/∼b by Proposition 2, there's the same topology. All factors are nontrivial, so each space S/∼b has two distinct points in them. And all the S/∼b are Hausdorff. Since it's an uncountable product, we can call upon the power of MathOverflow to conclude that ∏b∈BS/∼b isn't a first-countable space. But all compact metrizable spaces are first-countable, contradiction. □

Past this point, everything works out precisely as it did in the original post.

We're going to skip proving that history exists the first time around, and all that ordinary orthogonality stuff, because Scott's posts had to run through the proofs a second time in greater generality for conditional everything, so we might as well just show that conditional history exists (and the other associated results), and go "conditioning on the entire space S recovers the ordinary case" so we only have to prove things once.

If a statement is not mentioned here or proved, assume it works out perfectly straightforwardly with no issues and requiring no special tricks. I'll only be focusing on the propositions which take a bit of work to generalize to the infinite case.

There is something important to note in this section. We can't condition on an arbitrary subset of S, it has to be closed! This is the same sort of topological restriction as the one we imposed on the quotients. Any closed subset of a compact metrizable space is compact metrizable too, so it's motivated by the same reasoning as what motivated the restriction on the quotients. It's far far easier to prove this fact though, since the argument for metrizability is just "if you've got a metric on the full space, the subspace inherits that metric". As usual, we'll restrict subpartitions to those which make the quotient space Hausdorff.

Our first order of business is going to be showing that if you've got a permissible partition X, then ∼X restricted to the closed set E is a permissible partition of E. We'll actually show something considerably stronger, that the spaces E/∼X|E and ∼X(E) (as a subspace of S/∼X) are homeomorphic. In other words, it doesn't matter whether you slice E out first and then take a quotient of it, or take the quotient first and slice the image of E out of the quotient space, you'll get the same space. So, in particular, you can go "S/∼X is compact-metrizable, and ∼X(E) happens to be closed if E is, so these two isomorphic spaces are also compact-metrizable". And this means that taking permissible quotients of a subspace works perfectly well and causes no issues.

Proposition 4:For a compact metrizable space S, closed subset E, and partition X s.t. S/∼X is Hausdorff, then E/∼X|E and ∼X(E) are homeomorphic.

Our attempted homeomorphism ϕ:E/∼X|E→∼X(E) will be mapping the point [s]pX|E (where s∈E) to [s]pX.

First off, we've gotta show injectivity. Let's say [s0]pX|E and [s1]pX|E are distinct in E/∼X|E. Then ϕ([s0]pX|E)=[s0]pX, and similar for s1. If these two points were identical, [s0]pX=[s1]pX, then the sets [s0]eX and [s1]eX would be identical in S, and so s0∼Xs1, but s0,s1∈E, so s0∼X|Es1, and so [s0]pX|E=[s1]pX|E, but they're distinct points, contradiction.

Now for surjectivity. Fix some [s]pX∈∼X(E). Then there's some s′∈E where ∼X(s′)=[s]pX. So, in particular, [s′]pX=[s]pX. This point s′, due to being in E, maps to the point [s′]pX|E∈E/∼X|E. Then, we have ϕ([s′]pX|E)=[s′]pX=[s]pX And bam, we found a point that maps onto it. Surjectivity is established. It's a bijection.

The inverse of the bijection maps [s]pX to [s′]pX|E, where s′ is an arbitrary element of E where s′∼Xs (which must exist because [s]pX is taken from the image of E).

Now, let's show continuity of ϕ. Fix an arbitrary open set O in ∼X(E). By how the subspace topology works, O=O′∩∼X(E), where O′ is some open in S/∼X. We will attempt to show that ∼−1X|E(ϕ−1(O))=E∩∼−1X(O′), establishing that the preimage of ϕ−1(O) is open in E equipped with the subspace topology (intersection of E and the preimage of an open, ie, an open), which would show that ϕ−1(O) is open in E/∼X|E by how the quotient topology works, establishing the continuity of ϕ.

So, our proof goal switches to proving ∼−1X|E(ϕ−1(O))=E∩∼−1X(O′)

Let a point s lie in the first set. Then it must be in E, and we must have that ∼X|E(s)∈ϕ−1(O). Rewrite this a little bit as [s]pX|E∈ϕ−1(O), and then, by applying ϕ to both sides, we have [s]pX∈O⊆O′, establishing that s∈∼−1X(O′). So, we have one subset inclusion direction, ∼−1X|E(ϕ−1(O))⊆E∩∼−1X(O′)

For the other subset inclusion direction, let a point s lie in E, and applying ∼X, we have [s]pX∈O′. In particular, [s]pX∈∼X(E), so we have [s]pX∈O.

Now, for this s, since it's in E, ϕ(∼X|E(s))=ϕ([s]pX|E)=[s]pX∈O. So, it lies in the composition of preimages on the left side, and we have equality, which, by previous arguments, establishes that ϕ is continuous.

Now, we'll show that E/∼X|E is Hausdorff. Take two distinct points in it, shove them through ϕ to get two distinct points in ∼X(E), use Hausdorffness of∼X(E) (which happens because E is closed and so compact, and ∼X is continuous, so ∼X(E) is compact, and S/∼X is compact metrizable, so ∼X(E) is a closed subset of a compact metrizable space and so is compact metrizable, and thus Hausdorff) to fit two disjoint open neighborhoods around your two points, then use continuity of ϕ to show that ϕ−1 pulls those disjoint open neighborhoods back to make disjoint open neighborhoods of your two original points in E/∼X|E.

Hm, E/∼X|E is a Hausdorff quotient of a compact metrizable space (E) and so is compact metrizable. So, in particular, any closed set C must be compact, and using continuity of ϕ, ϕ(C) is compact, and by Hausdorffness of ∼X(E), ϕ(C) is closed. So, ϕ maps closed sets to closed sets. This is equivalent to the preimage of a closed set being closed, according to the function ϕ−1. So, ϕ−1 is continuous too, and we have a homeomorphism. □

Alright, where to go from here? Well, things go perfectly fine up until you hit Proposition 21.5, the "conditional history exists" result. We'll need to strengthen it to arbitrary unions and intersections for things from here on out to work properly. In particular, it means that you can just intersect all the sets of coordinates C where C⊢X, to get a unique minimal set of coordinates that generates X, and bam, that's a history, but for the infinite case. Perfectly well-defined, no problems whatsoever.

Proposition 5 (Reproof of Proposition 21.5, Infinite History Remix:)If X is a permissible subpartition, and there's a bunch of sets Ci⊆B where ∀i:Ci⊢X, then ⋂iCi⊢X, and ⋃iCi⊢X.

The proofs of these two results are extremely similar, just flipped around, so we'll provide the general proof framework for both cases.

Step 1 is to show it for the intersection of two sets of coordinates, or the union of two sets of coordinates. Step 2 is to show it for the intersection of a descending sequence of sets of coordinates, or the union of an ascending sequence of sets of coordinates. Step 3 is to use steps 1 and 2 and "there's only countably many coordinates" to prove the whole thing.

The proof of Step 1 perfectly follows the way it works in Scott's post, there's no meaningful differences going on, the exact same argument works.

For the proof of step 2, we'll given the argument for intersection (and for union in parentheses). Fix a sequence of sets Cn s.t. C0⊇C1⊇C2... (or, for union, have ⊆ instead).

Now, for any particular coordinate b, if b∈⋂nCn (for union, b∉⋃nCn), then it'll be in all the Cn (none of the Cn),

so for intersections we have (χ⋂nCn(s,t))b=sb=limn→∞(χCn(s,t))b

and for unions we have (χ⋃nCn(s,t))b=tb=limn→∞(χCn(s,t))b

now, if b∉⋂nCn (for union, b∈⋃nCn), then there's some finite n where Cn excludes b (includes b), and then it never returns after that (always is present after that) because the Cn get smaller (larger) as n increases. So, again, for b like that,

for intersections, we have (χ⋂nCn(s,t))b=tb=limn→∞(χCn(s,t))b

and for unions we have (χ⋃nCn(s,t))b=sb=limn→∞(χCn(s,t))b

So, since we have convergence in each individual coordinate, and there's only countably many coordinates, this means that χ⋂nCn(s,t)=limn→∞χCn(s,t) (for union, just switch the intersection to a union).

Now, since all the Cn⊢X, this means that all the χCn(s,t) will land in [s]X, which is a closed set (single points are closed in Hausdorff spaces, the preimage of a closed set through a continuous function is closed, so [s]X is closed). Thus, the limit point will also land in [s]X, establishing that ⋂nCn⊢X (or ⋃nCn⊢X)

Now for step 3. Again, the union argument is in parentheses. Given your collection of sets Ci, index the coordinates by N, there's only countably many coordinates. For coordinate n, if n∈⋂iCi (or n∉⋃iCi), then let Cn be whatever set Ci you want. If n∉⋂iCi (or n∈⋃iCi), then let Cn be some Ci which excludes (includes) the coordinate n, which must exist. Now, we can go

⋂i∈ICi=⋂n∈NCn=⋂n∈N⋂m≤nCm

(or for union) ⋃i∈ICi=⋃n∈NCn=⋃n∈N⋃m≤nCm

Basically, since our Cn were picked to exclude (include) every coordinate it's possible to exclude (include), we can rewrite our big intersection (union) as a countable intersection (union). Then just rewrite a bit.

Now, for each n, we have ⋂m≤nCm⊢X (or ⋃m≤nCm⊢X) because we proved the finite intersection (union) case and can use induction. And the sequence C0,C0∩C1,C0∩C1∩C2... (same thing but with ∪) is a descending (ascending) sequence of sets, so we can apply our proof from that case to establish that ⋂i∈ICi=⋂n∈N⋂m≤nCm⊢X (or ⋃i∈ICi=⋃n∈N⋃m≤nCm⊢X) and we're done! □

Ok, now that that's taken care of... is there anything else on the list that may be particularly difficult? Well, our next spot of mild trouble is in Proposition 23, specifically, the part about showing that the history of a supremum of partitions is the union of the histories of the component parts.

Proposition 6 (Reproof of Proposition 23.2):Given Xi, a collection of subpartitions all with the same domain, we have h(⋁Xi)=⋃i∈Ih(Xi).

So, one direction of this, that ∀j:h(⋁Xi)⊢Xj to show that the history of the supremum is as larger or larger than the union of all the other histories, is pretty easy. We have that ⋁Xi≥EXj for all j, so the history of ⋁Xi manages to generate Xj, for all j, so all the h(Xi) are a subset of the history of ⋁Xi, establishing that h(⋁Xi)⊇⋃i∈Ih(Xi).

The other subset direction, which will be established by showing ⋃i∈Ih(Xi)⊢⋁Xi, requires taking a bit more care. Our first order of business is showing that ⋁Xi≤E⋃i∈Ih(Xi)|E. Assume that two points, s and t, both in E, fulfill s∼⋃i∈Ih(Xi)t. Then, we have ∀i∈I:s∼h(Xi)t. And since h(Xi)⊢Xi for all i, and s,t are both in E, this implies ∀i∈I:s∼Xit. Which is equivalent to s∼⋁Xit. And so, we have established that inequality.

But that isn't enough, the last piece we need to conclude the argument is that χ⋃i∈Ih(Xi)(E,E)=E. We'll do this by showing that for arbitrary s,t∈E, that χ⋃i∈Ih(Xi)(s,t)∈E. First off, we can go

The first inequality was the same sort of ascending/descending countable chain argument used to make a sequence of ever-larger sets in Proposition 6. Then, we just use the usual limit argument with that of "each coordinate individually converges, so we have overall convergence". Now, if we knew that each χ⋃m≤nh(Xm)(s,t) was in E, then by closure of E, we'd have that our desired point lands in E. So, we just have to show these finite stages land in E. We do this with an induction proof. Clearly, χh(X0)(s,t)∈E, because h(X0)⊢X0, so χh(X0)(E,E)=E. For the induction step, we go

χ⋃m≤n+1h(Xm)(s,t)=χh(Xn+1)(s,χ⋃m≤nh(Xm)(s,t))∈E

Where that last inclusion is because, by induction assumption, χ⋃m≤nh(Xm)(s,t)∈E, and also h(Xn+1)⊢Xn+1.

And so, we have that χ⋃i∈Ih(Xi)(E,E)=E, and so we've established that ⋃i∈Ih(Xi)⊢⋁Xi, establishing the other subset inclusion direction, and thus equality. □

Alright, where is this heading? Well, the semigraphoid axioms, of course. The only roadblock left is that we have to reprove Lemma 2 from scratch. The old proof no longer suffices due to using impermissible partitions, and the old proof path cannot be repaired.

Proposition 7 (reproof of Lemma 2):Let X,Y be subpartitions, and E be their domain. Then h(X∨Y)=h(X)∪⋃x∈Xh(Y|x).

Proof. Since X≤EX∨EY, we have h(X)⊆h(X∨Y). Symmetrically, for all x∈X, since Y|x⊆X∨Y, we have h(Y|x)⊆h(X∨Y) by Proposition 23 in Scott's paper. Thus, h(X∨Y)⊇h(X)∪⋃x∈Xh(Y|x). One direction down, one to go.

In order to begin attacking the reverse direction, our first order of business is taking a detour to establish that h(X)∪⋃x∈Xh(Y|x)⊢IndE, the indiscrete partition of E. In order to do this, we'll show that for any particular x∈X, h(X)∪h(Y|x)⊢IndE. Then, just yeet Proposition 5 (from this post) at that to show our desired result, that you can union everything together and it'll still generate IndE.

So, let x be arbitrary in X, and r be an arbitrary point in x, and s,t be arbitrary points in E.

And both χh(X)(r,s) and χh(X)(r,t) land in [r]X=x, because h(X)⊢X and r,s,t are all in E.

Accordingly, χh(Y|x)(χh(X)(r,s),χh(X)(r,t))∈[χh(X)(r,s)]Y⊆E, because h(Y|x)⊢Y|x, and both of the chi thingies land in x. And then, since both s and the long χh(Y|x) term land in E, χh(X)(s,stuff) lands in E. And so, since s and t were arbitrary in E, we have established that χh(X)∪h(Y|x)(E,E)=E, which is sufficient by itself to establish that h(X)∪h(Y|x)⊢IndE for our arbitrary x. And then, by Proposition 5, h(X)∪⋃x∈Xh(Y|x)⊢IndE.

At this point, we can back out of the lemma and resume our work of showing that this batch of histories is capable of generating X∨EY.

We'll be using the definition of generation where we try to show that if s and t lie in the appropriate set (E in this case), then chimera-ing them together lands in the same equivalence class as s. We have

Now, since we had previously derived that h(X)∪⋃x∈Xh(Y|x)⊢IndE, that means that χh(X)∪⋃x∈Xh(Y|x)(s,t)∈E. Moving out one layer, χh(X)(s,stuff) has both components landing in E, and h(X)⊢X, so that result will land in [s]X.

Moving out another layer from that, χh(Y,[s]X)(s,stuff) has both components landing in [s]X, and h(Y|[s]X)⊢Y|[s]X, so the result will land in [s]Y|[s]X, aka [s]Y∩[s]X, aka [s]X∨Y. And we're done! Our collection of histories generates X∨Y. □

But of course, all of this was just a warmup for showing the semigraphoid axioms, which... go suprisingly smoothly. Most of the aggrivation was concentrated in reproving Lemma 2.

Countably Factored Spaces: Polynomials and Probability

And then things go poorly right around here. Or at least, there's a big difficulty spike if you're trying to rescue all results. Apparently, for Daniel's "countable sets" case, since there can only be finitely many coordinates, you only have to deal with power series (generalization of polynomials) involving sums of countably many terms of finitely many variables each, which is a little tricky, but doable, and TurnTrout did it.

But here, when I tried, I wound up dealing with sums of uncountably many terms, consisting of countably many variables each. Which, apparently, don't work well at all. Now, an uncountable sum over infinitesimally small things is kinda like an integral, or taking the union of a bunch of points to make a set, and as it turns out, a whole bunch of the results in this section have analogues if you swap out the polynomial of a set for the set itself.

With this reframing, Proposition 26 becomes trivial, and Proposition 27 turns into the statement that chimera-ing sets together can be viewed as projecting them down to the appropriate coordinates, and then taking the product of those.

Let's use SC as an abbreviation for ∏b∈CS/∼b, basically, the space you get when you project down to the factors in C. Put another way, Proposition 27 is basically saying that, when C0 and C1 are disjoint sets of coordinates, then

prSC0∪C1(χC0(E0,E1))=prSC0(E0)×prSC1(E1)

This can be trivially verified from understanding what the chimera function does. prSC0(E0) are the possible coordinate values for the coordinates in C0 which are available for use by the chimera function, and prB/SC0(E1) are the possible coordinate values for the coordinates in B/SC0 which are available for use by the chimera function, and the chimera function can put these together however it wishes, making the set prSC0(E0)×prB/SC0(E1). Project out some extra coordinates, and you get the result.

Proposition 28 turns into the statement that, given a set E⊆S that factorizes as F×G, you can just let C be the set of relevant coordinates for the set F, and write F and G as prSC(E) and prSB/C(E).

Propositions 29 and 30 turn into the statement that given some set E, there's a way to split it into as many factors as possible, until eventually you hit a batch of "irreducible pieces", sets of coordinates C where prC(E) just can't be factorized any further, it's an odd shape, and that these sets of coordinates partition B, and multiplying these things together makes E again. The rough proof path for this is showing that given any way of factorizing E in two different ways using coordinates (C0,B/C0) in one way, and (C1,B/C1) in another way, then (C0∩C1,B/(C0∩C1)) is also a factorization of the set. Then we do our usual sort of argument with the countable descending chains to get that for any particular coordinate b, there's a minimal set of coordinates including b that don't let you factor the set any more.

It's slightly tricky to show, but Lemma 3 carries over and shows an equivalence between X⊥Y|Z, and the statement "for all x,y, and z, if you decompose x∩z and y∩z into irreducible pieces (be very sure to label all pieces with the coordinates they're pinning down, to make the pieces distinguishable), and throw them into a multiset, it'll be the same as decomposing z and x∩y∩z into irreducible pieces and throwing those pieces into a multiset"

But, sadly, when we go to the Fundamental Theorem of Finite Factored Sets, it's just too hard (for me personally, you may be different) to show that probabilistic independence implies two multisets of irreducible pieces (labeled with coordinates) are identical.

But, it is pretty easy to get the other direction, where conditional orthogonality implies conditional independence. Well, kinda. I'd ideally like to strengthen it to talk about conditional probabilities (if it's very improbable to select any particular event z), but I only proved the unconditional probabilities version, though a slightly stronger version than was originally stated.

Proposition 7 (reproof of one direction of the fundamental theorem): Given any factorized probability distribution μ, permissible partitions X,Y,Z s.t. X⊥Y|Z, subsets X′⊆X and Y′⊆Y, and z∈Z, then

Proof: Since X⊥Y|Z, this means that for all z∈Z, we have h(X|z)∩h(Y|z)=∅. Use C to abbreviate h(X|z). Now, clearly, by the definition of ⊢, we have χC(z,z)=z, so we also have χB/C(z,z)=z. Also, h(Y|z)⊆B/C, so Y|z≤zh(Y|z)|z≤z⋁(B/C)|z and so this means that B/C⊢Y|z.

Since C⊢X|z (remember, it's h(X|z)), we have that for all x∈X, χC(x∩z,z)=x∩z. Rephrasing this somewhat, it's saying prSC(x∩z)×prSB/C(z)=x∩z for all x. Accordingly, we have

But, hang on, those sets we're chimera-ing together are projections of sets as large or larger than the intersection of X′,Y′, and z, so we must have equality there.

Accordingly, we can use that factorization and get

⋃x∈X′x∩⋃y∈Y′y∩z=prSC(⋃x∈X′x∩z)×prSB/C(⋃y∈Y′y∩z)

and also, since we had χC(z,z)=z, this means that z=prSC(z)×prSB/C(z)

And so, throwing probabilities in the mix, we have

Note that the title is misleading. This is really countable dimension factored spaces, which is much better, since it allows for the possibility of something kind of like continuous time, where between any two points in time, you can specify a time strictly between them.

A followup to Scott's "Finite Factored Sets", specifically the Applications part at the end where he talked about the infinite case.

As it turns out, there's a natural-seeming (at least to my mathematical tastes) way to generalize all the finite factored set stuff to the infinite case.

To be perfectly clear about what is being claimed:

1: It is possible to deal with arbitrary compact metric spaces, instead of just finite products of finite sets, with a minimum of fuss. In particular this means that we can have countably many axes/basis factors/coordinates now!

2: Orthogonality, time, history, and all that remain perfectly well-defined in the infinite-coordinate case, no meaningful differences crop up in comparision to how the finite case works.

4: Of the three ways to implement the extension to the infinite case which were suggested in Scott's post, it is closest in spirit to the second approach, with having the history of X be the intersection of all the sets C⊆B s.t. C⊢X. The proffered counterexample from Scott's post is forbidden by restricting the partitions to "sufficiently nice" ones, which makes everything Just Work.

5: Yes, the semigraphoid axioms, and everything not specifically mentioned, work out, with absolutely no special tricks needed besides the single starting restriction on only using "sufficiently nice" partitions and sets.

6: I was only able to get one direction of the fundamental theorem to work out (the conditional orthogonality to conditional probabilistic independence direction), since the nice attributes of ordinary vanilla polynomials don't last when you start mucking about with uncountable sums of countable products of variables. It's fairly plausible that the missing direction of the fundamental theorem does work out and I'm just not a good enough mathematician to show it, feel free to try it yourself. I suspect it'll require a different proof path than the original, though.

7: Daniel Filan has, completely independently of me, generalized all results in "Finite Factored Sets" to countable sets (note that in that case, there can still only be finitely many factors, while this version can have countably many factors. Daniel's version of things is unconstrained by the compactness restriction that I have, though.) Apparently everything works out perfectly, including the fundamental theorem, although the direction of it which I failed to show was also tricky for them, and TurnTrout contributed several essential pieces of the proof for that.

Let's get started.

Why Nice Partitions? What is "Nice" Anyways?So, first off: where someone working in combinatorics sees a partition of a set, a topologist sees a quotient space.

If you've got a space S, then a partition X induces a function ∼X:S→X, which maps a point to the unique equivalence class it's in. Then, we just need to stick a topology on the set X, which ends up being "put as many open sets in X as you can while keeping the function ∼X continuous".

Basically, taking a quotient is the process of taking a space and using an equivalence relation to go "let's make a new space where all things in the same equivalence class is treated as the same point". Quotient spaces are generally written as something like S/∼.

So, if you're trying to generalize finite factored sets to the infinite case, and are working with various sorts of partitions, looking at the quotient spaces of the partitions is a very natural thing to do.

The teensy little problem is that taking quotients is just not a very well-behaved operation topologically. Operations like disjoint sum and products are very good at preserving toplogical properties. You take two "nice" spaces, for many definitions of "nice", take a product, and it'll probably be "nice" as well, for many definitions of "nice". Quotients... not so much. If you're dealing with arbitrary quotients, you can whip up some pretty hideous-looking spaces. The natural next question is whether there's some sort of topological property that is preserved by "sufficiently nice" quotients.

To cash out what "sufficiently nice" means, Hausdorffness is a very basic property on topological spaces, that's held by most spaces encountered in typical mathematical practice (type theory excepted), and topology gets a lot stranger if it isn't present. It's "given two distinct points, there are disjoint open neighborhoods around the two points". Basically, for any two distinct points x and y, it should be possible to come up with

somenotion of "close to x" and "close to y" that are mutually exclusive. So, we could demand that our quotient spaces at least be this nice.Much nicer than that are compact metrizable spaces. Approximately, any space with a notion of distance, and for all ϵ, you can cover the space with finitely many patches of size ϵ. Well, actually, this isn't true, compactness is stronger than that, but I think it gets the spirit across. Examples of such spaces are finite batches of points, the space of all finite and infinite bitstrings, the space of probability distributions over a 256-dimensional hypersphere, the Mandelbrot set, and the space of all closed subsets of the interval [0,1]. They're extraordinarily nicely-behaved topologically while still managing to cover a healthy chunk of spaces that might be encountered in practice.

And, as it turns out, there is a

lovelytheorem that, for compact metrizable spaces S, if the quotient space S/∼ happens to be Hausdorff (ie, not terrible), the quotient space will also be compact metrizable (about as nice as possible). Or at least, that's what Math StackExchange says.The rough idea behind the proof is that quotients of compact spaces are always compact, so that leaves metrizability. Using compactness of S and Hausdorffness of S/∼, it's possible to show the intermediate result that, for every set K⊆S that's a union of equivalence classes and closed, you can take an open neighborhood of K and it will contain a smaller open neighborhood which

remainsopen when shoved through ∼. Using this ability to craft open neighborhoods that remain open even after applying ∼, you can push enough open sets forward into the quotient space S/∼ to show that it's second-countable and regular (the regularity argument was skipped in the linked post but it's not hard to fill in). And now, you can invoke the Urysohn Metrization Theorem to show the quotient space is metrizable.A nifty related theorem that we won't use anywhere is that every compact metric space arises as a quotient of the space of infinite bitstrings. {0,1}ω is all you need.

So... what happens if we try to make all the stuff from "Finite Factored Sets" work with compact metrizable spaces, and we just restrict the sorts of partitions we're dealing with to the nice ones? (those where the resulting quotient space S/∼X is Hausdorff.) Well then, everything works out exactly as you'd expect and you have to change no important definitions, except one direction of the fundamental theorem gets really dang hard and I gave up on it. The nice behavior of the quotient spaces works miracles to tame the infinite case.

Let's start running through the Finite Factored Sets posts, flagging everywhere which requires special care. For all results not specifically mentioned, assume it works out perfectly with very little effort involved in cleaning things up for the infinite case. As usual, feel free to gloss over proofs if you want, but at least read the discussion and theorem statements.

Countably Factored Spaces: Introduction and FactorizationNot much really changes here, except now we aren't dealing with just a batch of sets, we've got some topological structure on them too. And the permissible quotients have to respect the topology appropriately.

The first differences start showing up around definition 8. Since we aren't permitting just any old partition anymore, we need to put in some work to check that the intersection of nice partitions is a nice partition, especially since there can be uncountably many partitions!

Proposition 1:Given a compact metrizable spaceS, and a set of partitionsCs.t. for allc∈C,S/∼cis Hausdorff, the partition⋁C, defined bys0∼⋁Cs1↔∀c∈C:s0∼cs1, has the property thatS/∼⋁Cis Hausdorff.Proof: Fix two distinct points in S/∼⋁C. These correspond to equivalence classes in S, so we'll write them as [s0]⋁C and [s1]⋁C for some distinguished s0 and s1. From this point on, a superscript of p on an equivalence class like this means we'll be treating it as a point in a quotient space, a superscript of e means we'll be treating it as an subset of S.

Our task is to find open neighborhoods for those two points in S/∼⋁C which don't overlap. Since they're distinct points in the quotient, the corresponding equivalence classes in S are distinct, s0 and s1 are in different equivalence classes. There has to be

somec∗∈C s.t. [s0]ec∗≠[s1]ec∗ (because otherwise, s0 and s1 would be in the same equivalence class according to ∼⋁C, which is impossible).What now? Well, now that we've got a c∗ that thinks that s0 and s1 are noticeably different, make a function ϕ:S/∼⋁C→S/∼c∗ defined as ϕ([s]p⋁C):=[s]pc∗. Basically, you can think of a point in S/∼⋁C aka an equivalence class under ⋁C, as being associated with one equivalence class for each c∈C, by how the intersection equivalence class is defined. And that tells you what point to map it to in S/∼c∗.

Next up, we'll show that ϕ∘∼⋁C=∼c∗. This is easy, we just take some s∈S, and go:

ϕ(∼⋁C(s))=ϕ([s]p⋁C)=[s]pc∗=∼c∗(s)

Done. Alright, now that we've got enough setup out of the way, we can start building our disjoint open neighborhoods. The two points [s0]pc∗ and [s1]pc∗ are distinct in S/∼c∗, because they correspond to the equivalence classes [s0]ec∗ and [s1]ec∗ in S which are distinct. So, they've got some disjoint open neighborhoods O0 and O1, in S/∼c∗, since it's Hausdorff.

We will now let our disjoint open neighborhoods of [s0]p⋁C and [s1]p⋁C be ϕ−1(O0) and ϕ−1(O1). They're disjoint because the preimage of any two disjoint sets is disjoint. They contain the requisite points because ϕ([s0]p⋁C)=[s0]pc∗∈O0, and the same for the other one. And they're open because their preimage under the quotient function ∼⋁C is open. To demonstrate this, we have

∼−1⋁C(ϕ−1(O0))=(ϕ∘∼⋁C)−1(O0)=∼−1c∗(O0)

Because we proved that the functions compose like that, and also, since ∼c∗ is a continuous function S→S/∼c∗, the preimage of an open is an open. And we're done! We got disjoint open neighborhoods for any two distinct points. □

So, this is very nice. You can intersect nicely-behaved partitions however you wish, and it'll still stay as a nicely-behaved partition. There's very strong closure properties.

The next time we hit something nontrivial is around definition 10. A batch of partitions was defined to factorize a set iff the function mapping a point to the tuple of equivalence classes it's in is a bijection. But wait, we have topological structure now! As it turns out, amazingly enough, if that function π:S→∏b∈BS/∼b defined as π(s)=λb.∼b(s) is a bijection, it's actually going to be a homeomorphism! That's the topology version of an isomorphism, it's a continuous function with a continuous inverse, like the mapping back and forth between a donut and a coffee cup. The spaces have identical topological structure and we can consider them as basically the same.

This is quite nifty. Leaving S with whatever topological structure it had originally, and equipping it with the product topology you get from the product of the various quotient spaces you made, are the

same thing. So, our restriction on partitions implies theonlyfactorizations available are those of the form "look at your space S and realize italreadyhad the topological structure of a cartesian product from the start, and the quotient functions are just projecting down on the various coordinates."Since the definition of a finite factored set was essentially a pair of a set, and the batch of quotients/partitions which let you view the set as a product of other sets, this lets us analogously define a countably factored space as a pair of a nice-enough topological space, and the batch of quotients/partitions which let you view the space as a product of other topological spaces.

Definition 1: Countably Factored SpaceA countably factored space is a(S,B)pair, whereSis a compact metrizable space, andBis a set of partitions∼bs.t.S/∼bis a compact metrizable space for allb, and the functionS→∏b∈BS/∼bgiven byλs.(λb.∼b(s))is a bijection (and actually, a homeomorphism, the topological structure is preserved, though this is nonobvious)Proposition 2:If the functionπ:λs.(λb.∼b(s))is a bijection betweenS(a compact metrizable space) and∏b∈BS/∼b, with all theS/∼bbeing compact metrizable, then it's a homeomorphism.We'll do this by showing that π is continuous, and then using the fact that π is a continuous bijection to show that π−1 is continuous. We'll use x for a point in the product space, and xb for the b'th coordinate.

First up, let's get a form for π−1. The form of the inverse is π−1(x)=⋂b∈B∼−1b(xb). Map a batch of equivalence classes to their intersection. If there were multiple points in that intersection, then applying π would map them to the same point (they're in the same list of equivalence classes), contradicting injectivity of π. And if there were no points in that intersection, then there'd be no point in S with that particular combination of equivalence classes, contradicting surjectivity of π.

To show that π is continuous, take some base open set O for ∏b∈BS/∼b. By how the product topology works, there are finitely many bn and open sets On⊆S/∼bn, s.t. O=O1×O2..×On×∏b∈B/{b1...bn}S/∼b.

By how π−1 is defined, and the fact that the set O factorizes into the product of a bunch of other sets Ob for each of the coordinates, we have

π−1(O)=⋂b∈B∼−1b(Ob)

And then, unpack what the form of the various Ob are.

=⋂b∈{b1...bn}∼−1b(On)∩⋂b∈B/{b1...bn}∼−1b(S/∼b)

=⋂b∈{b1...bn}∼−1b(On)∩⋂b∈B/{b1...bn}S=⋂b∈{b1...bn}∼−1b(On)

And bam, we've written the preimage of our base open, through the function π, as a finite intersection of preimages of open sets through continuous functions ∼b. So it's a finite intersection of opens, and so is open. The preimage of all base opens is open.

This shows that π is continuous, because you can write any open in the product space as a union of base opens, and the preimage of a union is the union of the preimages, so it'd be the union of a bunch of open sets, ie, open.

Now that we know π is continuous, we'll show that π−1 is. To do this, we'll show that the preimage of any closed set through π−1 is closed (as that's equivalent to the preimage of any open set being open, ie, continuity.) The inverse of π−1 is π, so we need to show that for any closed set in S, applying π produces a closed set in the product space, and we'll have a continuous inverse.

This holds because, for any closed set C⊆S, since S is a compact metrizable space, C is compact. Also, π(C) is compact, because applying a continuous function to a compact set makes a compact set. And, arbitrary products of Hausdorff spaces (our quotient spaces) are Hausdorff, and in any Hausdorff space, compact sets are closed, so π(C) is closed.

Bam, π−1 is continuous, and since we showed that π is, it's a homeomorphism. □

There's another result that I should mention now, because it's implicitly used in a lot of upcoming arguments, and it justifies the use of the term "countably factored space".

Proposition 3:Any compact metrizable spaceScan only have countably many nontrivial factors.Assume this is false, and there are uncountably many nontrivial factors. We can identify our compact metrizable space S with ∏b∈BS/∼b by Proposition 2, there's the same topology. All factors are nontrivial, so each space S/∼b has two distinct points in them. And all the S/∼b are Hausdorff. Since it's an uncountable product, we can call upon the power of MathOverflow to conclude that ∏b∈BS/∼b isn't a first-countable space. But all compact metrizable spaces are first-countable, contradiction. □

Past this point, everything works out precisely as it did in the original post.

Countably Factored Spaces: Conditional OrthogonalityWe're going to skip proving that history exists the first time around, and all that ordinary orthogonality stuff, because Scott's posts had to run through the proofs a second time in greater generality for conditional everything, so we might as well just show that conditional history exists (and the other associated results), and go "conditioning on the entire space S recovers the ordinary case" so we only have to prove things once.

If a statement is not mentioned here or proved, assume it works out perfectly straightforwardly with no issues and requiring no special tricks. I'll only be focusing on the propositions which take a bit of work to generalize to the infinite case.

There is something important to note in this section. We can't condition on an arbitrary subset of S, it has to be closed! This is the same sort of topological restriction as the one we imposed on the quotients. Any closed subset of a compact metrizable space is compact metrizable too, so it's motivated by the same reasoning as what motivated the restriction on the quotients. It's far far easier to prove this fact though, since the argument for metrizability is just "if you've got a metric on the full space, the subspace inherits that metric". As usual, we'll restrict subpartitions to those which make the quotient space Hausdorff.

Our first order of business is going to be showing that if you've got a permissible partition X, then ∼X restricted to the closed set E is a permissible partition of E. We'll actually show something considerably stronger, that the spaces E/∼X|E and ∼X(E) (as a subspace of S/∼X) are homeomorphic. In other words, it doesn't matter whether you slice E out first and then take a quotient of it, or take the quotient first and slice the image of E out of the quotient space, you'll get the same space. So, in particular, you can go "S/∼X is compact-metrizable, and ∼X(E) happens to be closed if E is, so these two isomorphic spaces are also compact-metrizable". And this means that taking permissible quotients of a subspace works perfectly well and causes no issues.

Proposition 4:For a compact metrizable spaceS, closed subsetE, and partitionXs.t.S/∼Xis Hausdorff, thenE/∼X|Eand∼X(E)are homeomorphic.Our attempted homeomorphism ϕ:E/∼X|E→∼X(E) will be mapping the point [s]pX|E (where s∈E) to [s]pX.

First off, we've gotta show injectivity. Let's say [s0]pX|E and [s1]pX|E are distinct in E/∼X|E. Then ϕ([s0]pX|E)=[s0]pX, and similar for s1. If these two points were identical, [s0]pX=[s1]pX, then the sets [s0]eX and [s1]eX would be identical in S, and so s0∼Xs1, but s0,s1∈E, so s0∼X|Es1, and so [s0]pX|E=[s1]pX|E, but they're distinct points, contradiction.

Now for surjectivity. Fix some [s]pX∈∼X(E). Then there's some s′∈E where ∼X(s′)=[s]pX. So, in particular, [s′]pX=[s]pX. This point s′, due to being in E, maps to the point [s′]pX|E∈E/∼X|E. Then, we have ϕ([s′]pX|E)=[s′]pX=[s]pX And bam, we found a point that maps onto it. Surjectivity is established. It's a bijection.

The inverse of the bijection maps [s]pX to [s′]pX|E, where s′ is an arbitrary element of E where s′∼Xs (which must exist because [s]pX is taken from the image of E).

Now, let's show continuity of ϕ. Fix an arbitrary open set O in ∼X(E). By how the subspace topology works, O=O′∩∼X(E), where O′ is some open in S/∼X. We will attempt to show that ∼−1X|E(ϕ−1(O))=E∩∼−1X(O′), establishing that the preimage of ϕ−1(O) is open in E equipped with the subspace topology (intersection of E and the preimage of an open, ie, an open), which would show that ϕ−1(O) is open in E/∼X|E by how the quotient topology works, establishing the continuity of ϕ.

So, our proof goal switches to proving ∼−1X|E(ϕ−1(O))=E∩∼−1X(O′)

Let a point s lie in the first set. Then it must be in E, and we must have that ∼X|E(s)∈ϕ−1(O). Rewrite this a little bit as [s]pX|E∈ϕ−1(O), and then, by applying ϕ to both sides, we have [s]pX∈O⊆O′, establishing that s∈∼−1X(O′). So, we have one subset inclusion direction, ∼−1X|E(ϕ−1(O))⊆E∩∼−1X(O′)

For the other subset inclusion direction, let a point s lie in E, and applying ∼X, we have [s]pX∈O′. In particular, [s]pX∈∼X(E), so we have [s]pX∈O.

Now, for this s, since it's in E, ϕ(∼X|E(s))=ϕ([s]pX|E)=[s]pX∈O. So, it lies in the composition of preimages on the left side, and we have equality, which, by previous arguments, establishes that ϕ is continuous.

Now, we'll show that E/∼X|E is Hausdorff. Take two distinct points in it, shove them through ϕ to get two distinct points in ∼X(E), use Hausdorffness of∼X(E) (which happens because E is closed and so compact, and ∼X is continuous, so ∼X(E) is compact, and S/∼X is compact metrizable, so ∼X(E) is a closed subset of a compact metrizable space and so is compact metrizable, and thus Hausdorff) to fit two disjoint open neighborhoods around your two points, then use continuity of ϕ to show that ϕ−1 pulls those disjoint open neighborhoods back to make disjoint open neighborhoods of your two original points in E/∼X|E.

Hm, E/∼X|E is a Hausdorff quotient of a compact metrizable space (E) and so is compact metrizable. So, in particular, any closed set C must be compact, and using continuity of ϕ, ϕ(C) is compact, and by Hausdorffness of ∼X(E), ϕ(C) is closed. So, ϕ maps closed sets to closed sets. This is equivalent to the preimage of a closed set being closed, according to the function ϕ−1. So, ϕ−1 is continuous too, and we have a homeomorphism. □

Alright, where to go from here? Well, things go perfectly fine up until you hit Proposition 21.5, the "conditional history exists" result. We'll need to strengthen it to arbitrary unions and intersections for things from here on out to work properly. In particular, it means that you can just intersect all the sets of coordinates C where C⊢X, to get a unique minimal set of coordinates that generates X, and bam, that's a history, but for the infinite case. Perfectly well-defined, no problems whatsoever.

Proposition 5 (Reproof of Proposition 21.5, Infinite History Remix:)IfXis a permissible subpartition, and there's a bunch of setsCi⊆Bwhere∀i:Ci⊢X, then⋂iCi⊢X, and⋃iCi⊢X.The proofs of these two results are extremely similar, just flipped around, so we'll provide the general proof framework for both cases.

Step 1 is to show it for the intersection of two sets of coordinates, or the union of two sets of coordinates. Step 2 is to show it for the intersection of a descending sequence of sets of coordinates, or the union of an ascending sequence of sets of coordinates. Step 3 is to use steps 1 and 2 and "there's only countably many coordinates" to prove the whole thing.

The proof of Step 1 perfectly follows the way it works in Scott's post, there's no meaningful differences going on, the exact same argument works.

For the proof of step 2, we'll given the argument for intersection (and for union in parentheses). Fix a sequence of sets Cn s.t. C0⊇C1⊇C2... (or, for union, have ⊆ instead).

Now, for any particular coordinate b, if b∈⋂nCn (for union, b∉⋃nCn), then it'll be in all the Cn (none of the Cn),

so for intersections we have (χ⋂nCn(s,t))b=sb=limn→∞(χCn(s,t))b

and for unions we have (χ⋃nCn(s,t))b=tb=limn→∞(χCn(s,t))b

now, if b∉⋂nCn (for union, b∈⋃nCn), then there's some finite n where Cn excludes b (includes b), and then it never returns after that (always is present after that) because the Cn get smaller (larger) as n increases. So, again, for b like that,

for intersections, we have (χ⋂nCn(s,t))b=tb=limn→∞(χCn(s,t))b

and for unions we have (χ⋃nCn(s,t))b=sb=limn→∞(χCn(s,t))b

So, since we have convergence in each individual coordinate, and there's only countably many coordinates, this means that χ⋂nCn(s,t)=limn→∞χCn(s,t) (for union, just switch the intersection to a union).

Now, since all the Cn⊢X, this means that all the χCn(s,t) will land in [s]X, which is a closed set (single points are closed in Hausdorff spaces, the preimage of a closed set through a continuous function is closed, so [s]X is closed). Thus, the limit point will also land in [s]X, establishing that ⋂nCn⊢X (or ⋃nCn⊢X)

Now for step 3. Again, the union argument is in parentheses. Given your collection of sets Ci, index the coordinates by N, there's only countably many coordinates. For coordinate n, if n∈⋂iCi (or n∉⋃iCi), then let Cn be whatever set Ci you want. If n∉⋂iCi (or n∈⋃iCi), then let Cn be some Ci which excludes (includes) the coordinate n, which must exist. Now, we can go

⋂i∈ICi=⋂n∈NCn=⋂n∈N⋂m≤nCm

(or for union) ⋃i∈ICi=⋃n∈NCn=⋃n∈N⋃m≤nCm

Basically, since our Cn were picked to exclude (include) every coordinate it's possible to exclude (include), we can rewrite our big intersection (union) as a countable intersection (union). Then just rewrite a bit.

Now, for each n, we have ⋂m≤nCm⊢X (or ⋃m≤nCm⊢X) because we proved the finite intersection (union) case and can use induction. And the sequence C0,C0∩C1,C0∩C1∩C2... (same thing but with ∪) is a descending (ascending) sequence of sets, so we can apply our proof from that case to establish that ⋂i∈ICi=⋂n∈N⋂m≤nCm⊢X (or ⋃i∈ICi=⋃n∈N⋃m≤nCm⊢X) and we're done! □

Ok, now that that's taken care of... is there anything else on the list that may be particularly difficult? Well, our next spot of mild trouble is in Proposition 23, specifically, the part about showing that the history of a supremum of partitions is the union of the histories of the component parts.

Proposition 6 (Reproof of Proposition 23.2):GivenXi, a collection of subpartitions all with the same domain, we haveh(⋁Xi)=⋃i∈Ih(Xi).So, one direction of this, that ∀j:h(⋁Xi)⊢Xj to show that the history of the supremum is as larger or larger than the union of all the other histories, is pretty easy. We have that ⋁Xi≥EXj for all j, so the history of ⋁Xi manages to generate Xj, for all j, so all the h(Xi) are a subset of the history of ⋁Xi, establishing that h(⋁Xi)⊇⋃i∈Ih(Xi).

The other subset direction, which will be established by showing ⋃i∈Ih(Xi)⊢⋁Xi, requires taking a bit more care. Our first order of business is showing that ⋁Xi≤E⋃i∈Ih(Xi)|E. Assume that two points, s and t, both in E, fulfill s∼⋃i∈Ih(Xi)t. Then, we have ∀i∈I:s∼h(Xi)t. And since h(Xi)⊢Xi for all i, and s,t are both in E, this implies ∀i∈I:s∼Xit. Which is equivalent to s∼⋁Xit. And so, we have established that inequality.

But that isn't enough, the last piece we need to conclude the argument is that χ⋃i∈Ih(Xi)(E,E)=E. We'll do this by showing that for arbitrary s,t∈E, that χ⋃i∈Ih(Xi)(s,t)∈E. First off, we can go

χ⋃i∈Ih(Xi)(s,t)=χ⋃n⋃m≤nh(Xm)(s,t)=limn→∞χ⋃m≤nh(Xm)(s,t)

The first inequality was the same sort of ascending/descending countable chain argument used to make a sequence of ever-larger sets in Proposition 6. Then, we just use the usual limit argument with that of "each coordinate individually converges, so we have overall convergence". Now, if we knew that each χ⋃m≤nh(Xm)(s,t) was in E, then by closure of E, we'd have that our desired point lands in E. So, we just have to show these finite stages land in E. We do this with an induction proof. Clearly, χh(X0)(s,t)∈E, because h(X0)⊢X0, so χh(X0)(E,E)=E. For the induction step, we go

χ⋃m≤n+1h(Xm)(s,t)=χh(Xn+1)(s,χ⋃m≤nh(Xm)(s,t))∈E

Where that last inclusion is because, by induction assumption, χ⋃m≤nh(Xm)(s,t)∈E, and also h(Xn+1)⊢Xn+1.

And so, we have that χ⋃i∈Ih(Xi)(E,E)=E, and so we've established that ⋃i∈Ih(Xi)⊢⋁Xi, establishing the other subset inclusion direction, and thus equality. □

Alright, where is this heading? Well, the semigraphoid axioms, of course. The only roadblock left is that we have to reprove Lemma 2 from scratch. The old proof no longer suffices due to using impermissible partitions, and the old proof path cannot be repaired.

Proposition 7 (reproof of Lemma 2):LetX,Ybe subpartitions, andEbe their domain. Thenh(X∨Y)=h(X)∪⋃x∈Xh(Y|x).Proof. Since X≤EX∨EY, we have h(X)⊆h(X∨Y). Symmetrically, for all x∈X, since Y|x⊆X∨Y, we have h(Y|x)⊆h(X∨Y) by Proposition 23 in Scott's paper. Thus, h(X∨Y)⊇h(X)∪⋃x∈Xh(Y|x). One direction down, one to go.

In order to begin attacking the reverse direction, our first order of business is taking a detour to establish that h(X)∪⋃x∈Xh(Y|x)⊢IndE, the indiscrete partition of E. In order to do this, we'll show that for any particular x∈X, h(X)∪h(Y|x)⊢IndE. Then, just yeet Proposition 5 (from this post) at that to show our desired result, that you can union everything together and it'll still generate IndE.

So, let x be arbitrary in X, and r be an arbitrary point in x, and s,t be arbitrary points in E.

χh(X)∪h(Y|x)(s,t)=χh(X)(s,χh(Y|x)(s,t))=χh(X)(s,χh(X)(r,χh(Y|x)(s,t)))

=χh(X)(s,χh(Y|x)(χh(X)(r,s),χh(X)(r,t)))

And both χh(X)(r,s) and χh(X)(r,t) land in [r]X=x, because h(X)⊢X and r,s,t are all in E.

Accordingly, χh(Y|x)(χh(X)(r,s),χh(X)(r,t))∈[χh(X)(r,s)]Y⊆E, because h(Y|x)⊢Y|x, and both of the chi thingies land in x. And then, since both s and the long χh(Y|x) term land in E, χh(X)(s,stuff) lands in E. And so, since s and t were arbitrary in E, we have established that χh(X)∪h(Y|x)(E,E)=E, which is sufficient by itself to establish that h(X)∪h(Y|x)⊢IndE for our arbitrary x. And then, by Proposition 5, h(X)∪⋃x∈Xh(Y|x)⊢IndE.

At this point, we can back out of the lemma and resume our work of showing that this batch of histories is capable of generating X∨EY.

We'll be using the definition of generation where we try to show that if s and t lie in the appropriate set (E in this case), then chimera-ing them together lands in the same equivalence class as s. We have

χh(X)∪⋃x∈Xh(Y|x)(s,t)=χh(Y|[s]X)(s,χh(X)(s,χh(X)∪⋃x∈Xh(Y|x)(s,t)))

Now, since we had previously derived that h(X)∪⋃x∈Xh(Y|x)⊢IndE, that means that χh(X)∪⋃x∈Xh(Y|x)(s,t)∈E. Moving out one layer, χh(X)(s,stuff) has both components landing in E, and h(X)⊢X, so that result will land in [s]X.

Moving out another layer from that, χh(Y,[s]X)(s,stuff) has both components landing in [s]X, and h(Y|[s]X)⊢Y|[s]X, so the result will land in [s]Y|[s]X, aka [s]Y∩[s]X, aka [s]X∨Y. And we're done! Our collection of histories generates X∨Y. □

But of course, all of this was just a warmup for showing the semigraphoid axioms, which... go suprisingly smoothly. Most of the aggrivation was concentrated in reproving Lemma 2.

Countably Factored Spaces: Polynomials and ProbabilityAnd then things go poorly right around here. Or at least, there's a big difficulty spike if you're trying to rescue all results. Apparently, for Daniel's "countable sets" case, since there can only be finitely many coordinates, you only have to deal with power series (generalization of polynomials) involving sums of countably many terms of finitely many variables each, which is a little tricky, but doable, and TurnTrout did it.

But here, when I tried, I wound up dealing with sums of uncountably many terms, consisting of countably many variables each. Which, apparently, don't work well at all. Now, an uncountable sum over infinitesimally small things is kinda like an integral, or taking the union of a bunch of points to make a set, and as it turns out, a whole bunch of the results in this section have analogues if you swap out the polynomial of a set for the set itself.

With this reframing, Proposition 26 becomes trivial, and Proposition 27 turns into the statement that chimera-ing sets together can be viewed as projecting them down to the appropriate coordinates, and then taking the product of those.

Let's use SC as an abbreviation for ∏b∈CS/∼b, basically, the space you get when you project down to the factors in C. Put another way, Proposition 27 is basically saying that, when C0 and C1 are disjoint sets of coordinates, then

prSC0∪C1(χC0(E0,E1))=prSC0(E0)×prSC1(E1)

This can be trivially verified from understanding what the chimera function does. prSC0(E0) are the possible coordinate values for the coordinates in C0 which are available for use by the chimera function, and prB/SC0(E1) are the possible coordinate values for the coordinates in B/SC0 which are available for use by the chimera function, and the chimera function can put these together however it wishes, making the set prSC0(E0)×prB/SC0(E1). Project out some extra coordinates, and you get the result.

Proposition 28 turns into the statement that, given a set E⊆S that factorizes as F×G, you can just let C be the set of relevant coordinates for the set F, and write F and G as prSC(E) and prSB/C(E).

Propositions 29 and 30 turn into the statement that given some set E, there's a way to split it into as many factors as possible, until eventually you hit a batch of "irreducible pieces", sets of coordinates C where prC(E) just can't be factorized any further, it's an odd shape, and that these sets of coordinates partition B, and multiplying these things together makes E again. The rough proof path for this is showing that given any way of factorizing E in two different ways using coordinates (C0,B/C0) in one way, and (C1,B/C1) in another way, then (C0∩C1,B/(C0∩C1)) is also a factorization of the set. Then we do our usual sort of argument with the countable descending chains to get that for any particular coordinate b, there's a minimal set of coordinates including b that don't let you factor the set any more.

It's slightly tricky to show, but Lemma 3 carries over and shows an equivalence between X⊥Y|Z, and the statement "for all x,y, and z, if you decompose x∩z and y∩z into irreducible pieces (be very sure to label all pieces with the coordinates they're pinning down, to make the pieces distinguishable), and throw them into a multiset, it'll be the same as decomposing z and x∩y∩z into irreducible pieces and throwing those pieces into a multiset"

But, sadly, when we go to the Fundamental Theorem of Finite Factored Sets, it's just too hard (for me personally, you may be different) to show that probabilistic independence implies two multisets of irreducible pieces (labeled with coordinates) are identical.

But, it

ispretty easy to get the other direction, where conditional orthogonality implies conditional independence. Well, kinda. I'd ideally like to strengthen it to talk about conditional probabilities (if it's very improbable to select any particular event z), but I only proved the unconditional probabilities version, though a slightly stronger version than was originally stated.Proposition 7 (reproof of one direction of the fundamental theorem):Given any factorized probability distributionμ, permissible partitionsX,Y,Zs.t.X⊥Y|Z, subsetsX′⊆XandY′⊆Y, andz∈Z, thenPμ(⋃x∈X′x∩z)⋅Pμ(⋃y∈Y′y∩z)=Pμ(⋃x∈X′x∩⋃y∈Y′y∩z)⋅Pμ(z)

Proof: Since X⊥Y|Z, this means that for all z∈Z, we have h(X|z)∩h(Y|z)=∅. Use C to abbreviate h(X|z). Now, clearly, by the definition of ⊢, we have χC(z,z)=z, so we also have χB/C(z,z)=z. Also, h(Y|z)⊆B/C, so Y|z≤zh(Y|z)|z≤z⋁(B/C)|z and so this means that B/C⊢Y|z.

Since C⊢X|z (remember, it's h(X|z)), we have that for all x∈X, χC(x∩z,z)=x∩z. Rephrasing this somewhat, it's saying prSC(x∩z)×prSB/C(z)=x∩z for all x. Accordingly, we have

⋃x∈X′x∩z=⋃x∈X′(prSC(x∩z)×prSB/C(z))

=(⋃x∈X′(prSC(x∩z))×prSB/C(z)=prSC(⋃x∈X′x∩z)×prSB/C(z)

Also, since B/C⊢Y|z, we can run through similar arguments to get that

⋃y∈Y′y∩z=prSB/C(⋃y∈Y′y∩z)×prSC(z)

And, since we have

χC(⋃x∈X′x∩z,⋃y∈X′y∩z)⊆χC(⋃x∈X′x∩z,z)=⋃x∈X′x∩z

and

χC(⋃x∈X′x∩z,⋃y∈X′y∩z)⊆χC(z,⋃y∈X′y∩z)=χB/C(⋃y∈X′y∩z,z)=⋃y∈Y′y∩z

This means that we have

χC(⋃x∈X′x∩z,⋃y∈Y′y∩z)⊆⋃x∈X′x∩⋃y∈Y′y∩z

But, hang on, those sets we're chimera-ing together are projections of sets as large or larger than the intersection of X′,Y′, and z, so we must have equality there.

Accordingly, we can use that factorization and get

⋃x∈X′x∩⋃y∈Y′y∩z=prSC(⋃x∈X′x∩z)×prSB/C(⋃y∈Y′y∩z)

and also, since we had χC(z,z)=z, this means that z=prSC(z)×prSB/C(z)

And so, throwing probabilities in the mix, we have

Pμ(⋃x∈X′x∩z)⋅Pμ(⋃y∈Y′y∩z)

=PμC(prSC(⋃x∈X′x∩z))⋅PμB/C(prSB/C(z))+PμB/C(prSB/C(⋃y∈Y′y∩z))⋅PμC(prSC(z))

=Pμ(⋃x∈X′x∩⋃y∈Y′y∩z)⋅Pμ(z)

And we're done! □

Best of luck on trying to show the reverse direction!