Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Lemma 3: If F is a continuous function of type X→K(Y), where K(Y) is the space of nonempty compact subsets of the space Y, then given any compact set CX⊆X, ⋃x∈CXF(x) will be compact in Y.

Fix some compact set CX⊆X, and continuous function F:X→K(Y). We will operate by taking an arbitrary open cover of ⋃x∈CXF(x) and finding a finite subcover.

Let {Oi}i∈I be an open cover of ⋃x∈CXF(x). The Oi are subsets of Y. The topology compatible with Hausdorff distance on K(Y) (space of compact subsets of Y) is the Vietoris topology, where the basis opens are given by finite collections of open sets in Y. Given such a finite collection, the corresponding basis open is the set of all compact subsets of Y which are subsets of the union of the collection, and which intersect every open set in the collection.

Accordingly, let J=Pfin(I) (the set of all finite subsets of I, the index set for our open cover), and fix a collection of open sets in K(Y), {Oj}j∈J. The sets Oj are defined as:

Oj:={CY∈K(Y)|CY⊆⋃i∈jOi∧∀i∈j:CY∩Oi≠∅}

Now, all the F(x) with x∈CX are compact (F produces compact sets as output), and they are all subsets of ⋃x∈CXF(x), so {Oi}i∈I is a cover of F(x), and due to its compactness we can identify a finite subcover, and prune away every open set which doesn't intersect F(x). F(x) is a subset of the union of those finitely many open sets, and intersects all of them, so the point F(x)∈K(Y) lies in the open set Oj induced by that finite cover of open sets.

This argument works for arbitrary F(x) with x∈CX, so the collection {Oj}j∈J is an open cover of F(CX). Also, because F is continuous and CX is compact, F(CX) is compact, so we can identify a finite subcover from {Oj}j∈J.

Then, consider the collection of open sets Oi where i∈j for some Oj which is part of the finite cover of F(CX). This is finitely many opens, since we're unioning together finitely many (finitely many Oj selected) finite sets of open sets (each Oj is associated with finitely many Oi that it was built from).

Now we just have to show that this collection covers ⋃x∈CXF(x), and we'll have made our finite subcover and shown that said set is compact. Assume our finite collection of opens doesn't cover the set. Then there's some F(x) which wasn't covered completely. However, the point corresponding to F(x) in K(Y) lies in some Oj, and from its definition, the corresponding Oi manage to cover F(x), and we have a contradiction. We're done.

Proposition 19: h⋉K is an infradistribution, and preserves all properties indicated in the diagram at the start of this section if h and all the K(x) have said property.

To show this, we'll verify that it's well-defined at all, normalization, monotonicity, concavity, Lipschitzness, compact almost-support, and preservation of the properties.

(h⋉K)(f):=h(λx.K(x)(λy.f(x,y)))

Our first order of business is verifying that

λx.K(x)(λy.f(x,y))

is even a continuous function to be able to show that h can accept it as input.

For continuity, let xn limit to x, and we'll try to show that K(xn)(λy.f(xn,y)) limits to K(x)(λy.f(x,y)). Let λ⊙K be the Lipschitz constant upper bound of K.

Pick an ϵ, we'll show that there's some m0 where

∀n∀m≥m0:|K(xn)(λy.f(xm,y))−K(xn)(λy.f(x,y))|≤ϵ(2||f||+λ⊙K)

First, note that {xn}n∈N∪{x} is a compact set because xn limits to x. Thus, by the compact-shared compact almost-support condition on an infrakernel, there must be some compact set Cϵ⊆Y where all the K(xn) agree that functions f,f′ agreeing on Cϵ have values only ϵd(f,f′) apart from each other.

Now, because f is a continuous bounded function X×Y→R, it's uniformly continuous when restricted to

({xn}n∈N∪{x})×Cϵ

as this is the product of two compact sets and is compact. Due to the uniform continuity of f restricted to that set, there is some number δ where points only δ apart in that set have their values only differing by ϵ. Further, there is some number m0 where, for all m≥m0, d(xm,x)<δ.

Additionally, the difference between λy.f(x,y) and λy.f(x′,y) is at most 2||f||.

Now that we know our number m0 we can pick an arbitrary m above it, and go:

∀m≥m0∀y∈Cϵ:d((xm,y),(x,y))=d(xm,x)≤δ

∀m≥m0∀y∈Cϵ:|f(xm,y)−f(x,y)|≤ϵ

∀m≥m0:d((λy.f(xm,y))↓Cϵ,(λy.f(x,y))↓Cϵ)≤ϵ

And now, because these two functions restricted to Cϵ are only ϵ apart, we can apply Lemma 2 to conclude that (since Cϵ and λ⊙K work for all the K(xn))

∀n:|K(xn)(λy.f(xm,y))−K(xn)(λy.f(x,y))|≤ϵ⋅2||f||+ϵλ⊙K

This argument works for any m≥m0, so we have:

∃m0∀n∀m≥m0:|K(xn)(λy.f(xm,y))−K(xn)(λy.f(x,y))|≤ϵ(2||f||+λ⊙K)

Letting n=m in particular,

∃m0∀n≥m0:|K(xn)(λy.f(xn,y))−K(xn)(λy.f(x,y))|≤ϵ(2||f||+λ⊙K)

And for each ϵ we can construct an m0 in this way, concluding that

limn→∞|K(xn)(λy.f(xn,y))−K(xn)(λy.f(x,y))|=0

Also, from our pointwise convergence condition on infrakernels,

limn→∞K(xn)(λy.f(x,y))=K(x)(λy.f(x,y))

Therefore,

limn→∞K(xn)(λy.f(xn,y))=K(x)(λy.f(x,y))

and so, we now know that

λx.K(x)(λy.f(x,y))

is a continuous function X→R. For boundedness, upper and lower bounds on λy.f(x,y) are ||f|| (and the negative version of it). Due to the shared Lipschitz constant on the K(x), an upper and lower-bound on λx.K(x)(λy.f(x,y)) is λ⊙K||f|| (and the negative version.) Thus, we can safely feed said function into the infradistribution h, so the semidirect product is well-defined. We must still show that it makes an infradistribution.

For normalization,

(h⋉K)(1)=h(λx.K(x)(λy.1))=h(λx.1)=1

(h⋉K)(0)=h(λx.K(x)(λy.0))=h(λx.0)=0

For monotonicity, if f′≥f,

∀x:λy.f′(x,y)≥λy.f(x,y)

∀x:K(x)(λy.f′(x,y))≥K(x)(λy.f(x,y))

λx.K(x)(λy.f′(x,y))≥λx.K(x)(λy.f(x,y))

(h⋉K)(f′)=h(λx.K(x)(λy.f′(x,y)))≥h(λx.K(x)(λy.f(x,y)))=(h⋉K)(f)

For concavity,

(h⋉K)(pf+(1−p)f′)=h(λx.K(x)(λy.pf(x,y)+(1−p)f′(x,y)))

≥h(λx.pK(x)(λy.f(x,y))+(1−p)K(x)(λy.f′(x,y)))

≥ph(λx.K(x)(λy.f(x,y)))+(1−p)h(λx.K(x)(λy.f′(x,y)))

=p(h⋉K)(f)+(1−p)(h⋉K)(f′)

In order, this was the definition of the semidirect product, all the K(x) being concave so splitting them up produces a lower value (and then monotonicity for h), then h being concave.
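As a sanity check on the three properties just verified, here is a minimal finite sketch. It assumes (this is not from the post) that X and Y are finite, and that h and each K(x) are crisp infradistributions represented as a finite list of probability vectors, evaluated as a worst-case expectation. The names `expect`, `semidirect`, and all the toy data are hypothetical.

```python
def expect(dists, f):
    # Worst-case expectation of f over a finite set of probability vectors.
    return min(sum(p * v for p, v in zip(dist, f)) for dist in dists)

def semidirect(h, K, f):
    # (h ⋉ K)(f) = h(λx. K(x)(λy. f(x, y))), with f stored as f[x][y].
    inner = [expect(K[x], f[x]) for x in range(len(K))]
    return expect(h, inner)

# Hypothetical toy data: X = {0, 1}, Y = {0, 1, 2}.
h = [[0.5, 0.5], [0.2, 0.8]]
K = [
    [[1.0, 0.0, 0.0], [0.0, 0.5, 0.5]],  # K(0)
    [[0.25, 0.25, 0.5]],                  # K(1)
]
f = [[1.0, 0.0, 2.0], [0.5, 3.0, -1.0]]
g = [[0.0, 1.0, 1.0], [2.0, -2.0, 0.5]]
f_up = [[v + 0.5 for v in row] for row in f]  # f_up >= f pointwise
p = 0.3
mix = [[p * a + (1 - p) * b for a, b in zip(rf, rg)] for rf, rg in zip(f, g)]
ones = [[1.0] * 3 for _ in range(2)]
zeros = [[0.0] * 3 for _ in range(2)]

TOL = 1e-12  # slack for floating-point rounding
normalized = (abs(semidirect(h, K, ones) - 1.0) < TOL
              and abs(semidirect(h, K, zeros)) < TOL)
monotone = semidirect(h, K, f_up) >= semidirect(h, K, f) - TOL
concave = (semidirect(h, K, mix)
           >= p * semidirect(h, K, f) + (1 - p) * semidirect(h, K, g) - TOL)
print(normalized, monotone, concave)
```

This only exercises one toy instance, so it illustrates the inequalities rather than proving them.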

This leaves Lipschitzness and CAS. For Lipschitzness, given some f and f′, and letting λ⊙h be the Lipschitz constant of h, we have:

|(h⋉K)(f)−(h⋉K)(f′)|=|h(λx.K(x)(λy.f(x,y)))−h(λx.K(x)(λy.f′(x,y)))|

≤λ⊙hd(λx.K(x)(λy.f(x,y)),λx.K(x)(λy.f′(x,y)))

=λ⊙hsupx|K(x)(λy.f(x,y))−K(x)(λy.f′(x,y))|

≤λ⊙hsupxλ⊙Kd(λy.f(x,y),λy.f′(x,y))=λ⊙hλ⊙Ksupxd(λy.f(x,y),λy.f′(x,y))

≤λ⊙hλ⊙Kd(f,f′)

Thus, that final thing shows that there's a finite Lipschitz constant for h⋉K.
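The product bound can be spot-checked numerically. The sketch below assumes finite X and Y and represents h and each K(x) as a minimum over a finite list of nonnegative measures (a homogeneous special case chosen for illustration, not the post's general setting); for such a functional the largest total mass is an upper bound on the Lipschitz constant. All names and data are hypothetical.

```python
import random

def value(measures, f):
    # Minimum over a finite set of nonnegative measures of the integral of f.
    return min(sum(m * v for m, v in zip(meas, f)) for meas in measures)

def lipschitz_bound(measures):
    # For a min over nonnegative measures, the largest total mass bounds
    # the Lipschitz constant with respect to the sup metric.
    return max(sum(meas) for meas in measures)

def semidirect(h, K, f):
    return value(h, [value(K[x], f[x]) for x in range(len(K))])

# Hypothetical data: both functionals send the constant 1 to 1.
h = [[0.5, 0.5], [1.0, 1.0]]
K = [[[1.0, 0.0, 0.0], [0.5, 0.5, 0.5]], [[0.25, 0.25, 0.5]]]
lam_h = lipschitz_bound(h)
lam_K = max(lipschitz_bound(K[x]) for x in range(len(K)))

rng = random.Random(0)
worst_ratio = 0.0
for _ in range(1000):
    f = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
    g = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
    dist = max(abs(a - b) for rf, rg in zip(f, g) for a, b in zip(rf, rg))
    if dist > 0:
        gap = abs(semidirect(h, K, f) - semidirect(h, K, g))
        worst_ratio = max(worst_ratio, gap / dist)

print(worst_ratio <= lam_h * lam_K)
```

Random sampling can only fail to find a violation; the inequality itself is what the derivation above establishes.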

This leaves compact almost-support. Pick any ϵ. This induces a compact set CXϵ which is an ϵ-almost-support for h, and then this compact set induces a compact set CYϵ which is an ϵ-almost-support for all the K(x) where x∈CXϵ. Now, we can apply Lemma 2 to go:

|h(λx.K(x)(λy.f(x,y)))−h(λx.K(x)(λy.f′(x,y)))|

≤ϵd(λx.K(x)(λy.f(x,y)),λx.K(x)(λy.f′(x,y)))

+λ⊙hd((λx.K(x)(λy.f(x,y)))↓CXϵ,(λx.K(x)(λy.f′(x,y)))↓CXϵ)

Pretty much, that first part is the "CXϵ is an ϵ-almost-support for h" piece, and the second piece is the "hey, these two functions may be a bit different on said compact set, we've gotta multiply that by the Lipschitz constant" piece. So, let's work on unpacking these two distances. For the first one, we can go:

d(λx.K(x)(λy.f(x,y)),λx.K(x)(λy.f′(x,y)))

=supx|K(x)(λy.f(x,y))−K(x)(λy.f′(x,y))|

≤supxλ⊙Kd(λy.f(x,y),λy.f′(x,y))

=λ⊙Ksupxd(λy.f(x,y),λy.f′(x,y))

=λ⊙Ksupxsupy|f(x,y)−f′(x,y)|

=λ⊙Ksupx,y|f(x,y)−f′(x,y)|=λ⊙Kd(f,f′)

Substituting this back in produces:

≤ϵλ⊙Kd(f,f′)+λ⊙hd((λx.K(x)(λy.f(x,y)))↓CXϵ,(λx.K(x)(λy.f′(x,y)))↓CXϵ)

Time to go after the second distance piece. We have:

d((λx.K(x)(λy.f(x,y)))↓CXϵ,(λx.K(x)(λy.f′(x,y)))↓CXϵ)

=supx∈CXϵ|K(x)(λy.f(x,y))−K(x)(λy.f′(x,y))|

And, because f and f′ agree on CXϵ×CYϵ, we have λy.f(x,y) and λy.f′(x,y) agreeing on CYϵ, which is an ϵ-almost-support for all the K(x) where x∈CXϵ, so we have:

≤supx∈CXϵϵd(λy.f(x,y),λy.f′(x,y))

=ϵsupx∈CXϵsupy|f(x,y)−f′(x,y)|

≤ϵsupx,y|f(x,y)−f′(x,y)|=ϵd(f,f′)

Substituting this back in produces:

≤ϵλ⊙Kd(f,f′)+ϵλ⊙hd(f,f′)

And regrouping this and recapping means that we have:

|(h⋉K)(f)−(h⋉K)(f′)|≤ϵ(λ⊙K+λ⊙h)d(f,f′)

So we have crafted a compact ϵ(λ⊙K+λ⊙h)-almost-support for h⋉K, and we can make ϵ arbitrarily small, so the semidirect product has compact almost-support, which is the last condition we needed.

Time for property verification.

Homogeneity:

(h⋉K)(af)=h(λx.K(x)(λy.af(x,y)))=h(λx.aK(x)(λy.f(x,y)))

=ah(λx.K(x)(λy.f(x,y)))=a(h⋉K)(f)

1-Lipschitz: We showed in the Lipschitz section that an upper bound on the Lipschitz constant of h⋉K is the product of the Lipschitz constants of the kernel and the original infradistribution, so 1⋅1=1 and 1-Lipschitzness is preserved.

Cohomogeneity:

(h⋉K)(1+af)=h(λx.K(x)(λy.1+af(x,y)))=h(λx.1−a+aK(x)(λy.1+f(x,y)))

=h(λx.1+a(−1+K(x)(λy.1+f(x,y))))=1−a+ah(λx.1−1+K(x)(λy.1+f(x,y)))

=1−a+ah(λx.K(x)(λy.1+f(x,y)))=1−a+a(h⋉K)(1+f)

C-additivity:

(h⋉K)(c)=h(λx.K(x)(λy.c))=h(λx.c)=c

Crispness: Both homogeneity and C-additivity are preserved, so crispness is too.

Sharpness:

(h⋉K)(f)=h(λx.K(x)(λy.f(x,y)))=h(λx.infy∈CK(x)f(x,y))

=infx∈Ch(infy∈CK(x)f(x,y))=inf(x,y)∈⋃x∈Ch({x}×CK(x))f(x,y)

Our task now is to show that ⋃x∈Ch({x}×CK(x)) is compact, which will take a fair amount of topology work. Our first piece that we'll need is that if xn limits to x, then CK(xn) limits to CK(x) in Hausdorff-distance.

To show this, we'll split it into two parts. First, we'll assume that there is an ϵ where, infinitely often, there is a point in CK(xn) that is ϵ away from CK(x) and disprove that. Second, we'll assume there is an ϵ where, infinitely often, there is a point in CK(x) that is ϵ away from CK(xn), and disprove that.

For the first part, assume that there is an ϵ where, infinitely often, there is a point in CK(xn) that is ϵ away from CK(x). Craft the continuous function

f1:=λy.sup(1−(1/ϵ)⋅infy′∈CK(x)d(y,y′),0)

What this does is it's 1 on the set CK(x), and 0 on anything more than ϵ away from it. One of our conditions on an infrakernel was that limn→∞K(xn)(f)=K(x)(f), so:

limn→∞infy∈CK(xn)f1(y)=infy∈CK(x)f1(y)

The latter term is 1 because f1 is 1 over CK(x). However, because we're assuming that infinitely often, there's a point in CK(xn) that is ϵ away from CK(x), the sequence on the left-hand side is infinitely often 0, so it can't converge to 1 and we have a contradiction.

For the second part, assume there is an ϵ where, infinitely often, there is a point in CK(x) that is ϵ away from CK(xn). By compactness of CK(x), we can find finitely many points yi in it s.t. every point in CK(x) is less than ϵ/2 away from one of the yi (cover CK(x) with ϵ/2-size open balls centered on points in it and take a finite subcover). Now, for each of these, we can craft a function

fi:=λy.inf(1,(2/ϵ)⋅d(y,yi))

So, this is 0 at the point yi, and 1 at any distance ϵ/2 or more away from it.

One of our conditions on an infrakernel was that limn→∞K(xn)(f)=K(x)(f), and there are finitely many fi, so there's some time where all of them nearly converge, ie:

limn→∞supi|K(xn)(fi)−K(x)(fi)|=0

However, infinitely often there's a point yn∈CK(x) that is ϵ away from CK(xn). yn is less than ϵ/2 away from some yi, so that yi can't be closer than ϵ/2 to CK(xn). (If it were closer, then we could pick some point in CK(xn) that's closer than ϵ/2 to yi, and then since yi is less than ϵ/2 away from yn, the triangle inequality would put the distance from yn to CK(xn) below ϵ, an impossibility.)

Because the distance from yi to any point in CK(xn) is at least ϵ/2, then

|K(xn)(fi)−K(x)(fi)|=|infy∈CK(xn)fi(y)−infy∈CK(x)fi(y)|=|1−0|=1

This is because yi∈CK(x) and attains a value of 0 according to fi, while CK(xn) stays away from yi and all its points must have a value of 1. This situation happens infinitely often, which leads to a contradiction with

limn→∞supi|K(xn)(fi)−K(x)(fi)|=0

Because infinitely often, one of these fi has very different values, so the sequence is 1 infinitely often and can't limit to 0.

So, we've ruled out that there is an ϵ where, infinitely often, there is a point in CK(xn) that is ϵ away from CK(x). And we've ruled out that there is an ϵ where, infinitely often, there is a point in CK(x) that is ϵ away from CK(xn). Fixing any ϵ, in the tail of the sequence, CK(x) and CK(xn) are ϵ distance or closer in Hausdorff distance because you can't find points in either set which are far away from the other set. So, CK(xn) limits to CK(x) in Hausdorff-distance when xn limits to x, and we know that x↦CK(x) is a continuous function X→K(Y).
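The two-part structure of this argument mirrors how Hausdorff distance is built from two one-sided distances. The sketch below shows this for finite subsets of the real line; the helper names `directed` and `hausdorff` and the example sets are hypothetical.

```python
def directed(A, B):
    # Largest distance from a point of A to the set B (one-sided distance).
    return max(min(abs(a - b) for b in B) for a in A)

def hausdorff(A, B):
    # Hausdorff distance: both one-sided distances must be small.
    return max(directed(A, B), directed(B, A))

# One direction can be small while the other is large, which is why
# the proof has to rule out both kinds of far-away points.
A = [0.0]
B = [0.0, 5.0]
one_way = directed(A, B)    # every point of A is close to B
other_way = directed(B, A)  # but B has a point far from A

# A hypothetical sequence of sets converging to {0, 1}.
target = [0.0, 1.0]
seq = [[1.0 / n, 1.0 + 1.0 / n] for n in range(1, 6)]
dists = [hausdorff(s, target) for s in seq]
print(one_way, other_way, dists)
```

In this example the n-th set is the target shifted by 1/n, so the Hausdorff distance to the target is 1/n up to floating-point rounding.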

This lets us show that the set

⋃x∈Ch({x}×CK(x))

is closed, because if xn limits to x and yn∈CK(xn) and yn limits to y, we have that y∈CK(x) because CK(xn) limits to CK(x) in Hausdorff distance, so we've got closed graph.

Also, by invoking Lemma 3, we know that

⋃x∈ChCK(x)

is compact.

Time to wrap this all up. We know that ⋃x∈Ch{x}×CK(x) is closed in X×Y from our Hausdorff limit argument. This set is also a subset of:

Ch×⋃x∈ChCK(x)

Which is a product of two sets known to be compact, and is compact. It's a closed subset of a compact set, so it's compact. Therefore,

⋃x∈Ch{x}×CK(x)

is a compact set, and from way back,

(h⋉K)(f)=inf(x,y)∈⋃x∈Ch({x}×CK(x))f(x,y)

And we've shown that set is compact, so h⋉K where h and all the K(x) are sharp can be written as minimizing over a compact set, so h⋉K is sharp. Thus, semidirect product preserves all the nice properties, and we're finally done with this proof.
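For intuition, the identity between the iterated infimum and the infimum over ⋃x∈Ch({x}×CK(x)) can be checked directly on finite sets, where compactness is automatic. This sketch assumes finite X and Y, and represents a sharp infradistribution by its support set; the data is hypothetical.

```python
def sharp(support, f):
    # A sharp infradistribution: minimize f over a fixed set of points.
    return min(f[i] for i in support)

# Hypothetical data: X = {0, 1, 2}, Y = {0, 1, 2}.
C_h = {0, 2}                                  # support of h
C_K = {0: {1}, 1: {0, 1, 2}, 2: {0, 2}}       # support of each K(x)
f = [[3.0, 1.5, 0.0], [-5.0, -5.0, -5.0], [2.0, -4.0, 0.5]]

# Left side: h(λx. K(x)(λy. f(x, y))) as an iterated minimum.
lhs = sharp(C_h, [sharp(C_K[x], f[x]) for x in range(3)])

# Right side: a single minimum over the union of {x} × C_K(x) for x in C_h.
union = {(x, y) for x in C_h for y in C_K[x]}
rhs = min(f[x][y] for (x, y) in union)

print(lhs, rhs)  # 0.5 0.5
```

Note that the row for x=1 is ignored even though it holds the smallest values, because 1 is not in the support of h.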

Proposition 20: If all the K(x) are C-additive, then prX∗(h⋉K)=h.

prX∗(h⋉K)(f)=(h⋉K)(f∘prX)=h(λx.K(x)(λy.f(prX(x,y))))

=h(λx.K(x)(λy.f(x)))=h(λx.f(x))=h(f)

This is because, since f(x) doesn't depend on y, it acts as a constant inside K(x) and C-additivity lets us pull it out.
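A small finite sketch of why C-additivity matters here. It assumes finite X and Y and represents each functional as a minimum over a finite list of nonnegative measures (an illustrative choice, not the post's general setting). `K_good` uses probability vectors only, so each K(x) is C-additive; `K_bad` contains a measure of total mass 1.5, so it maps negative constants to something else. All names and data are hypothetical.

```python
def expect(measures, f):
    # Minimum over a finite set of nonnegative measures of the integral of f.
    return min(sum(m * v for m, v in zip(meas, f)) for meas in measures)

def semidirect(h, K, f):
    return expect(h, [expect(K[x], f[x]) for x in range(len(K))])

def projected(h, K, f_x, n_y):
    # pr_X*(h ⋉ K)(f) = (h ⋉ K)(f ∘ pr_X): lift f to X × Y by ignoring y.
    lifted = [[f_x[x]] * n_y for x in range(len(f_x))]
    return semidirect(h, K, lifted)

h = [[0.5, 0.5], [0.2, 0.8]]
K_good = [[[1.0, 0.0, 0.0], [0.0, 0.5, 0.5]], [[0.25, 0.25, 0.5]]]
K_bad = [[[1.0, 0.0, 0.0], [0.5, 0.5, 0.5]], [[0.25, 0.25, 0.5]]]
f_x = [-1.0, 2.0]

direct = expect(h, f_x)                     # h(f)
via_good = projected(h, K_good, f_x, 3)     # agrees with h(f)
via_bad = projected(h, K_bad, f_x, 3)       # differs: K_bad(0) is not C-additive
print(direct, via_good, via_bad)
```

With `K_bad`, the constant −1 is sent to −1.5 inside K(0), so the projection no longer recovers h.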

Proposition 21: If K0,K1,K2... are a sequence of infrakernels of type $K_n:\prod_{i=0}^{n}X_i\xrightarrow{ik}X_{n+1}$, and h is an infradistribution over X0, then (...((h⋉K0)⋉K1)...⋉Km) can be rewritten as h⋉K:m where K:n is an infrakernel of type $X_0\xrightarrow{ik}\prod_{i=1}^{n+1}X_i$, recursively defined as K:0:=K0 and K:n+1(x0):=K:n(x0)⋉(λx1:n+1.Kn+1(x0,x1:n+1))

So, for our inductive definition,

K:0(x0):=K0(x0)

K:n+1(x0):=K:n(x0)⋉(λx1:n+1.Kn+1(x0,x1:n+1))

Our task is to show that these are all infrakernels, by induction, and that for any infradistribution h,

(...((h⋉K0)⋉K1)...⋉Kn)=h⋉K:n

For the base case, we observe that K:0 is an infrakernel because it equals K0, which is an infrakernel, and that h⋉K0=h⋉K:0

Time for the induction step. We'll assume that K:n is an infrakernel, and show that K:n+1 is. Further, we need to show that h⋉K:n+1=(h⋉K:n)⋉Kn+1. This will show the result.
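To make the bookkeeping concrete, here is a minimal sketch of the n=0 to n=1 step on two-point spaces, treating infradistributions as Python functionals. It assumes crisp infradistributions given by finite lists of probability vectors; `crisp`, `semidirect`, and the data are hypothetical. Both sides unfold to the same nested expression, so the check mainly illustrates the regrouping of arguments rather than serving as independent evidence.

```python
def crisp(dists):
    # Functional g -> min over distributions of the expectation of g(index).
    return lambda g: min(sum(p * g(i) for i, p in enumerate(d)) for d in dists)

def semidirect(h, K):
    # h: functional on X; K: x -> functional on Y.
    # Returns a functional acting on two-argument functions f(x, y).
    return lambda f: h(lambda x: K(x)(lambda y: f(x, y)))

# Hypothetical data on X0 = X1 = X2 = {0, 1}.
h = crisp([[0.5, 0.5], [0.1, 0.9]])
K0_data = [[[1.0, 0.0], [0.5, 0.5]], [[0.3, 0.7]]]
K1_data = [
    [[[0.5, 0.5]], [[1.0, 0.0], [0.0, 1.0]]],   # K1(0, 0), K1(0, 1)
    [[[0.2, 0.8]], [[0.6, 0.4], [0.9, 0.1]]],   # K1(1, 0), K1(1, 1)
]
K0 = lambda x0: crisp(K0_data[x0])
K1 = lambda x0, x1: crisp(K1_data[x0][x1])

def f(x0, x1, x2):
    return 3.0 * x0 - 2.0 * x1 + x2 * (1 + x0) - 0.5

# Left side: (h ⋉ K0) ⋉ K1, where K1 reads a point (x0, x1) of X0 × X1.
h_K0 = semidirect(h, K0)
left = h_K0(lambda x0, x1: K1(x0, x1)(lambda x2: f(x0, x1, x2)))

# Right side: h ⋉ K_{:1}, with K_{:1}(x0) = K0(x0) ⋉ (λx1. K1(x0, x1)).
K_upto1 = lambda x0: semidirect(K0(x0), lambda x1: K1(x0, x1))
right = h(lambda x0: K_upto1(x0)(lambda x1, x2: f(x0, x1, x2)))

print(abs(left - right) < 1e-12)
```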

Our first requirement is showing that for all x0, K:n+1(x0) is an infradistribution.

K:n+1(x0):=K:n(x0)⋉(λx1:n+1.Kn+1(x0,x1:n+1))

By our induction assumption, K:n(x0) is an infradistribution as K:n is an infrakernel. Further, λx1:n+1.Kn+1(x0,x1:n+1) is an infrakernel because Kn+1 is and we're just restricting it to a subset of its domain, so it keeps being an infrakernel. And we know from earlier that the semidirect product of an infradistribution and an infrakernel is an infradistribution. So that's taken care of.

Now, we must show a common Lipschitz constant, pointwise function convergence, and compact-shared compact almost-support for K:n+1 to certify that it's an infrakernel.

Starting with common Lipschitz constant, we can just note that, in our proof of Proposition 19, we saw that the Lipschitz constant of the semidirect product was upper-bounded by the product of the Lipschitz constants of the starting infradistribution and the kernel. Assuming that K:n is an infrakernel, we have that the Lipschitz constant of any K:n(x0) is upper-bounded by some λ⊙:n Lipschitz constant. Also, the Lipschitz constant of Kn+1(x0,x1:n+1) is upper-bounded by some λ⊙n+1 Lipschitz constant. Thus, λ⊙:nλ⊙n+1 is an upper-bound on the Lipschitz constant of any

K:n(x0)⋉(λx1:n+1.Kn+1(x0,x1:n+1))

infradistribution, which is exactly K:n+1(x0), witnessing that K:n+1 has a uniform upper bound on its Lipschitz constants.

Time to move onto the second one, compact-shared compact almost-support.

For this one, we're trying to prove:

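$$\forall C^{X_0},\epsilon\;\exists C^{\prod_{i=1}^{n+2}X_i}_{\epsilon}\subseteq\prod_{i=1}^{n+2}X_i\;\forall x_0\in C^{X_0},f,f':$$

$$f{\downarrow}_{C^{\prod_{i=1}^{n+2}X_i}_{\epsilon}}=f'{\downarrow}_{C^{\prod_{i=1}^{n+2}X_i}_{\epsilon}}\rightarrow|K_{:n+1}(x_0)(f)-K_{:n+1}(x_0)(f')|\leq\epsilon d(f,f')$$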

This is the sentence that says that K:n+1 has compact-shared compact almost-support. f and f′ have type signature ∏i=n+2i=1Xi→R.

Now, this is going to be quite complicated, so pay close attention. Fix an arbitrary compact CX0⊆X0, and an arbitrary ϵ. Let λ⊙:n be the Lipschitz constant for the infrakernel K:n, and λ⊙n+1 be the Lipschitz constant for the infrakernel Kn+1.

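Due to compact-shared compact-almost-support for K:n which exists by our induction assumption, your set CX0 induces a compact $\frac{\epsilon}{2\lambda^{\odot}_{n+1}}$-almost-support for the family of infradistributions K:n(x0) where x0∈CX0. Call said almost-support $C^{\prod_{i=1}^{n+1}X_i}_{\frac{\epsilon}{2\lambda^{\odot}_{n+1}}}$.

Further, due to compact-shared compact-almost-support for Kn+1, the set

$$C^{X_0}\times C^{\prod_{i=1}^{n+1}X_i}_{\frac{\epsilon}{2\lambda^{\odot}_{n+1}}}$$

induces a compact $\frac{\epsilon}{2\lambda^{\odot}_{:n}}$-almost-support for the family of infradistributions Kn+1(x0,x1:n+1) where $(x_0,x_{1:n+1})\in C^{X_0}\times C^{\prod_{i=1}^{n+1}X_i}_{\frac{\epsilon}{2\lambda^{\odot}_{n+1}}}$.

Call said almost-support $C^{X_{n+2}}_{\frac{\epsilon}{2\lambda^{\odot}_{:n}}}$.

And now let your shared ϵ-almost-support for K:n+1(x0) where x0∈CX0 be:

$$C^{\prod_{i=1}^{n+1}X_i}_{\frac{\epsilon}{2\lambda^{\odot}_{n+1}}}\times C^{X_{n+2}}_{\frac{\epsilon}{2\lambda^{\odot}_{:n}}}$$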

We must show that said set is indeed a shared ϵ-almost-support for K:n+1(x0) where x0∈CX0. So, let f and f′ agree on said set. Then, we have:

|K:n+1(x0)(f)−K:n+1(x0)(f′)|

=|(K:n(x0)⋉(λx1:n+1.Kn+1(x0,x1:n+1)))(f)−(K:n(x0)⋉(λx1:n+1.Kn+1(x0,x1:n+1)))(f′)|

=|K:n(x0)(λx1:n+1.Kn+1(x0,x1:n+1)(λxn+2.f(x1:n+1,xn+2)))

−K:n(x0)(λx1:n+1.Kn+1(x0,x1:n+1)(λxn+2.f′(x1:n+1,xn+2)))|

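This is just unpacking the definition of the iterated semidirect product, no issues here. Now, we use Lemma 2 and the fact that $C^{\prod_{i=1}^{n+1}X_i}_{\frac{\epsilon}{2\lambda^{\odot}_{n+1}}}$ is a $\frac{\epsilon}{2\lambda^{\odot}_{n+1}}$-almost-support for K:n(x0) when x0∈CX0, to get:

$$\leq\lambda^{\odot}_{:n}\sup_{x_{1:n+1}\in C^{\prod_{i=1}^{n+1}X_i}_{\frac{\epsilon}{2\lambda^{\odot}_{n+1}}}}\left|K_{n+1}(x_0,x_{1:n+1})(\lambda x_{n+2}.f(x_{1:n+1},x_{n+2}))-K_{n+1}(x_0,x_{1:n+1})(\lambda x_{n+2}.f'(x_{1:n+1},x_{n+2}))\right|$$

$$+\frac{\epsilon}{2\lambda^{\odot}_{n+1}}\sup_{x_{1:n+1}}\left|K_{n+1}(x_0,x_{1:n+1})(\lambda x_{n+2}.f(x_{1:n+1},x_{n+2}))-K_{n+1}(x_0,x_{1:n+1})(\lambda x_{n+2}.f'(x_{1:n+1},x_{n+2}))\right|$$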

Ok, this is a mess. Let's try to unpack

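$$\sup_{x_{1:n+1}\in C^{\prod_{i=1}^{n+1}X_i}_{\frac{\epsilon}{2\lambda^{\odot}_{n+1}}}}\left|K_{n+1}(x_0,x_{1:n+1})(\lambda x_{n+2}.f(x_{1:n+1},x_{n+2}))-K_{n+1}(x_0,x_{1:n+1})(\lambda x_{n+2}.f'(x_{1:n+1},x_{n+2}))\right|$$

first. What we can do is use that, regardless of what is picked in the supremum, we have:

$$(x_0,x_{1:n+1})\in C^{X_0}\times C^{\prod_{i=1}^{n+1}X_i}_{\frac{\epsilon}{2\lambda^{\odot}_{n+1}}}$$

So this means that

$$C^{X_{n+2}}_{\frac{\epsilon}{2\lambda^{\odot}_{:n}}}$$

is a $\frac{\epsilon}{2\lambda^{\odot}_{:n}}$-almost-support for Kn+1(x0,x1:n+1). Further, because f and f′ are identical on

$$C^{\prod_{i=1}^{n+1}X_i}_{\frac{\epsilon}{2\lambda^{\odot}_{n+1}}}\times C^{X_{n+2}}_{\frac{\epsilon}{2\lambda^{\odot}_{:n}}}$$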

and x1:n+1 was being selected from the former of those, then the functions λxn+2.f(x1:n+