Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Proposition 10: Mixture, updating, and continuous pushforward preserve the properties indicated by the diagram, and always produce an infradistribution.

We'll start by showing that mixture, updating, and continuous pushforward always produce infradistributions, and then turn to property verification.

We know from the last post that mixture, updating, and continuous pushforward preserve all the earlier infradistribution properties (although mixture only preserves Lipschitzness when the expected value of the Lipschitz constant is finite). Since then we added the new condition of compact almost-support, so that's the only part we need to re-verify.

To show that mixture has compact almost-support, remember that

$$(\mathbb{E}_\zeta h_i)(f)=\mathbb{E}_\zeta(h_i(f))$$

Now fix an $\epsilon$. We will craft a compact set that accounts for all but $\epsilon$ of why functions have the expectation values they do. There is some $n$ where $\sum_{i>n}\zeta_i\lambda^\odot_i<\frac{\epsilon}{2}$, where $\lambda^\odot_i$ is the Lipschitz constant of the infradistribution $h_i$. Then let $C_\epsilon$ be $\bigcup_{i\le n}C_{i,\frac{\epsilon}{2}}$, the union of the compact $\frac{\epsilon}{2}$-almost-supports for the infradistributions $h_i$, $i\le n$. This is a finite union of compact sets, so it's compact.

Now let $f$ and $f'$ agree on $C_\epsilon$. We can go:

$$|(\mathbb{E}_\zeta h_i)(f)-(\mathbb{E}_\zeta h_i)(f')|=|\mathbb{E}_\zeta(h_i(f))-\mathbb{E}_\zeta(h_i(f'))|\le\sum_i\zeta_i|h_i(f)-h_i(f')|$$

$$=\sum_{i\le n}\zeta_i|h_i(f)-h_i(f')|+\sum_{i>n}\zeta_i|h_i(f)-h_i(f')|\le\sum_{i\le n}\zeta_i\frac{\epsilon}{2}d(f,f')+\sum_{i>n}\zeta_i\lambda^\odot_i d(f,f')$$

$$=d(f,f')\left(\frac{\epsilon}{2}\sum_{i\le n}\zeta_i+\sum_{i>n}\zeta_i\lambda^\odot_i\right)\le d(f,f')\left(\frac{\epsilon}{2}+\frac{\epsilon}{2}\right)=\epsilon d(f,f')$$

The first equality is reexpressing mixtures, and the first inequality moves the absolute value inside the expectation, which doesn't decrease the value. Then we break up the expectation for the second equality. The second inequality has two parts. For $i>n$, the gap between $h_i(f)$ and $h_i(f')$ has a trivial upper bound from the Lipschitzness of $h_i$. For $i\le n$, $f$ and $f'$ agree on the union of the $\frac{\epsilon}{2}$-almost-supports for the $h_i$, $i\le n$, so each such infradistribution, by the definition of an almost-support, gives these two functions expectations at most $\frac{\epsilon}{2}d(f,f')$ apart. Then we just pull $d(f,f')$ out, use $\sum_{i\le n}\zeta_i\le 1$, and use the fact that for the mixture to work, $\sum_i\zeta_i\lambda^\odot_i<\infty$, so we could pick $n$ big enough for the tail of the infinite sum to be below $\frac{\epsilon}{2}$. Then we're done.
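As a sanity check of the tail-cutoff construction, here is a minimal Python sketch on a finite space. Everything in it is an illustrative assumption rather than part of the proof: the space is $X=\{0,\dots,5\}$, each component is a sharp functional `sharp(C)` (a min over a set, so it is 1-Lipschitz and its own set is an almost-support for every $\epsilon$), and the weights `zeta` are made up. Indices are 0-based, so "components $i\le n$" in the text corresponds to `supports[:n]` here.

```python
from fractions import Fraction

def sharp(C):
    """Sharp functional on a finite space: h(f) = min of f over the set C."""
    return lambda f: min(f[x] for x in C)

def sup_dist(f, f2):
    """d(f, f'): sup-norm distance between two functions on X."""
    return max(abs(a - b) for a, b in zip(f, f2))

# Hypothetical data on X = {0,...,5}; every component is 1-Lipschitz.
supports = [{0}, {1, 2}, {3}, {5}]
zeta = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
lam = [Fraction(1)] * len(zeta)
hs = [sharp(C) for C in supports]

def mixture(f):
    """(E_zeta h_i)(f) = sum_i zeta_i * h_i(f)."""
    return sum(z * h(f) for z, h in zip(zeta, hs))

def cutoff(eps):
    """Smallest n whose tail sum_{i >= n} zeta_i * lam_i is below eps / 2."""
    n = 0
    while sum(z * l for z, l in zip(zeta[n:], lam[n:])) >= eps / 2:
        n += 1
    return n

eps = Fraction(3, 5)
n = cutoff(eps)
C_eps = set().union(*supports[:n])  # union of the first n almost-supports

# Two functions that agree on C_eps and differ elsewhere.
f = [Fraction(1, 2), Fraction(1, 3), 1, 0, 1, 1]
f2 = [Fraction(1, 2), Fraction(1, 3), 1, 1, 0, 1]
gap = abs(mixture(f) - mixture(f2))
```

In this toy case the gap between the two mixture values is controlled entirely by the discarded tail weight, which is the mechanism the bound relies on.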

Now, we will show compact almost-support for $h|^g_L$ assuming $h$ has compact almost-support. Fix an $\epsilon$. Your relevant set for support($L$) will be

$$C_{\frac{\epsilon P^g_h(L)}{2}}\cap\left\{x\,\middle|\,L(x)\ge\frac{\epsilon P^g_h(L)}{2\lambda^\odot}\right\}$$

where the first term is a compact set that is an $\frac{\epsilon P^g_h(L)}{2}$-almost-support for $h$, and the last set is a sort of "this point must be likely enough" condition. $\lambda^\odot$ is the Lipschitz constant of the original $h$. Yes, this intersection may be empty.

Now, here's how things go. Let $f$ and $f'$ agree on that intersection (if it's the empty set, they can be any two functions). We can go:

$$|(h|^g_L)(f)-(h|^g_L)(f')|=\left|\frac{h(f\star_L g)-h(0\star_L g)}{h(1\star_L g)-h(0\star_L g)}-\frac{h(f'\star_L g)-h(0\star_L g)}{h(1\star_L g)-h(0\star_L g)}\right|$$

$$=\frac{1}{h(1\star_L g)-h(0\star_L g)}\left|h(f\star_L g)-h(0\star_L g)-h(f'\star_L g)+h(0\star_L g)\right|$$

$$=\frac{1}{P^g_h(L)}|h(f\star_L g)-h(f'\star_L g)|=\frac{1}{P^g_h(L)}|h(Lf+(1-L)g)-h(Lf'+(1-L)g)|$$

So far, this is just a standard sequence of rewrites: the definition of the update, pulling the fraction out, using $P^g_h(L)$ to abbreviate the rescaling term, and unpacking what $\star_L$ means.

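Now, let's see how different $Lf+(1-L)g$ and $Lf'+(1-L)g$ are on the set $C_{\frac{\epsilon P^g_h(L)}{2}}$. One of two things will occur. Our first possibility is that an $x$ in that compact set also has $L(x)\ge\frac{\epsilon P^g_h(L)}{2\lambda^\odot}$. Then

$$x\in C_{\frac{\epsilon P^g_h(L)}{2}}\cap\left\{x\,\middle|\,L(x)\ge\frac{\epsilon P^g_h(L)}{2\lambda^\odot}\right\}$$

and $f,f'$ were selected to be equal on that set, so the two functions will be identical at that point. Our second possibility is that $x$ in that compact set has $L(x)<\frac{\epsilon P^g_h(L)}{2\lambda^\odot}$. In that case,

$$|L(x)f(x)+(1-L(x))g(x)-L(x)f'(x)-(1-L(x))g(x)|$$

$$=|L(x)f(x)-L(x)f'(x)|=L(x)|f(x)-f'(x)|\le\frac{\epsilon P^g_h(L)}{2\lambda^\odot}d(f,f')$$

because $L(x)<\frac{\epsilon P^g_h(L)}{2\lambda^\odot}$ and $|f(x)-f'(x)|\le d(f,f')$.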

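Putting this together, $Lf+(1-L)g$ and $Lf'+(1-L)g$ are only $\frac{\epsilon P^g_h(L)}{2\lambda^\odot}d(f,f')$ apart when restricted to the compact set $C_{\frac{\epsilon P^g_h(L)}{2}}$. By Lemma 2, we can then show that

$$|h(Lf+(1-L)g)-h(Lf'+(1-L)g)|\le\lambda^\odot\cdot\frac{\epsilon P^g_h(L)}{2\lambda^\odot}d(f,f')+\frac{\epsilon P^g_h(L)}{2}d(Lf+(1-L)g,Lf'+(1-L)g)$$

And we also know that

$$d(Lf+(1-L)g,Lf'+(1-L)g)=d(Lf,Lf')\le d(f,f')$$

because $L\in[0,1]$. Making that substitution, we have:

$$|h(Lf+(1-L)g)-h(Lf'+(1-L)g)|\le\lambda^\odot\cdot\frac{\epsilon P^g_h(L)}{2\lambda^\odot}d(f,f')+\frac{\epsilon P^g_h(L)}{2}d(f,f')$$

$$=\frac{\epsilon P^g_h(L)}{2}d(f,f')+\frac{\epsilon P^g_h(L)}{2}d(f,f')=\epsilon P^g_h(L)d(f,f')$$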

Backing up to earlier, we had established that

$$|(h|^g_L)(f)-(h|^g_L)(f')|=\frac{1}{P^g_h(L)}|h(Lf+(1-L)g)-h(Lf'+(1-L)g)|$$

and from shortly above, we established that

$$|h(Lf+(1-L)g)-h(Lf'+(1-L)g)|\le\epsilon P^g_h(L)d(f,f')$$

Putting these together,

$$|(h|^g_L)(f)-(h|^g_L)(f')|\le\epsilon d(f,f')$$

for any two functions $f$ and $f'$ which agree on

$$C_{\frac{\epsilon P^g_h(L)}{2}}\cap\left\{x\,\middle|\,L(x)\ge\frac{\epsilon P^g_h(L)}{2\lambda^\odot}\right\}$$

witnessing that said set is an $\epsilon$-almost-support for $h|^g_L$.

All we need to finish up is to show that this is a compact set in support($L$) equipped with the subspace topology. In the original space $X$ it's a compact set, due to being the intersection of a compact set and a closed set. In the subspace topology, all the open sets in any open cover of it are the restrictions of open sets in the original topology, so we have an open cover of this set in the original topology, and we can extract a finite subcover. So it's compact in the subspace topology as well.

Thus, for any $\epsilon$, we can make a compact (in support($L$)) $\epsilon$-almost-support for $h|^g_L$, so $h|^g_L$ has compact almost-support, and we've verified the last condition for an update of an infradistribution to be an infradistribution.
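The thresholded set can be checked numerically on a small example. This is a hedged sketch under assumptions that are not in the original: $X=\{0,1,2,3\}$, $h$ is a sharp functional (min over a set `C`, Lipschitz constant 1, so `C` itself is an almost-support for every $\epsilon$), and the particular `L`, `g`, `f`, `f2` are made-up values.

```python
from fractions import Fraction as Fr

X = range(4)
C = [0, 1, 2]   # assumed: h is sharp on C, so C is an almost-support of h
lam = Fr(1)     # Lipschitz constant of h

def h(f):
    """Sharp functional: h(f) = min of f over C."""
    return min(f[x] for x in C)

def star(f, L, g):
    """f *_L g = L f + (1 - L) g, computed pointwise."""
    return [L[x] * f[x] + (1 - L[x]) * g[x] for x in X]

ZERO, ONE = [Fr(0)] * 4, [Fr(1)] * 4

def mass(L, g):
    """Rescaling term P^g_h(L) = h(1 *_L g) - h(0 *_L g)."""
    return h(star(ONE, L, g)) - h(star(ZERO, L, g))

def update(f, L, g):
    """(h|^g_L)(f) as defined in the text."""
    return (h(star(f, L, g)) - h(star(ZERO, L, g))) / mass(L, g)

L = [Fr(1), Fr(1, 2), Fr(1, 10), Fr(0)]
g = [Fr(0), Fr(0), Fr(1, 2), Fr(1)]
eps = Fr(1, 2)

P = mass(L, g)
threshold = eps * P / (2 * lam)
# Points of C that are "likely enough" under L.
almost_support = [x for x in C if L[x] >= threshold]

# f and f2 agree on the thresholded set and disagree off it.
f = [Fr(1, 2), Fr(1), Fr(0), Fr(0)]
f2 = [Fr(1, 2), Fr(1), Fr(1), Fr(1)]
gap = abs(update(f, L, g) - update(f2, L, g))
dist = max(abs(a - b) for a, b in zip(f, f2))
```

Here the point $x=2$ lies in `C` but has too little likelihood to pass the threshold, so the two functions are allowed to disagree there, and the updated expectations still stay within $\epsilon d(f,f')$ of each other.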

Now for deterministic pushforward. Fix an $\epsilon$, and let your appropriate set for $g_*(h)$ be $g(C_\epsilon)$, where $C_\epsilon$ is a compact $\epsilon$-almost-support for $h$. The continuous image of a compact set is compact, so that part is taken care of. We still need to check that it's an $\epsilon$-almost-support for $g_*(h)$. Let $f,f'$ be equal on this set. Then

$$|g_*(h)(f)-g_*(h)(f')|=|h(f\circ g)-h(f'\circ g)|\le\epsilon d(f\circ g,f'\circ g)$$

$$=\epsilon\sup_x|f(g(x))-f'(g(x))|\le\epsilon\sup_y|f(y)-f'(y)|=\epsilon d(f,f')$$

And we're done. The first inequality holds because, for any point $x\in C_\epsilon$, feeding it through $g$ makes a point in $g(C_\epsilon)$, and feeding that through $f$ and $f'$ produces identical results because they agree on $g(C_\epsilon)$. Therefore, $f\circ g$ and $f'\circ g$ agree on $C_\epsilon$ and thus can have values only $\epsilon d(f\circ g,f'\circ g)$ apart, which is upper-bounded by $\epsilon d(f,f')$. $g(C_\epsilon)$ is thus a compact $\epsilon$-almost-support for $g_*(h)$, and this can be done for any $\epsilon$, so $g_*(h)$ has compact almost-support.

These three operations always produce infradistributions, since we've now verified the last condition, so we can turn to the properties. Updating only has two properties to check, preserving homogeneity when $g=0$ and cohomogeneity when $g=1$, so let's get that knocked out.

Homogeneity, using homogeneity for $h$ (and $h(0)=0$):

$$(h|^0_L)(af)=\frac{h(af\star_L 0)-h(0\star_L 0)}{h(1\star_L 0)-h(0\star_L 0)}=\frac{h(Laf)-h(0)}{h(1\star_L 0)-h(0\star_L 0)}=\frac{ah(Lf)}{h(1\star_L 0)-h(0\star_L 0)}$$

$$=a\frac{h(Lf)}{h(1\star_L 0)-h(0\star_L 0)}=a\frac{h(Lf)-h(0)}{h(1\star_L 0)-h(0\star_L 0)}=a\frac{h(f\star_L 0)-h(0\star_L 0)}{h(1\star_L 0)-h(0\star_L 0)}=a(h|^0_L)(f)$$

Cohomogeneity, using cohomogeneity for $h$:

$$(h|^1_L)(1+af)=\frac{h((1+af)\star_L 1)-h(0\star_L 1)}{h(1\star_L 1)-h(0\star_L 1)}=\frac{h(L+aLf+1-L)-h(1-L)}{h(1)-h(1-L)}$$

$$=\frac{h(1+aLf)-h(1-L)}{1-h(1-L)}=\frac{(1-a+ah(1+Lf))-h(1-L)}{1-h(1-L)}$$

$$=\frac{1-a+ah(1+Lf)-h(1-L)+ah(1-L)-ah(1-L)}{1-h(1-L)}$$

$$=\frac{1-h(1-L)}{1-h(1-L)}-\frac{a-ah(1-L)}{1-h(1-L)}+\frac{ah(1+Lf)-ah(1-L)}{1-h(1-L)}$$

$$=\frac{1-h(1-L)}{1-h(1-L)}-a\frac{1-h(1-L)}{1-h(1-L)}+a\frac{h(1+Lf)-h(1-L)}{1-h(1-L)}$$

$$=1-a+a\frac{h(1+Lf)-h(1-L)}{h(1)-h(1-L)}=1-a+a\frac{h(L+Lf+(1-L))-h(1-L)}{h(L+(1-L))-h(1-L)}$$

$$=1-a+a\frac{h((1+f)\star_L 1)-h(0\star_L 1)}{h(1\star_L 1)-h(0\star_L 1)}=1-a+a(h|^1_L)(1+f)$$

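Now for mixtures, we'll verify homogeneity, 1-Lipschitzness, cohomogeneity, C-additivity, and crispness.

Homogeneity:

$$(\mathbb{E}_\zeta h_i)(af)=\mathbb{E}_\zeta(h_i(af))=\mathbb{E}_\zeta(ah_i(f))=a\mathbb{E}_\zeta(h_i(f))=a(\mathbb{E}_\zeta h_i)(f)$$

1-Lipschitzness:

$$|(\mathbb{E}_\zeta h_i)(f)-(\mathbb{E}_\zeta h_i)(f')|=|\mathbb{E}_\zeta(h_i(f))-\mathbb{E}_\zeta(h_i(f'))|$$

$$\le\mathbb{E}_\zeta|h_i(f)-h_i(f')|\le\mathbb{E}_\zeta d(f,f')=d(f,f')$$

Cohomogeneity:

$$(\mathbb{E}_\zeta h_i)(1+af)=\mathbb{E}_\zeta(h_i(1+af))=\mathbb{E}_\zeta(1-a+ah_i(1+f))$$

$$=1-a+a\mathbb{E}_\zeta(h_i(1+f))=1-a+a(\mathbb{E}_\zeta h_i)(1+f)$$

C-additivity:

$$(\mathbb{E}_\zeta h_i)(c)=\mathbb{E}_\zeta(h_i(c))=\mathbb{E}_\zeta(c)=c$$

Crispness: Observe that both homogeneity and C-additivity are preserved, and crispness is equivalent to the conjunction of the two.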

Now for deterministic pushforwards, we'll verify homogeneity, 1-Lipschitzness, cohomogeneity, C-additivity, crispness, and sharpness.

Homogeneity:

$$(g_*(h))(af)=h((af)\circ g)=h(a(f\circ g))=ah(f\circ g)=a(g_*(h))(f)$$

1-Lipschitzness:

$$|(g_*(h))(f)-(g_*(h))(f')|=|h(f\circ g)-h(f'\circ g)|\le d(f\circ g,f'\circ g)\le d(f,f')$$

Cohomogeneity:

$$(g_*(h))(1+af)=h((1+af)\circ g)=h(1+a(f\circ g))=1-a+ah(1+(f\circ g))$$

$$=1-a+ah((1+f)\circ g)=1-a+a(g_*(h))(1+f)$$

C-additivity:

$$(g_*(h))(c)=h(c\circ g)=h(c)=c$$

Crispness: Both homogeneity and C-additivity are preserved, so crispness is preserved too.

Sharpness: If $h$ is sharp with compact set $C$, then

$$(g_*(h))(f)=h(f\circ g)=\inf_{x\in C}f(g(x))=\inf_{y\in g(C)}f(y)$$

And $g(C)$ is the continuous image of a compact set, so it's compact. And we're done!
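The sharpness identity is easy to see concretely. The following is a small sketch with assumed finite spaces $X=\{0,\dots,4\}$ and $Y=\{0,1,2\}$, where the map `g` is given as a lookup list and the sets and test functions are invented for illustration.

```python
def sharp(C):
    """Sharp functional: h(f) = min of f over the set C."""
    return lambda f: min(f[x] for x in C)

def pushforward(h, g):
    """g_*(h)(f) = h(f o g), where g[x] is the image in Y of the point x in X."""
    return lambda f: h([f[g[x]] for x in range(len(g))])

g = [0, 0, 1, 2, 2]        # hypothetical map X -> Y
C = {1, 2}                 # h minimizes over C, a subset of X
image = {g[x] for x in C}  # g(C), a subset of Y

pushed = pushforward(sharp(C), g)  # g_*(h)
direct = sharp(image)              # the sharp functional on g(C)

tests = [[3, 5, -1], [7, 2, 0], [0, 0, 0]]
values = [(pushed(f), direct(f)) for f in tests]
```

Each pair in `values` holds the pushforward's value and the value of minimizing directly over `g(C)`, which coincide in this example.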

Proposition 11: The inf of two infradistributions is always an infradistribution, and inf preserves the infradistribution properties indicated by the diagram at the start of this section.

We'll first verify the infradistribution properties of the inf, and then show it preserves the indicated properties if both components have them.

We must check monotonicity, concavity, normalization, Lipschitzness, and compact almost-support. For monotonicity, if $f'\ge f$, then

$$\inf(h_1,h_2)(f')=\inf(h_1(f'),h_2(f'))\ge\inf(h_1(f),h_2(f))=\inf(h_1,h_2)(f)$$

This was done by monotonicity for the components. For concavity,

$$\inf(h_1,h_2)(pf+(1-p)f')=\inf(h_1(pf+(1-p)f'),h_2(pf+(1-p)f'))$$

$$\ge\inf(ph_1(f)+(1-p)h_1(f'),ph_2(f)+(1-p)h_2(f'))$$

$$\ge\inf(ph_1(f),ph_2(f))+\inf((1-p)h_1(f'),(1-p)h_2(f'))$$

$$=p\inf(h_1(f),h_2(f))+(1-p)\inf(h_1(f'),h_2(f'))$$

$$=p\inf(h_1,h_2)(f)+(1-p)\inf(h_1,h_2)(f')$$

The first $\ge$ happened because $h_1$ and $h_2$ are concave, the second is because $\inf(a+b,c+d)\ge\inf(a,c)+\inf(b,d)$.
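The scalar inequality used in the second step can be brute-force checked. This is just an illustrative sketch over a small integer grid (the grid range is an arbitrary choice), not a proof.

```python
from itertools import product

def superadditivity_gap(a, b, c, d):
    """min(a + b, c + d) - (min(a, c) + min(b, d)); should never be negative."""
    return min(a + b, c + d) - (min(a, c) + min(b, d))

grid = range(-3, 4)
worst = min(superadditivity_gap(a, b, c, d)
            for a, b, c, d in product(grid, repeat=4))
```

The reason it holds in general: both $a+b$ and $c+d$ are at least $\inf(a,c)+\inf(b,d)$ termwise, so their minimum is too.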

For normalization,

$$\inf(h_1,h_2)(1)=\inf(h_1(1),h_2(1))=\inf(1,1)=1$$

And the same argument applies to 0, so the inf is normalized.

For Lipschitzness, the inf of two Lipschitz functions is Lipschitz.

That just leaves compact almost-support. Fix an arbitrary $\epsilon$, and get a compact $\epsilon$-almost-support $C^1_\epsilon$ for $h_1$, and a $C^2_\epsilon$ for $h_2$. We will show that $C^1_\epsilon\cup C^2_\epsilon$ is a compact $\epsilon$-almost-support for $\inf(h_1,h_2)$. It's compact because it's a finite union of compact sets.

Now, let $f$ and $f'$ agree on $C^1_\epsilon\cup C^2_\epsilon$. We can go:

$$|\inf(h_1,h_2)(f)-\inf(h_1,h_2)(f')|=|\inf(h_1(f),h_2(f))-\inf(h_1(f'),h_2(f'))|$$

There are four possible cases for evaluating this quantity. In case 1, $h_1(f)\le h_2(f)$ and $h_1(f')\le h_2(f')$. Then our above term turns into $|h_1(f)-h_1(f')|$. However, since $f$ and $f'$ agree on $C^1_\epsilon\cup C^2_\epsilon$, they must agree on $C^1_\epsilon$, and only have expectations $\le\epsilon d(f,f')$ apart. Case 2, where $h_1(f)\ge h_2(f)$ and $h_1(f')\ge h_2(f')$, is symmetric and can be disposed of by a nearly identical argument; we just do it with $h_2$ and $C^2_\epsilon$.

Case 3, where $h_1(f)<h_2(f)$ and $h_1(f')>h_2(f')$, takes a slightly fancier argument. We can go:

$$-\epsilon d(f,f')\le h_1(f)-h_1(f')<h_1(f)-h_2(f')<h_2(f)-h_2(f')\le\epsilon d(f,f')$$

The end inequalities are because $f$ and $f'$ agree on the $\epsilon$-almost-supports of $h_1$ and $h_2$, respectively, from agreeing on the union. The two inner inequalities are derived from the assumed inequalities in Case 3.

Thus,

$$|\inf(h_1(f),h_2(f))-\inf(h_1(f'),h_2(f'))|=|h_1(f)-h_2(f')|\le\epsilon d(f,f')$$

Case 4, where the assumed starting inequalities go in the other direction, is symmetric. So, no matter which infradistributions are lower in the two infs, we have

$$|\inf(h_1,h_2)(f)-\inf(h_1,h_2)(f')|=|\inf(h_1(f),h_2(f))-\inf(h_1(f'),h_2(f'))|\le\epsilon d(f,f')$$

And we're done, we made a compact almost-support for $\inf(h_1,h_2)$ assuming an arbitrary $\epsilon$. So the inf of two infradistributions is an infradistribution.

Now to verify homogeneity, 1-Lipschitzness, cohomogeneity, C-additivity, crispness, and sharpness preservation.

Homogeneity:

$$\inf(h_1,h_2)(af)=\inf(h_1(af),h_2(af))=\inf(ah_1(f),ah_2(f))$$

$$=a\inf(h_1(f),h_2(f))=a\inf(h_1,h_2)(f)$$

1-Lipschitzness:

$$|\inf(h_1,h_2)(f)-\inf(h_1,h_2)(f')|=|\inf(h_1(f),h_2(f))-\inf(h_1(f'),h_2(f'))|$$

Now we can split into four cases. In cases 1 and 2, where the infs turn into $h_1(f),h_1(f')$ (and the same for $h_2$ in case 2), we have:

$$|\inf(h_1(f),h_2(f))-\inf(h_1(f'),h_2(f'))|=|h_1(f)-h_1(f')|\le d(f,f')$$

(and the same for $h_2$), and we're done with those cases. In cases 3 and 4, where the infs turn into $h_1(f),h_2(f')$ (and vice-versa for case 4), we have:

$$-d(f,f')\le h_1(f)-h_1(f')<h_1(f)-h_2(f')<h_2(f)-h_2(f')\le d(f,f')$$

because $h_1$ and $h_2$ are 1-Lipschitz. Thus,

$$|\inf(h_1(f),h_2(f))-\inf(h_1(f'),h_2(f'))|=|h_1(f)-h_2(f')|\le d(f,f')$$

A symmetric argument works for case 4. So, no matter what,

$$|\inf(h_1(f),h_2(f))-\inf(h_1(f'),h_2(f'))|\le d(f,f')$$

And we're done, the inf is 1-Lipschitz too.
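Both four-case arguments (for almost-support and for 1-Lipschitzness) come down to one scalar fact: a min of two numbers moves by no more than the larger of the two individual movements. A brute-force sketch over a small integer grid, where the grid is an arbitrary illustrative choice and `a, b` stand in for $h_1(f),h_2(f)$ and `a2, b2` for $h_1(f'),h_2(f')$:

```python
from itertools import product

def min_is_nonexpansive(a, b, a2, b2):
    """Check |min(a, b) - min(a2, b2)| <= max(|a - a2|, |b - b2|)."""
    return abs(min(a, b) - min(a2, b2)) <= max(abs(a - a2), abs(b - b2))

grid = range(-3, 4)
all_hold = all(min_is_nonexpansive(a, b, a2, b2)
               for a, b, a2, b2 in product(grid, repeat=4))

# A "case 3" instance: a < b but a2 > b2, so the two mins come from different sides.
case3 = (abs(min(0, 2) - min(3, 1)), max(abs(0 - 3), abs(2 - 1)))
```

In the `case3` instance the mins are attained by different components, yet the difference of the mins is still bounded by the larger per-component gap.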

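Cohomogeneity:

$$\inf(h_1,h_2)(1+af)=\inf(h_1(1+af),h_2(1+af))$$

$$=\inf(1-a+ah_1(1+f),1-a+ah_2(1+f))$$

$$=1-a+a\inf(h_1(1+f),h_2(1+f))$$

$$=1-a+a\inf(h_1,h_2)(1+f)$$

C-additivity:

$$\inf(h_1,h_2)(c)=\inf(h_1(c),h_2(c))=\inf(c,c)=c$$

Crispness: Homogeneity and C-additivity are both preserved, so crispness is preserved.

Sharpness: If $h_1$ and $h_2$ are sharp with compact sets $C_1$ and $C_2$, then

$$\inf(h_1,h_2)(f)=\inf(h_1(f),h_2(f))=\inf\left(\inf_{x\in C_1}f(x),\inf_{x\in C_2}f(x)\right)=\inf_{x\in C_1\cup C_2}f(x)$$

And $C_1\cup C_2$ is a finite union of compact sets, so it's compact, and we're done.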

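Proposition 12: $\mathbb{E}_{\inf(H_1,H_2)}(f)=\inf(\mathbb{E}_{H_1}(f),\mathbb{E}_{H_2}(f))$

$$\mathbb{E}_{\inf(H_1,H_2)}(f)=\inf_{(m,b)\in\inf(H_1,H_2)}m(f)+b=\inf_{(m,b)\in H_1\cup H_2}m(f)+b$$

$$=\inf\left(\inf_{(m,b)\in H_1}(m(f)+b),\inf_{(m,b)\in H_2}(m(f)+b)\right)$$

$$=\inf(\mathbb{E}_{H_1}(f),\mathbb{E}_{H_2}(f))$$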
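The step from an inf over the union to an inf of two infs can be illustrated with finite sets. In this sketch, which rests on assumptions of my own choosing, each set $H$ is a short list of pairs `(m, b)`, where `m` is a tuple of weights on a three-point space and `b` is a constant; the specific numbers are invented.

```python
from fractions import Fraction as Fr

def expectation(H, f):
    """E_H(f) = min over (m, b) in H of m(f) + b, for a finite list H."""
    return min(sum(w * v for w, v in zip(m, f)) + b for m, b in H)

# Hypothetical sets of (measure, constant) pairs on X = {0, 1, 2}.
H1 = [((Fr(1, 2), Fr(1, 2), 0), 0), ((1, 0, 0), Fr(1, 4))]
H2 = [((0, 0, 1), 0)]

f = [1, 0, Fr(1, 4)]
over_union = expectation(H1 + H2, f)                  # inf over H1 ∪ H2
of_two = min(expectation(H1, f), expectation(H2, f))  # inf of the two infs
```

Minimizing over the concatenated list and taking the smaller of the two separate minima give the same number, which is all the middle equality uses.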

Proposition 13: If a family of infradistributions $\{h_i\}_{i\in I}$ has a shared upper bound on the Lipschitz constant, and for all $\epsilon$, there is a compact set $C_\epsilon$ that is an $\epsilon$-almost-support for all $h_i$, then $\inf_i h_i$, defined as $(\inf_i h_i)(f):=\inf_i(h_i(f))$, is an infradistribution. Further, for all conditions listed in the table, if all the $h_i$ fulfill them, then $\inf_i h_i$ fulfills the same property.

We'll first verify the infradistribution properties of the infinite inf, and then show it preserves the indicated properties if all components have them.

We must check monotonicity, concavity, normalization, Lipschitzness, and compact almost-support. For monotonicity, if $f'\ge f$, then

$$(\inf_i h_i)(f')=\inf_i(h_i(f'))\ge\inf_i(h_i(f))=(\inf_i h_i)(f)$$

This was done by monotonicity for all components. For concavity,

$$(\inf_i h_i)(pf+(1-p)f')=\inf_i(h_i(pf+(1-p)f'))\ge\inf_i(ph_i(f)+(1-p)h_i(f'))$$

$$\ge\inf_i(ph_i(f))+\inf_i((1-p)h_i(f'))=p\inf_i(h_i(f))+(1-p)\inf_i(h_i(f'))$$

$$=p(\inf_i h_i)(f)+(1-p)(\inf_i h_i)(f')$$

The first $\ge$ happened because all the $h_i$ are concave, the second is because $\inf_i(a_i+b_i)\ge\inf_i(a_i)+\inf_i(b_i)$.

For normalization,

$$(\inf_i h_i)(1)=\inf_i(h_i(1))=\inf_i(1)=1$$

And the same argument applies to 0, so the inf is normalized.

For Lipschitzness, let $\lambda^\odot$ be your uniform upper bound on the Lipschitz constants of the $h_i$. Then,

$$|(\inf_i h_i)(f)-(\inf_i h_i)(f')|=|\inf_i(h_i(f))-\inf_i(h_i(f'))|$$

Each $h_i$ only thinks those functions differ by $\lambda^\odot d(f,f')$ or less, and the same property carries over to the inf. Pick an $h_i$ and an $h_{i'}$ that very nearly attain the infimums for $f$ and $f'$ respectively. If the infimums were more than $\lambda^\odot d(f,f')$ apart, say with $\inf_i(h_i(f))$ the lower one, then $h_i(f')\le h_i(f)+\lambda^\odot d(f,f')$ would appreciably undershoot $h_{i'}(f')$, and in fact undershoot $\inf_i(h_i(f'))$, which is impossible. The other ordering is symmetric. Thus,

$$|\inf_i(h_i(f))-\inf_i(h_i(f'))|\le\lambda^\odot d(f,f')$$

And we're done.

That just leaves compact almost-support. Fix an arbitrary $\epsilon$. We know there is some $C_\epsilon$ that is a compact $\epsilon$-almost-support for all the $h_i$. We will show that $C_\epsilon$ is an $\epsilon$-almost-support for $\inf_i h_i$.

Let $f$ and $f'$ agree on $C_\epsilon$. We can go:

$$|(\inf_i h_i)(f)-(\inf_i h_i)(f')|=|\inf_i(h_i(f))-\inf_i(h_i(f'))|$$

Pick an $h_i$ that very nearly attains the inf for $f$ and an $h_{i'}$ that very nearly attains the inf for $f'$. Then we can approximately reexpress this quantity as:

$$|\inf(h_i(f),h_{i'}(f))-\inf(h_i(f'),h_{i'}(f'))|$$

We're approximately in a case where $h_i(f)\le h_{i'}(f)$ and $h_i(f')\ge h_{i'}(f')$, so we can go:

$$-\epsilon d(f,f')\le h_i(f)-h_i(f')\le h_i(f)-h_{i'}(f')\le h_{i'}(f)-h_{i'}(f')\le\epsilon d(f,f')$$

The end inequalities are because $f$ and $f'$ agree on $C_\epsilon$, which is an $\epsilon$-almost-support of both $h_i$ and $h_{i'}$. The two inner inequalities are derived from the assumed inequalities in our case. Thus,

$$|\inf_i(h_i(f))-\inf_i(h_i(f'))|\simeq|h_i(f)-h_{i'}(f')|\le\epsilon d(f,f')$$

where the approximation error can be made arbitrarily small. And we're done, we made a compact almost-support for $\inf_i h_i$ assuming an arbitrary $\epsilon$. So the inf of this family of infradistributions is an infradistribution.

Now to verify homogeneity, 1-Lipschitzness, cohomogeneity, C-additivity, crispness, and sharpness preservation.

Homogeneity:

$$(\inf_i h_i)(af)=\inf_i(h_i(af))=\inf_i(ah_i(f))$$

$$=a\inf_i(h_i(f))=a(\inf_i h_i)(f)$$

1-Lipschitzness: Same as the Lipschitz argument. Everyone has a Lipschitz constant of 1, so the inf has the same Lipschitz constant.

Cohomogeneity:

$$(\inf_i h_i)(1+af)=\inf_i(h_i(1+af))$$

$$=\inf_i(1-a+ah_i(1+f))$$

$$=1-a+a\inf_i(h_i(1+f))$$

$$=1-a+a(\inf_i h_i)(1+f)$$

C-additivity:

$$(\inf_i h_i)(c)=\inf_i(h_i(c))=\inf_i(c)=c$$

Crispness: Homogeneity and C-additivity are both preserved, so crispness is preserved.

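Sharpness: If each $h_i$ is sharp with compact set $C_i$, then

$$(\inf_i h_i)(f)=\inf_i(h_i(f))=\inf_i\left(\inf_{x\in C_i}f(x)\right)=\inf_{x\in\bigcup_i C_i}f(x)=\inf_{x\in\overline{\bigcup_i C_i}}f(x)$$

where the last equality holds because $f$ is continuous. We do have to check whether or not $\overline{\bigcup_i C_i}$ is compact, however. We'll start by showing that for an arbitrary $C_i$, any compact set $K$ where $C_i\not\subseteq K$ can't be an $\epsilon$-almost-support of $h_i$ for any $\epsilon<1$. The proof proceeds as follows:

Let $x^*$ be some point in $C_i$ but not in $K$. Since $K$ is compact, $x^*$ must be some positive distance away from $K$. Craft a continuous function $f_0$ defined on the closed set $K\cup\{x^*\}$, where $f_0$ is 1 on $K$ and 0 on $\{x^*\}$. Use the Tietze extension theorem to extend $f_0$ to all of $X$, with values in $[0,1]$. Then

$$|h_i(f_0)-h_i(1)|=\left|\inf_{x\in C_i}f_0(x)-\inf_{x\in C_i}1\right|=|f_0(x^*)-1|=|0-1|=1$$

However, $f_0$ and 1 agree on $K$ and $d(f_0,1)=1$, so $K$ can't be an $\epsilon$-almost-support for any $\epsilon<1$.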
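On a finite discrete space the Tietze step is trivial, which makes the counterexample easy to play with. A minimal sketch, where the four-point space and the sets `C_i`, `K` are hypothetical choices:

```python
def sharp(C):
    """Sharp functional: h(f) = min of f over the set C."""
    return lambda f: min(f[x] for x in C)

X = range(4)
C_i = {0, 3}     # the set h_i minimizes over
K = {0, 1, 2}    # candidate almost-support that misses a point of C_i
x_star = 3       # a point in C_i but not in K

h_i = sharp(C_i)
f0 = [0 if x == x_star else 1 for x in X]  # 1 on K, 0 at x_star
one = [1 for _ in X]

gap = abs(h_i(f0) - h_i(one))
dist = max(abs(a - b) for a, b in zip(f0, one))
agree_on_K = all(f0[x] == one[x] for x in K)
```

With `gap` and `dist` both equal to 1 while the two functions agree on `K`, the bound $|h_i(f_0)-h_i(1)|\le\epsilon d(f_0,1)$ fails for every $\epsilon<1$.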

Thus, in order for there to be a compact set $C_\epsilon$ that's an $\epsilon$-almost-support for all $h_i$ (with $\epsilon<1$), it must be that $\forall i:C_i\subseteq C_\epsilon$. Then $\overline{\bigcup_i C_i}\subseteq C_\epsilon$, because all the $C_i$ are in it and $C_\epsilon$ is closed. So, the closure of our union is a closed subset of a compact set and thus is compact, so $\inf_i h_i$ is minimizing over a compact set and thus is sharp.

Proposition 14: If $\sup(h_1,h_2)(0)=0$ and $\sup(h_1,h_2)(1)$