Proposition 10: Mixture, updating, and continuous pushforward preserve the properties indicated by the diagram, and always produce an infradistribution.
We'll start by showing that mixture, updating, and continuous pushforward always produce infradistributions, and then turn to property verification.
We know from the last post that mixture, updating, and continuous pushforward preserve all the earlier infradistribution properties (with one caveat: for a mixture to preserve Lipschitzness, the expected value of the Lipschitz constant must be finite). Since then we added the new condition of compact almost-support, so that's the only part we need to re-verify.
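For reference, the condition being re-verified can be written out as below. This is a restatement inferred from how almost-supports are used in the proofs of this section, not a quotation of the original definition; $d(f,f')$ is assumed to be the sup-distance $\sup_x|f(x)-f'(x)|$, which matches how it is used in the pushforward argument.

```latex
% Assumed restatement: C is an \epsilon-almost-support for h when
\forall f,f':\quad f|_{C}=f'|_{C}\;\Longrightarrow\;|h(f)-h(f')|\le\epsilon\,d(f,f')
% and h has compact almost-support when
\forall\epsilon>0\;\;\exists\,C_\epsilon\subseteq X\text{ compact}:\;
C_\epsilon\text{ is an }\epsilon\text{-almost-support for }h
```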
To show that a mixture has compact almost-support, remember that $(\mathbb{E}_\zeta h_i)(f)=\mathbb{E}_\zeta(h_i(f))$. Now, fix an $\epsilon$. We will craft a compact set that accounts for all but $\epsilon$ of why functions have the expectation values they do. There is some $n$ where $\sum_{i>n}\zeta_i\lambda^\odot_i<\frac{\epsilon}{2}$, where $\lambda^\odot_i$ is the Lipschitz constant of the infradistribution $h_i$. Then, let $C_\epsilon$ be $\bigcup_{i\le n}C_{i,\epsilon/2}$, the union of the compact $\frac{\epsilon}{2}$-almost-supports for the infradistributions $h_i$, $i\le n$. This is a finite union of compact sets, so it's compact.
Now, for $f$ and $f'$ that agree on $C_\epsilon$, we can go:
$$\begin{aligned}|(\mathbb{E}_\zeta h_i)(f)-(\mathbb{E}_\zeta h_i)(f')|&=|\mathbb{E}_\zeta(h_i(f))-\mathbb{E}_\zeta(h_i(f'))|\le\sum_i\zeta_i|h_i(f)-h_i(f')|\\&=\sum_{i\le n}\zeta_i|h_i(f)-h_i(f')|+\sum_{i>n}\zeta_i|h_i(f)-h_i(f')|\\&\le\sum_{i\le n}\zeta_i\frac{\epsilon}{2}d(f,f')+\sum_{i>n}\zeta_i\lambda^\odot_i d(f,f')\\&=d(f,f')\left(\frac{\epsilon}{2}\sum_{i\le n}\zeta_i+\sum_{i>n}\zeta_i\lambda^\odot_i\right)<d(f,f')\left(\frac{\epsilon}{2}+\frac{\epsilon}{2}\right)=\epsilon d(f,f')\end{aligned}$$
The first equality is reexpressing mixtures, and the first inequality is moving the expectation outside the absolute value, which doesn't decrease the value. Then we break up the expectation for the second equality. The second inequality has two parts. For $i>n$, the gap between $h_i(f)$ and $h_i(f')$ has a trivial upper bound from the Lipschitzness of $h_i$. For $i\le n$, $f$ and $f'$ agree on the union of the $\frac{\epsilon}{2}$-almost-supports for the $h_i$, $i\le n$, so each such infradistribution, by the definition of an almost-support, assigns these two functions not-very-different values. Then we just pull the distance between $f$ and $f'$ out, and use the fact that for the mixture to work, $\sum_i\zeta_i\lambda^\odot_i<\infty$, and we picked $n$ big enough for that last tail of the infinite sum to be small. Then we're done.
Now, we will show compact almost-support for $h|^g_L$, assuming $h$ has compact almost-support. Fix an $\epsilon$. Your relevant set for $\text{support}(L)$ will be
$$C_{\epsilon P^g_h(L)/2}\cap\left\{x\,\middle|\,L(x)\ge\frac{\epsilon P^g_h(L)}{2\lambda^\odot}\right\}$$
where the first term is a compact $\frac{\epsilon P^g_h(L)}{2}$-almost-support for $h$, and the last set is a sort of "this point must be likely enough" condition. $\lambda^\odot$ is the Lipschitz constant of the original $h$. Yes, this intersection may be empty.
Now, here's how things go. Let $f$ and $f'$ agree on that intersection (if it's the empty set, they can be any two functions). We can go:
$$\begin{aligned}|(h|^g_L)(f)-(h|^g_L)(f')|&=\left|\frac{h(f\star_L g)-h(0\star_L g)}{h(1\star_L g)-h(0\star_L g)}-\frac{h(f'\star_L g)-h(0\star_L g)}{h(1\star_L g)-h(0\star_L g)}\right|\\&=\frac{1}{h(1\star_L g)-h(0\star_L g)}\left|h(f\star_L g)-h(0\star_L g)-h(f'\star_L g)+h(0\star_L g)\right|\\&=\frac{1}{P^g_h(L)}|h(f\star_L g)-h(f'\star_L g)|=\frac{1}{P^g_h(L)}|h(Lf+(1-L)g)-h(Lf'+(1-L)g)|\end{aligned}$$
So far, this is just a standard sequence of rewrites: the definition of the update, pulling the fraction out, using $P^g_h(L)$ to abbreviate the rescaling term, and unpacking what $\star_L$ means.
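Now, let's see how different $Lf+(1-L)g$ and $Lf'+(1-L)g$ are on the set $C_{\epsilon P^g_h(L)/2}$. One of two things will occur. Our first possibility is that an $x$ in that compact set also has $L(x)\ge\frac{\epsilon P^g_h(L)}{2\lambda^\odot}$. Then $x\in C_{\epsilon P^g_h(L)/2}\cap\left\{x\,\middle|\,L(x)\ge\frac{\epsilon P^g_h(L)}{2\lambda^\odot}\right\}$, and $f,f'$ were selected to be equal on that set, so the two functions will be identical on that point. Our second possibility is that $x$ in that compact set has $L(x)<\frac{\epsilon P^g_h(L)}{2\lambda^\odot}$. In that case,
$$\begin{aligned}|L(x)f(x)+(1-L(x))g(x)-L(x)f'(x)-(1-L(x))g(x)|&=|L(x)f(x)-L(x)f'(x)|\\&=L(x)|f(x)-f'(x)|\le\frac{\epsilon P^g_h(L)}{2\lambda^\odot}d(f,f')\end{aligned}$$
because $L(x)<\frac{\epsilon P^g_h(L)}{2\lambda^\odot}$ and $|f(x)-f'(x)|\le d(f,f')$.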
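Putting this together, $Lf+(1-L)g$ and $Lf'+(1-L)g$ are at most $\frac{\epsilon P^g_h(L)}{2\lambda^\odot}d(f,f')$ apart when restricted to the compact set $C_{\epsilon P^g_h(L)/2}$. By Lemma 2, we can then show that
$$|h(Lf+(1-L)g)-h(Lf'+(1-L)g)|\le\lambda^\odot\cdot\frac{\epsilon P^g_h(L)}{2\lambda^\odot}d(f,f')+\frac{\epsilon P^g_h(L)}{2}d(Lf+(1-L)g,Lf'+(1-L)g)$$
And we also know that
$$d(Lf+(1-L)g,Lf'+(1-L)g)=d(Lf,Lf')\le d(f,f')$$
because $L\in[0,1]$. Making that substitution, we have:
$$\begin{aligned}|h(Lf+(1-L)g)-h(Lf'+(1-L)g)|&\le\lambda^\odot\cdot\frac{\epsilon P^g_h(L)}{2\lambda^\odot}d(f,f')+\frac{\epsilon P^g_h(L)}{2}d(f,f')\\&=\frac{\epsilon P^g_h(L)}{2}d(f,f')+\frac{\epsilon P^g_h(L)}{2}d(f,f')=\epsilon P^g_h(L)d(f,f')\end{aligned}$$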
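Lemma 2 is not restated in this section. Judging purely from how it is applied in the step above, the form being used appears to be the following; treat this as a reconstruction from usage rather than the lemma's official statement.

```latex
% Assumed form of Lemma 2, reconstructed from its use above.
% C_\varepsilon: an \varepsilon-almost-support for h; \lambda^\odot: Lipschitz constant of h.
\sup_{x\in C_\varepsilon}|\phi(x)-\phi'(x)|\le\delta
\;\Longrightarrow\;
|h(\phi)-h(\phi')|\le\lambda^\odot\delta+\varepsilon\,d(\phi,\phi')
% Instantiation used in the update argument:
%   \varepsilon = \epsilon P^g_h(L)/2,
%   \delta = \frac{\epsilon P^g_h(L)}{2\lambda^\odot}\,d(f,f'),
%   \phi = Lf+(1-L)g,\quad \phi' = Lf'+(1-L)g
```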
Backing up to earlier, we had established that
$$|(h|^g_L)(f)-(h|^g_L)(f')|=\frac{1}{P^g_h(L)}|h(Lf+(1-L)g)-h(Lf'+(1-L)g)|$$
and from shortly above, we established that
$$|h(Lf+(1-L)g)-h(Lf'+(1-L)g)|\le\epsilon P^g_h(L)d(f,f')$$
Putting these together,
$$|(h|^g_L)(f)-(h|^g_L)(f')|\le\epsilon d(f,f')$$
for any two functions $f$ and $f'$ which agree on $C_{\epsilon P^g_h(L)/2}\cap\left\{x\,\middle|\,L(x)\ge\frac{\epsilon P^g_h(L)}{2\lambda^\odot}\right\}$, witnessing that said set is an $\epsilon$-almost-support for $h|^g_L$.
All we need to finish up is to show that this is a compact set in support(L) equipped with the subspace topology. This can be done by observing that in the original space X it's a compact set, due to being the intersection of a compact set and a closed set. In the subspace topology, if we try to make an open cover of it, all the open sets that cover it in the subspace topology are the restrictions of open sets in the original topology, so we have an open cover of this set in the original topology, and we can make a finite subcover, so it's compact in the subspace topology as well.
Thus, for any $\epsilon$, we can make a compact (in $\text{support}(L)$) $\epsilon$-almost-support for $h|^g_L$, so $h|^g_L$ has compact almost-support, and we've verified the last condition for an update of an infradistribution to be an infradistribution.
Now for deterministic pushforward. Fix an $\epsilon$, and let your appropriate set for $g_*(h)$ be $g(C_\epsilon)$, where $C_\epsilon$ is a compact $\epsilon$-almost-support for $h$. The continuous image of a compact set is compact, so that part is taken care of. We still need to check that it's an $\epsilon$-almost-support for $g_*(h)$. Let $f,f'$ be equal on this set. Then
$$\begin{aligned}|g_*(h)(f)-g_*(h)(f')|&=|h(f\circ g)-h(f'\circ g)|\le\epsilon d(f\circ g,f'\circ g)\\&=\epsilon\sup_x|f(g(x))-f'(g(x))|\le\epsilon\sup_y|f(y)-f'(y)|=\epsilon d(f,f')\end{aligned}$$
And we're done. This is because, for any point $x\in C_\epsilon$, feeding it through $g$ makes a point in $g(C_\epsilon)$, and feeding that through $f$ and $f'$ produces identical results because they agree on $g(C_\epsilon)$. Therefore, $f\circ g$ and $f'\circ g$ agree on $C_\epsilon$ and thus can have values at most $\epsilon d(f\circ g,f'\circ g)$ apart, which is upper-bounded by $\epsilon d(f,f')$. $g(C_\epsilon)$ is thus a compact $\epsilon$-almost-support for $g_*(h)$, and this can be done for any $\epsilon$, so $g_*(h)$ has compact almost-support.
These three operations always produce infradistributions, as we've just shown by verifying the last condition. Updating only has two properties to check, preserving homogeneity when $g=0$ and cohomogeneity when $g=1$, so let's get that knocked out.
Homogeneity, using homogeneity for $h$ (and $h(0)=0$):
$$\begin{aligned}(h|^0_L)(af)&=\frac{h(af\star_L 0)-h(0\star_L 0)}{h(1\star_L 0)-h(0\star_L 0)}=\frac{h(aLf)-h(0)}{h(1\star_L 0)-h(0\star_L 0)}=\frac{ah(Lf)}{h(1\star_L 0)-h(0\star_L 0)}\\&=a\frac{h(Lf)-h(0)}{h(1\star_L 0)-h(0\star_L 0)}=a\frac{h(f\star_L 0)-h(0\star_L 0)}{h(1\star_L 0)-h(0\star_L 0)}=a(h|^0_L)(f)\end{aligned}$$
Cohomogeneity, using cohomogeneity for $h$:
$$\begin{aligned}(h|^1_L)(1+af)&=\frac{h((1+af)\star_L 1)-h(0\star_L 1)}{h(1\star_L 1)-h(0\star_L 1)}=\frac{h(L+aLf+1-L)-h(1-L)}{h(1)-h(1-L)}\\&=\frac{h(1+aLf)-h(1-L)}{1-h(1-L)}=\frac{(1-a+ah(1+Lf))-h(1-L)}{1-h(1-L)}\\&=\frac{1-a+ah(1+Lf)-h(1-L)+ah(1-L)-ah(1-L)}{1-h(1-L)}\\&=\frac{1-h(1-L)}{1-h(1-L)}-\frac{a-ah(1-L)}{1-h(1-L)}+\frac{ah(1+Lf)-ah(1-L)}{1-h(1-L)}\\&=\frac{1-h(1-L)}{1-h(1-L)}-a\frac{1-h(1-L)}{1-h(1-L)}+a\frac{h(1+Lf)-h(1-L)}{1-h(1-L)}\\&=1-a+a\frac{h(1+Lf)-h(1-L)}{h(1)-h(1-L)}=1-a+a\frac{h(L+Lf+(1-L))-h(1-L)}{h(L+(1-L))-h(1-L)}\\&=1-a+a\frac{h((1+f)\star_L 1)-h(0\star_L 1)}{h(1\star_L 1)-h(0\star_L 1)}=1-a+a(h|^1_L)(1+f)\end{aligned}$$
Now for mixtures, we'll verify homogeneity, 1-Lipschitzness, cohomogeneity, C-additivity, and crispness.
Homogeneity:
$$(\mathbb{E}_\zeta h_i)(af)=\mathbb{E}_\zeta(h_i(af))=\mathbb{E}_\zeta(ah_i(f))=a\mathbb{E}_\zeta(h_i(f))=a(\mathbb{E}_\zeta h_i)(f)$$
1-Lipschitzness:
$$|(\mathbb{E}_\zeta h_i)(f)-(\mathbb{E}_\zeta h_i)(f')|=|\mathbb{E}_\zeta(h_i(f))-\mathbb{E}_\zeta(h_i(f'))|\le\mathbb{E}_\zeta|h_i(f)-h_i(f')|\le\mathbb{E}_\zeta d(f,f')=d(f,f')$$
Cohomogeneity:
$$(\mathbb{E}_\zeta h_i)(1+af)=\mathbb{E}_\zeta(h_i(1+af))=\mathbb{E}_\zeta(1-a+ah_i(1+f))=1-a+a\mathbb{E}_\zeta(h_i(1+f))=1-a+a(\mathbb{E}_\zeta h_i)(1+f)$$
C-additivity:
$$(\mathbb{E}_\zeta h_i)(c)=\mathbb{E}_\zeta(h_i(c))=\mathbb{E}_\zeta(c)=c$$
Crispness: Observe that both homogeneity and C-additivity are preserved, and crispness is equivalent to the conjunction of the two.
Now for deterministic pushforwards, we'll verify homogeneity, 1-Lipschitzness, cohomogeneity, C-additivity, crispness, and sharpness.
Homogeneity:
$$(g_*(h))(af)=h((af)\circ g)=h(a(f\circ g))=ah(f\circ g)=a(g_*(h))(f)$$
1-Lipschitzness:
$$|(g_*(h))(f)-(g_*(h))(f')|=|h(f\circ g)-h(f'\circ g)|\le d(f\circ g,f'\circ g)\le d(f,f')$$
Cohomogeneity:
$$(g_*(h))(1+af)=h((1+af)\circ g)=h(1+a(f\circ g))=1-a+ah(1+(f\circ g))=1-a+ah((1+f)\circ g)=1-a+a(g_*(h))(1+f)$$
C-additivity:
$$(g_*(h))(c)=h(c\circ g)=h(c)=c$$
Crispness: Both homogeneity and C-additivity are preserved, so crispness is preserved too.
Sharpness: (g∗(h))(f)=h(f∘g)=infx∈Cf(g(x))=infy∈g(C)f(y) And g(C) is the image of a compact set, so it's compact. And we're done!
Proposition 11: The inf of two infradistributions is always an infradistribution, and inf preserves the infradistribution properties indicated by the diagram at the start of this section.
We'll first verify the infradistribution properties of the inf, and then show it preserves the indicated properties if both components have them.
We must check monotonicity, concavity, normalization, Lipschitzness, and compact almost-support. For monotonicity, if f′≥f, then inf(h1,h2)(f′)=inf(h1(f′),h2(f′))≥inf(h1(f),h2(f))=inf(h1,h2)(f) This was done by monotonicity for the components. For concavity, inf(h1,h2)(pf+(1−p)f′)=inf(h1(pf+(1−p)f′),h2(pf+(1−p)f′)) ≥inf(ph1(f)+(1−p)h1(f′),ph2(f)+(1−p)h2(f′)) ≥inf(ph1(f),ph2(f))+inf((1−p)h1(f′),(1−p)h2(f′)) =pinf(h1(f),h2(f))+(1−p)inf(h1(f′),h2(f′)) =pinf(h1,h2)(f)+(1−p)inf(h1,h2)(f′) The first ≥ happened because h1 and h2 are concave, the second is because inf(a+b,c+d)≥inf(a,c)+inf(b,d).
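The elementary inequality invoked in the concavity step is quick to justify. The short derivation below is a sketch written with generic real numbers $a,b,c,d$, matching the form quoted above.

```latex
% Both sums dominate the sum of the two infs:
a+b\;\ge\;\inf(a,c)+\inf(b,d),\qquad c+d\;\ge\;\inf(a,c)+\inf(b,d)
% so their minimum does too:
\inf(a+b,\,c+d)\;\ge\;\inf(a,c)+\inf(b,d)
% In the concavity step this is applied with
%   a=p\,h_1(f),\; b=(1-p)\,h_1(f'),\; c=p\,h_2(f),\; d=(1-p)\,h_2(f')
```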
For normalization, inf(h1,h2)(1)=inf(h1(1),h2(1))=inf(1,1)=1 And the same argument applies to 0, so the inf is normalized.
For Lipschitzness, the inf of two Lipschitz functions is Lipschitz.
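For completeness, here is one way to see that claim, with an explicit constant. This is a sketch; $\lambda^\odot_1,\lambda^\odot_2$ denote the Lipschitz constants of $h_1,h_2$, and the labels $j$ and the without-loss-of-generality ordering are introduced only for this illustration.

```latex
% WLOG \inf(h_1(f),h_2(f)) \ge \inf(h_1(f'),h_2(f')); otherwise swap f and f'.
% Let j\in\{1,2\} attain the inf at f', i.e. h_j(f')=\inf(h_1(f'),h_2(f')).
0\;\le\;\inf(h_1(f),h_2(f))-\inf(h_1(f'),h_2(f'))
\;\le\; h_j(f)-h_j(f')
\;\le\;\lambda^\odot_j\,d(f,f')
\;\le\;\max(\lambda^\odot_1,\lambda^\odot_2)\,d(f,f')
```

So the inf is Lipschitz with constant at most $\max(\lambda^\odot_1,\lambda^\odot_2)$.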
That just leaves compact almost-support. Fix an arbitrary $\epsilon$, and get a compact $\epsilon$-almost-support $C^1_\epsilon$ for $h_1$, and a $C^2_\epsilon$ for $h_2$. We will show that $C^1_\epsilon\cup C^2_\epsilon$ is a compact $\epsilon$-almost-support for $\inf(h_1,h_2)$. It's compact because it's a finite union of compact sets.
Now, let $f$ and $f'$ agree on $C^1_\epsilon\cup C^2_\epsilon$. We can go:
$$|\inf(h_1,h_2)(f)-\inf(h_1,h_2)(f')|=|\inf(h_1(f),h_2(f))-\inf(h_1(f'),h_2(f'))|$$
There are four possible cases for evaluating this quantity. In case 1, $h_1(f)\le h_2(f)$ and $h_1(f')\le h_2(f')$. Then our above term turns into $|h_1(f)-h_1(f')|$. However, since $f$ and $f'$ agree on $C^1_\epsilon\cup C^2_\epsilon$, they must agree on $C^1_\epsilon$, and only have expectations $\le\epsilon d(f,f')$ apart. Case 2, where $h_1(f)\ge h_2(f)$ and $h_1(f')\ge h_2(f')$, is symmetric and can be disposed of by a nearly identical argument; we just do it with $h_2$ and $C^2_\epsilon$.
Case 3, where $h_1(f)<h_2(f)$ and $h_1(f')>h_2(f')$, takes a slightly fancier argument. We can go:
$$-\epsilon d(f,f')\le h_1(f)-h_1(f')<h_1(f)-h_2(f')<h_2(f)-h_2(f')\le\epsilon d(f,f')$$
The end inequalities are because $f$ and $f'$ agree on the $\epsilon$-almost-supports of $h_1$ and $h_2$, respectively, from agreeing on the union. The two inner inequalities are derived from the assumed inequalities in Case 3. Thus,
$$|\inf(h_1(f),h_2(f))-\inf(h_1(f'),h_2(f'))|=|h_1(f)-h_2(f')|<\epsilon d(f,f')$$
Case 4, where the assumed starting inequalities go in the other direction, is symmetric. So, no matter which infradistributions are lower in the two infs, we have
$$|\inf(h_1,h_2)(f)-\inf(h_1,h_2)(f')|=|\inf(h_1(f),h_2(f))-\inf(h_1(f'),h_2(f'))|\le\epsilon d(f,f')$$
And we're done, we made a compact almost-support for $\inf(h_1,h_2)$ assuming an arbitrary $\epsilon$. So the inf of two infradistributions is an infradistribution.
Now to verify homogeneity, 1-Lipschitzness, cohomogeneity, C-additivity, crispness, and sharpness preservation.
Homogeneity:
$$\inf(h_1,h_2)(af)=\inf(h_1(af),h_2(af))=\inf(ah_1(f),ah_2(f))=a\inf(h_1(f),h_2(f))=a\inf(h_1,h_2)(f)$$
1-Lipschitzness:
$$|\inf(h_1,h_2)(f)-\inf(h_1,h_2)(f')|=|\inf(h_1(f),h_2(f))-\inf(h_1(f'),h_2(f'))|$$
Now we can split into four cases. In cases 1 and 2, where the infs turn into $h_1(f),h_1(f')$ (and the same for $h_2$ in case 2), we have:
$$|\inf(h_1(f),h_2(f))-\inf(h_1(f'),h_2(f'))|=|h_1(f)-h_1(f')|\le d(f,f')$$
(and the same for $h_2$), and we're done with those cases. In cases 3 and 4, where the infs turn into $h_1(f),h_2(f')$ (and vice-versa for case 4), we have:
$$-d(f,f')\le h_1(f)-h_1(f')<h_1(f)-h_2(f')<h_2(f)-h_2(f')\le d(f,f')$$
because $h_1$ and $h_2$ are 1-Lipschitz. Thus,
$$|\inf(h_1(f),h_2(f))-\inf(h_1(f'),h_2(f'))|=|h_1(f)-h_2(f')|\le d(f,f')$$
A symmetric argument works for case 4. So, no matter what,
$$|\inf(h_1(f),h_2(f))-\inf(h_1(f'),h_2(f'))|\le d(f,f')$$
And we're done, the inf is 1-Lipschitz too.
Cohomogeneity:
$$\inf(h_1,h_2)(1+af)=\inf(h_1(1+af),h_2(1+af))=\inf(1-a+ah_1(1+f),1-a+ah_2(1+f))=1-a+a\inf(h_1(1+f),h_2(1+f))=1-a+a\inf(h_1,h_2)(1+f)$$
C-additivity:
$$\inf(h_1,h_2)(c)=\inf(h_1(c),h_2(c))=\inf(c,c)=c$$
Crispness: Homogeneity and C-additivity are both preserved, so crispness is preserved.
Sharpness: inf(h1,h2)(f)=inf(h1(f),h2(f))=inf(infx∈C1f(x),infx∈C2f(x))=infx∈C1∪C2f(x) And we're done.
Proposition 13: If a family of infradistributions $\{h_i\}_{i\in I}$ has a shared upper bound on the Lipschitz constant, and for all $\epsilon$, there is a compact set $C_\epsilon$ that is an $\epsilon$-almost-support for all $h_i$, then $\inf_i h_i$, defined as $(\inf_i h_i)(f):=\inf_i(h_i(f))$, is an infradistribution. Further, for all conditions listed in the table, if all the $h_i$ fulfill them, then $\inf_i h_i$ fulfills the same property.
We'll first verify the infradistribution properties of the infinite inf, and then show it preserves the indicated properties if all components have them.
We must check monotonicity, concavity, normalization, Lipschitzness, and compact almost-support. For monotonicity, if $f'\ge f$, then
$$(\inf_i h_i)(f')=\inf_i(h_i(f'))\ge\inf_i(h_i(f))=(\inf_i h_i)(f)$$
This was done by monotonicity for all components. For concavity,
$$\begin{aligned}(\inf_i h_i)(pf+(1-p)f')&=\inf_i(h_i(pf+(1-p)f'))\ge\inf_i(ph_i(f)+(1-p)h_i(f'))\\&\ge\inf_i(ph_i(f))+\inf_i((1-p)h_i(f'))=p\inf_i(h_i(f))+(1-p)\inf_i(h_i(f'))\\&=p(\inf_i h_i)(f)+(1-p)(\inf_i h_i)(f')\end{aligned}$$
The first $\ge$ happened because all the $h_i$ are concave, the second is because $\inf_i(a_i+b_i)\ge\inf_i(a_i)+\inf_i(b_i)$.
For normalization, (infihi)(1)=infi(hi(1))=infi(1)=1 And the same argument applies to 0, so the inf is normalized.
For Lipschitzness, let $\lambda^\odot$ be your uniform upper bound on the Lipschitz constants of the $h_i$. Then,
$$|(\inf_i h_i)(f)-(\inf_i h_i)(f')|=|\inf_i(h_i(f))-\inf_i(h_i(f'))|$$
Every $h_i$ thinks those two functions differ by $\lambda^\odot d(f,f')$ or less, and the same property carries over to the inf. To see this, pick an $h_i$ that very very nearly attains the infimum at $f$. If the infimum at $f'$ were more than $\lambda^\odot d(f,f')$ above the infimum at $f$, then $h_i(f')\le h_i(f)+\lambda^\odot d(f,f')$ would undershoot $\inf_i(h_i(f'))$, which is impossible. The same argument with $f$ and $f'$ swapped handles the other direction. Thus,
$$|\inf_i(h_i(f))-\inf_i(h_i(f'))|\le\lambda^\odot d(f,f')$$
And we're done.
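The "nearly attains the infimum" step can be made explicit with a slack parameter. The following is a sketch; the slack $\eta>0$ and the index $j$ are names introduced here for illustration.

```latex
% Fix \eta>0 and pick j with h_j(f') \le \inf_i h_i(f') + \eta. Then
\inf_i h_i(f)-\inf_i h_i(f')
\;\le\; h_j(f)-\bigl(h_j(f')-\eta\bigr)
\;\le\;\lambda^\odot d(f,f')+\eta
% \eta was arbitrary, so \inf_i h_i(f)-\inf_i h_i(f') \le \lambda^\odot d(f,f').
% Swapping the roles of f and f' gives the reverse bound, hence
\bigl|\inf_i h_i(f)-\inf_i h_i(f')\bigr|\;\le\;\lambda^\odot d(f,f')
```

The same pattern, with $\lambda^\odot$ replaced by $\epsilon$ and $f,f'$ agreeing on a shared almost-support, gives an exact version of the approximate case analysis used for compact almost-support.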
That just leaves compact almost-support. Fix an arbitrary $\epsilon$. We know there is some $C_\epsilon$ that is a compact $\epsilon$-almost-support for all the $h_i$. We will show that $C_\epsilon$ is an $\epsilon$-almost-support for $\inf_i h_i$.
Let $f$ and $f'$ agree on $C_\epsilon$. We can go:
$$|(\inf_i h_i)(f)-(\inf_i h_i)(f')|=|\inf_i(h_i(f))-\inf_i(h_i(f'))|$$
Pick an $h_i$ that very very very nearly attains the inf at $f$, and an $h_{i'}$ that very very very nearly attains the inf at $f'$. Then we can approximately reexpress this quantity as:
$$|\inf(h_i(f),h_{i'}(f))-\inf(h_i(f'),h_{i'}(f'))|$$
We're approximately in a case where $h_i(f)\le h_{i'}(f)$ and $h_i(f')\ge h_{i'}(f')$, so we can go:
$$-\epsilon d(f,f')\le h_i(f)-h_i(f')\le h_i(f)-h_{i'}(f')\le h_{i'}(f)-h_{i'}(f')\le\epsilon d(f,f')$$
The end inequalities are because $f$ and $f'$ agree on the $\epsilon$-almost-support of $h_i$ and $h_{i'}$. The two inner inequalities are derived from the assumed inequalities in our case. Thus,
$$|\inf_i(h_i(f))-\inf_i(h_i(f'))|\simeq|h_i(f)-h_{i'}(f')|\le\epsilon d(f,f')$$
And we're done, we made a compact almost-support for $\inf_i h_i$ assuming an arbitrary $\epsilon$. So the inf of this family of infradistributions is an infradistribution.
Now to verify homogeneity, 1-Lipschitzness, cohomogeneity, C-additivity, crispness, and sharpness preservation.
Homogeneity:
$$(\inf_i h_i)(af)=\inf_i(h_i(af))=\inf_i(ah_i(f))=a\inf_i(h_i(f))=a(\inf_i h_i)(f)$$
1-Lipschitzness: Same as the Lipschitz argument. Every $h_i$ has a Lipschitz constant of 1, so the inf has the same Lipschitz constant.
Cohomogeneity:
$$(\inf_i h_i)(1+af)=\inf_i(h_i(1+af))=\inf_i(1-a+ah_i(1+f))=1-a+a\inf_i(h_i(1+f))=1-a+a(\inf_i h_i)(1+f)$$
C-additivity:
$$(\inf_i h_i)(c)=\inf_i(h_i(c))=\inf_i(c)=c$$
Crispness: Homogeneity and C-additivity are both preserved, so crispness is preserved.
Sharpness:
$$(\inf_i h_i)(f)=\inf_i(h_i(f))=\inf_i\left(\inf_{x\in C_i}f(x)\right)=\inf_{x\in\bigcup_i C_i}f(x)=\inf_{x\in\overline{\bigcup_i C_i}}f(x)$$
We do have to check whether or not $\overline{\bigcup_i C_i}$ is compact, however. We'll start by showing that for an arbitrary $C_i$, any compact set $K$ where $C_i\not\subseteq K$ can't be an $\epsilon$-almost-support of $h_i$ for any $\epsilon<1$. The proof proceeds as follows:
Let $x^*$ be some point in $C_i$ but not in $K$. It must be some finite nonzero distance away from $K$. Craft a continuous function $f_0$ defined on $K\cup\{x^*\}$, which is 1 on $K$ and 0 on $\{x^*\}$. Use the Tietze extension theorem to extend $f_0$ to all of $X$, with values in $[0,1]$. Then
$$|h_i(f_0)-h_i(1)|=\left|\inf_{x\in C_i}f_0(x)-\inf_{x\in C_i}1\right|=|f_0(x^*)-1|=|0-1|=1$$
However, $f_0$ and 1 agree on $K$ and $d(f_0,1)=1$, so $K$ can't be an $\epsilon$-almost-support for any $\epsilon<1$.
Thus, in order for there to be a compact set $C_\epsilon$ (with $\epsilon<1$) that's an $\epsilon$-almost-support for all $h_i$, it must be that $\forall i:C_i\subseteq C_\epsilon$. Then $\overline{\bigcup_i C_i}\subseteq C_\epsilon$, because all the $C_i$ are in it and $C_\epsilon$ is closed. So, the closure of our union is a closed subset of a compact set and thus is compact, so $\inf_i h_i$ is minimizing over a compact set and thus is sharp.
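As an alternative to invoking Tietze, the function $f_0$ can be written down directly. This is a sketch under the assumption that we fix a metric $\rho$ compatible with the topology of $X$ (possible since $X$ is Polish); the symbols $\rho$ and $r$ are introduced here only for illustration.

```latex
% r is strictly positive because K is closed and x^* \notin K:
r:=\rho(x^*,K)=\inf_{k\in K}\rho(x^*,k)>0
% Explicit choice of f_0 on all of X:
f_0(x):=\max\!\left(0,\;1-\frac{\rho(x,K)}{r}\right)
% Properties: f_0 is continuous (x \mapsto \rho(x,K) is 1-Lipschitz),
% 0\le f_0\le 1,\quad f_0|_K=1,\quad f_0(x^*)=0,\quad d(f_0,1)=\sup_x|f_0(x)-1|=1
% Hence, for a sharp h_i with minimizing set C_i \ni x^*:
|h_i(f_0)-h_i(1)|=|0-1|=1=d(f_0,1)>\epsilon\,d(f_0,1)\quad\text{for every }\epsilon<1
```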
Proposition 14: If $\sup(h_1,h_2)(0)=0$ and $\sup(h_1,h_2)(1)=1$, then the supremum is an infradistribution.
The supremum is defined as:
$$\sup(h_1,h_2)(f)=\sup_{f_1,f_2,p:\,pf_1+(1-p)f_2\le f}ph_1(f_1)+(1-p)h_2(f_2)$$
We'll verify the infradistribution properties of the sup.
We must check monotonicity, concavity, normalization, Lipschitzness, and compact almost-support. For monotonicity, if $f'\ge f$, then
$$\sup(h_1,h_2)(f)=\sup_{f_1,f_2,p:\,pf_1+(1-p)f_2\le f}ph_1(f_1)+(1-p)h_2(f_2)\le\sup_{f_1,f_2,p:\,pf_1+(1-p)f_2\le f'}ph_1(f_1)+(1-p)h_2(f_2)=\sup(h_1,h_2)(f')$$
This holds because $f'\ge f$, so there are more options available. For concavity,
$$\begin{aligned}q\sup(h_1,h_2)(f)+(1-q)\sup(h_1,h_2)(f')&=q\sup_{f_1,f_2,p:\,pf_1+(1-p)f_2\le f}ph_1(f_1)+(1-p)h_2(f_2)\\&\quad+(1-q)\sup_{f'_1,f'_2,p':\,p'f'_1+(1-p')f'_2\le f'}p'h_1(f'_1)+(1-p')h_2(f'_2)\end{aligned}$$
Pick $f_1,f_2,f'_1,f'_2,p,p'$ that very very very nearly attain the two suprema, and abbreviate $r:=qp+(1-q)p'$, so that $1-r=q(1-p)+(1-q)(1-p')$. Then
$$\begin{aligned}&\simeq qph_1(f_1)+q(1-p)h_2(f_2)+(1-q)p'h_1(f'_1)+(1-q)(1-p')h_2(f'_2)\\&=r\left(\frac{qp}{r}h_1(f_1)+\frac{(1-q)p'}{r}h_1(f'_1)\right)+(1-r)\left(\frac{q(1-p)}{1-r}h_2(f_2)+\frac{(1-q)(1-p')}{1-r}h_2(f'_2)\right)\\&\le r\,h_1\!\left(\frac{qp}{r}f_1+\frac{(1-q)p'}{r}f'_1\right)+(1-r)\,h_2\!\left(\frac{q(1-p)}{1-r}f_2+\frac{(1-q)(1-p')}{1-r}f'_2\right)\end{aligned}$$
by concavity of $h_1$ and $h_2$. Also, we can verify that:
$$\begin{aligned}&r\left(\frac{qp}{r}f_1+\frac{(1-q)p'}{r}f'_1\right)+(1-r)\left(\frac{q(1-p)}{1-r}f_2+\frac{(1-q)(1-p')}{1-r}f'_2\right)\\&=qpf_1+(1-q)p'f'_1+q(1-p)f_2+(1-q)(1-p')f'_2\\&=q(pf_1+(1-p)f_2)+(1-q)(p'f'_1+(1-p')f'_2)\le qf+(1-q)f'\end{aligned}$$
Therefore, it is a suitable parameter and pair of functions to lower-bound $qf+(1-q)f'$. Accordingly,
$$\begin{aligned}&r\,h_1\!\left(\frac{qp}{r}f_1+\frac{(1-q)p'}{r}f'_1\right)+(1-r)\,h_2\!\left(\frac{q(1-p)}{1-r}f_2+\frac{(1-q)(1-p')}{1-r}f'_2\right)\\&\le\sup_{p^*,f^*_1,f^*_2:\,p^*f^*_1+(1-p^*)f^*_2\le qf+(1-q)f'}p^*h_1(f^*_1)+(1-p^*)h_2(f^*_2)=\sup(h_1,h_2)(qf+(1-q)f')\end{aligned}$$
Putting all this together, and picking better and better approximations to the two suprema, we can conclude that:
$$q\sup(h_1,h_2)(f)+(1-q)\sup(h_1,h_2)(f')\le\sup(h_1,h_2)(qf+(1-q)f')$$
And we have concavity.
For normalization, we're assuming it holds at the start.
Lipschitzness takes a slightly more involved argument. Pick two functions $f$ and $f'$, and without loss of generality, assume $\sup(h_1,h_2)(f)\ge\sup(h_1,h_2)(f')$. Now, what we can do is pick a $p$, $f_1$ and $f_2$ which approximately attain the defining supremum for $f$, so we have:
$$\sup(h_1,h_2)(f)\simeq ph_1(f_1)+(1-p)h_2(f_2)$$
Now, we can note two things. First,
$$p(f_1-d(f,f'))+(1-p)(f_2-d(f,f'))=(pf_1+(1-p)f_2)-d(f,f')\le f-d(f,f')\le f'$$
Therefore, the same $p$, together with $f_1-d(f,f')$ and $f_2-d(f,f')$, are suitable things to lower-bound the value of $\sup(h_1,h_2)(f')$. In particular, we have:
$$\sup(h_1,h_2)(f')=\sup_{q,f'_1,f'_2:\,qf'_1+(1-q)f'_2\le f'}qh_1(f'_1)+(1-q)h_2(f'_2)\ge ph_1(f_1-d(f,f'))+(1-p)h_2(f_2-d(f,f'))$$
Also, we have the result that:
$$\begin{aligned}&ph_1(f_1)+(1-p)h_2(f_2)-ph_1(f_1-d(f,f'))-(1-p)h_2(f_2-d(f,f'))\\&\le\left|(ph_1(f_1)+(1-p)h_2(f_2))-(ph_1(f_1-d(f,f'))+(1-p)h_2(f_2-d(f,f')))\right|\\&\le|ph_1(f_1)-ph_1(f_1-d(f,f'))|+|(1-p)h_2(f_2)-(1-p)h_2(f_2-d(f,f'))|\\&=p|h_1(f_1)-h_1(f_1-d(f,f'))|+(1-p)|h_2(f_2)-h_2(f_2-d(f,f'))|\\&\le p(\lambda^\odot_1\cdot d(f,f'))+(1-p)(\lambda^\odot_2\cdot d(f,f'))\le\max(\lambda^\odot_1,\lambda^\odot_2)d(f,f')\end{aligned}$$
because of Lipschitzness of $h_1$ and $h_2$. Now we can begin showing our inequalities. We've shown that:
$$\sup(h_1,h_2)(f')\ge ph_1(f_1-d(f,f'))+(1-p)h_2(f_2-d(f,f'))$$
Therefore,
$$ph_1(f_1-d(f,f'))+(1-p)h_2(f_2-d(f,f'))-\sup(h_1,h_2)(f')\le 0$$
With this result, we can go:
$$\max(\lambda^\odot_1,\lambda^\odot_2)d(f,f')\ge\max(\lambda^\odot_1,\lambda^\odot_2)d(f,f')+ph_1(f_1-d(f,f'))+(1-p)h_2(f_2-d(f,f'))-\sup(h_1,h_2)(f')$$
Let's save this result for a bit later.
Also, we had: ph1(f1)+(1−p)h2(f2)−ph1(f1−d(f,f′))−(1−p)h2(f2−d(f,f′))≤max(λ⊙1,λ⊙2)d(f,f′) And we also picked p and f1 and f2 to approximately attain the supremum, so we know: ph1(f1)+(1−p)h2(f2)≃sup(h1,h2)(f) Therefore, we approximately have: sup(h1,h2)(f)−ph1(f1−d(f,f′))−(1−p)h2(f2−d(f,f′))≤max(λ⊙1,λ⊙2)d(f,f′) Reshuffling this around a bit, we have: max(λ⊙1,λ⊙2)d(f,f′)+ph1(f1−d(f,f′))+(1−p)h2(f2−d(f,f′))≥sup(h1,h2)(f) Using this with our saved result, we can get: max(λ⊙1,λ⊙2)d(f,f′) ≥max(λ⊙1,λ⊙2)d(f,f′)+ph1(f1−d(f,f′))+(1−p)h2(f2−d(f,f′))−sup(h1,h2)(f′) ≥sup(h1,h2)(f)−sup(h1,h2)(f′)≥0 That last inequality was because we assumed at the start without loss of generality that f got an equal or higher expectation than f′. Therefore, we have our result that, in general, |sup(h1,h2)(f)−sup(h1,h2)(f′)|≤max(λ⊙1,λ⊙2)d(f,f′) And thus, the supremum of two Lipschitz infradistributions is Lipschitz. That just leaves compact almost-support, which is quite tricky to show.
Fix an arbitrary ϵ, and get a C1ϵ compact ϵ-almost-support for h1, and a C2ϵ for h2. We will show that C1ϵ∪C2ϵ is a compact ϵ-almost-support for sup(h1,h2). It's compact because it's a finite union of compact sets.
Now, let f and f′ agree on C1ϵ∪C2ϵ. Without loss of generality, assume that sup(h1,h2)(f)≥sup(h1,h2)(f′) (if not, flip f and f′). We'll show that they have similar expectations by showing that sup(h1,h2)(f)−sup(h1,h2)(f′) is below a small number (we already know that it's above 0 by our without-loss-of-generality assumption).
We can go: sup(h1,h2)(f)=supp,f1,f2:pf1+(1−p)f2≤fph1(f1)+(1−p)h2(f2)≃ph1(f1)+(1−p)h2(f2) Where we picked a particular p,f1,f2 spectacularly close to the highest possible value s.t. pf1+(1−p)f2≤f. In particular, if p is 0 or 1, we can ensure that f1 or f2 is f itself, by monotonicity of h1 or h2 respectively.
For the arguments that follow, we need $p\in(0,1)$, so we have to address the endpoints first. Assume $p=1$. Then $\sup(h_1,h_2)(f)\simeq h_1(f)$, and we have:
$$\sup(h_1,h_2)(f)-\sup(h_1,h_2)(f')\simeq h_1(f)-\sup(h_1,h_2)(f')\le h_1(f')+\epsilon d(f,f')-\sup(h_1,h_2)(f')\le\epsilon d(f,f')$$
The way this works is our substitution, and then using that $f$ and $f'$ are identical on $C^1_\epsilon\cup C^2_\epsilon$, and so are identical on $C^1_\epsilon$, which is an $\epsilon$-almost-support of $h_1$, so we can upper-bound $h_1(f)$ with $h_1(f')+\epsilon d(f,f')$. And then, we just use that $h_1(f')\le\sup(h_1,h_2)(f')$. If $p=0$, the exact same argument works, just with $h_2$ and $C^2_\epsilon$ instead. That leaves the case where $p\in(0,1)$, which requires far more involved arguments.
As a recap, we're assuming that $\sup(h_1,h_2)(f)\ge\sup(h_1,h_2)(f')$, that $\sup(h_1,h_2)(f)\simeq ph_1(f_1)+(1-p)h_2(f_2)$, and that $p\in(0,1)$. Now, we're going to pick out a continuous function with some special properties, so let the set-valued function $\psi$ from $X$ to subsets of $\mathbb{R}^2$ be defined as follows. If $x\in C^1_\epsilon\cup C^2_\epsilon$, then $\psi(x)=\{(f_1(x),f_2(x))\}$. Otherwise, $\psi(x)$ equals the intersection of
$$[f_1(x)-d(f,f'),f_1(x)+d(f,f')]\times[f_2(x)-d(f,f'),f_2(x)+d(f,f')]$$
and
$$\{(y,z)\mid py+(1-p)z\le f'(x)\}$$
We'll find a continuous selection of this set-valued function, so let's start checking the properties needed to invoke the Michael selection theorem. We need that $X$ is paracompact (all Polish spaces are paracompact, check), that $\mathbb{R}^2$ is a Banach space (check), and that for all $x$, $\psi(x)$ is convex (it's either a single point or the intersection of a rectangle and a half-space, which is convex in both cases), closed (yup, it's either a point or the intersection of two closed sets, i.e. closed), nonempty, and lower-hemicontinuous.
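For reference, here is the statement of the Michael selection theorem in the form being targeted, together with the sequential characterization of lower-hemicontinuity that the proof checks. This is a summary of the standard statements as I understand them, with $F$, $Y$ and $s$ as generic placeholder names; the sequential form is stated for metrizable $X$, which covers the Polish case.

```latex
% Michael selection theorem (standard form):
% X paracompact, Y a Banach space, F : X \to 2^{Y} lower-hemicontinuous,
% with F(x) nonempty, closed and convex for every x. Then
\exists\, s:X\to Y \text{ continuous with } s(x)\in F(x)\ \ \forall x\in X
% Sequential lower-hemicontinuity at x (for metrizable X):
\forall\, x_n\to x,\ \forall\, y\in F(x):\ \exists \text{ a subsequence } x_{n_k}
\text{ and } y_k\in F(x_{n_k}) \text{ with } y_k\to y
% Here: Y=\mathbb{R}^2,\quad F=\psi,\quad s=f^*
```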
Nonemptiness isn't too bad to show. It's nonempty for all points in our compact set of interest (the set consisting of a single point), and for x not in said set, (f1(x)−d(f,f′),f2(x)−d(f,f′)) witnesses the nonemptiness, because: p(f1(x)−d(f,f′))+(1−p)(f2(x)−d(f,f′))=pf1(x)+(1−p)f2(x)−d(f,f′) ≤f(x)−d(f,f′)≤f′(x) Lower-hemicontinuity is much more challenging to establish. Again, we have a sequence xn limiting to x, a point (y,z)∈ψ(x), and we must find a subsequence (ym,zm)∈ψ(xm) which limits to (y,z).
We can divide into three cases. In the first case, x lies in C1ϵ∪C2ϵ, and infinitely many members of the sequence lie in said set. In particular, since x lies in the compact set, the (y,z) pair associated with it must be (f1(x),f2(x)). Then we can isolate that particular subsequence that lies in the compact set, and have (ym,zm) be (f1(xm),f2(xm)), which, by continuity of f1 and f2, and the definition of ψ(xm) for xm in the compact set, lie in ψ(xm) and limit to (y,z) ie (f1(x),f2(x)).
In preparation for the second and third cases, we'll show that the function ψ′:X→R2 which just takes the second branch of the ψ function is continuous w.r.t. the Hausdorff-metric. Ie, for all x, ψ′(x):=[f1(x)−d(f,f′),f1(x)+d(f,f′)]×[f2(x)−d(f,f′),f2(x)+d(f,f′)] ∩{(y,z)|py+(1−p)z≤f′(x)} is continuous when the space of compact subsets of R2 is equipped with a Hausdorff distance.
Accordingly, let xm limit to x. Our task is to show that, no matter how tiny of a number you name, you can find a tail of the xm sequence where the Hausdorff distance between ψ′(xm) and ψ′(x) is that tiny.
Specifically, we'll show that for all $\delta$, there is some $m$ where all later $x_m$ have $\psi'(x_m)$ within $\frac{2\delta}{p}+\frac{2\delta}{1-p}+4\delta$ Hausdorff distance of $\psi'(x)$. Because $p\in(0,1)$ is fixed and we can shrink $\delta$ to 0, this shows that the function $\psi'$ is continuous in Hausdorff-distance.
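To make the target concrete, the Hausdorff distance in question can be written as below. This is a sketch assuming the $\ell_1$ metric on $\mathbb{R}^2$, which is the point-distance the case analysis computes with; the symbols $\rho$ and $d_H$ are names introduced here.

```latex
% Point metric on \mathbb{R}^2 used in the case analysis:
\rho\bigl((y,z),(y',z')\bigr):=|y-y'|+|z-z'|
% Hausdorff distance between nonempty compact A,B\subseteq\mathbb{R}^2:
d_H(A,B):=\max\Bigl(\sup_{a\in A}\inf_{b\in B}\rho(a,b),\ \sup_{b\in B}\inf_{a\in A}\rho(a,b)\Bigr)
% Goal: for every \delta>0 there is a tail of the sequence x_m with
d_H\bigl(\psi'(x_m),\psi'(x)\bigr)\;\le\;\frac{2\delta}{p}+\frac{2\delta}{1-p}+4\delta
```

Each of the two one-sided terms is bounded by finding, for an arbitrary point of one set, a nearby point of the other, which is exactly what the four cases do.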
Because f1 and f2 and f′ are continuous functions, there's some very very large m where f1, f2, and f′ will only vary by δ from that point forward, regardless of which δ you pick. Pick some arbitrary (ym,zm)∈ψ′(xm). We'll show that it's close to a (y,z)∈ψ′(x), and the argument will only depend on distances, not position in sequence, so we can flip it to show the other half of Hausdorff-distance (all points in ψ′(x) are close to a point in ψ′(xm)).
We can divide into four possible cases. In cases 1 and 2, we have the following property holding:
$$y_m\ge f_1(x_m)-d(f,f')+\frac{2\delta}{p}+\delta$$
with the negation for cases 3 and 4.
And in cases 1 and 3, we have:
$$z_m\ge f_2(x_m)-d(f,f')+\frac{2\delta}{1-p}+\delta$$
with the negation for cases 2 and 4.
In cases 1 and 2, you can let your selected $y$ point be $y_m-\frac{2\delta}{p}$. We have the result that $y\in[f_1(x)-d(f,f'),f_1(x)+d(f,f')]$, because:
$$\begin{aligned}f_1(x)+d(f,f')&\ge f_1(x_m)-\delta+d(f,f')\ge y_m-\delta\ge y_m-\frac{2\delta}{p}=y\\&\ge f_1(x_m)-d(f,f')+\frac{2\delta}{p}+\delta-\frac{2\delta}{p}=f_1(x_m)-d(f,f')+\delta\ge f_1(x)-d(f,f')\end{aligned}$$
In order: the first inequality is because $f_1$ only varies by $\delta$ over such tiny distances, due to continuity of $f_1$. The second inequality is because $y_m$ is paired with something to be in $\psi'(x_m)$, so it has a known upper bound on its value. The third inequality is because $p<1$. The equality is our definition of $y$. The next inequality uses the lower bound on $y_m$ that we're assuming in cases 1 and 2. Then there's just a cancellation, and $f_1$ only varying by $\delta$ over such tiny distances.
You can use nearly identical arguments in cases 1 and 3 to get that, when you define $z$ to be $z_m-\frac{2\delta}{1-p}$, you have the result that $z\in[f_2(x)-d(f,f'),f_2(x)+d(f,f')]$.
Now, in cases 3 and 4, we can let $y$ be $f_1(x)-d(f,f')$, and then we have:
$$\begin{aligned}-\delta&=f_1(x)-\delta-d(f,f')-(f_1(x)-d(f,f'))=f_1(x)-\delta-d(f,f')-y\\&\le f_1(x_m)-d(f,f')-y\le y_m-y\le f_1(x_m)-d(f,f')+\frac{2\delta}{p}+\delta-y\\&\le f_1(x)+\delta-d(f,f')+\frac{2\delta}{p}+\delta-y=f_1(x)-d(f,f')+\frac{2\delta}{p}+2\delta-y\\&=f_1(x)-d(f,f')+\frac{2\delta}{p}+2\delta-(f_1(x)-d(f,f'))=\frac{2\delta}{p}+2\delta\end{aligned}$$
The first equality is just pair-creation, then the second one is packing up the definition of $y$. The first inequality is because $f_1$ only varies by $\delta$ over that distance, the second inequality is because $y_m\in\psi'(x_m)$ so it's got the usual lower bound, and the next inequality after that is because we're in cases 3 and 4, so $y_m<f_1(x_m)-d(f,f')+\frac{2\delta}{p}+\delta$. Then, it's just another "$f_1$ doesn't change much over the tiny distance", moving the $\delta$'s together, unpacking $y$, and cancelling out. The net result is that we have:
$$|y_m-y|\le\frac{2\delta}{p}+2\delta$$
You can use nearly identical arguments in cases 2 and 4 to get that, when you define $z$ to be $f_2(x)-d(f,f')$, you have the result that $|z_m-z|\le\frac{2\delta}{1-p}+2\delta$.
At this point, we can resume our progress on the four cases and go "ok, in case 1, we have..."
$$y_m\ge f_1(x_m)-d(f,f')+\frac{2\delta}{p}+\delta\qquad z_m\ge f_2(x_m)-d(f,f')+\frac{2\delta}{1-p}+\delta$$
And we know that those properties lead to $y$ being defined as $y_m-\frac{2\delta}{p}$ and $z$ being defined as $z_m-\frac{2\delta}{1-p}$. And we know that in that case,
$$(y,z)\in[f_1(x)-d(f,f'),f_1(x)+d(f,f')]\times[f_2(x)-d(f,f'),f_2(x)+d(f,f')]$$
So, all we have to check is that $py+(1-p)z\le f'(x)$ in order to conclude that $(y,z)\in\psi'(x)$. Let's do that.
$$\begin{aligned}py+(1-p)z&=p\left(y_m-\frac{2\delta}{p}\right)+(1-p)\left(z_m-\frac{2\delta}{1-p}\right)=py_m+(1-p)z_m-4\delta\\&\le f'(x_m)-4\delta\le f'(x)+\delta-4\delta=f'(x)-3\delta<f'(x)\end{aligned}$$
And we have that $(y,z)\in\psi'(x)$, accordingly. The first equality was unpacking definitions, then the second was some cancellation, and then the first inequality was because $(y_m,z_m)\in\psi'(x_m)$ by assumption, so we have $py_m+(1-p)z_m\le f'(x_m)$. The second inequality was because $f'$ doesn't change much over such tiny distances, and then it's just trivial cleanup.
Thus, when we picked a point $(y_m,z_m)\in\psi'(x_m)$ where $x_m$ is sufficiently close to $x$, and we're in case 1, we have that there is a point $(y,z)\in\psi'(x)$ with
$$d((y_m,z_m),(y,z))=|y_m-y|+|z_m-z|=\left|y_m-y_m+\frac{2\delta}{p}\right|+\left|z_m-z_m+\frac{2\delta}{1-p}\right|=\frac{2\delta}{p}+\frac{2\delta}{1-p}$$
This is from the definitions of $y$ and $z$ in Case 1.
Now, let's address case 2, where
$$y_m\ge f_1(x_m)-d(f,f')+\frac{2\delta}{p}+\delta\qquad z_m<f_2(x_m)-d(f,f')+\frac{2\delta}{1-p}+\delta$$
In this case, $y$ is defined as $y_m-\frac{2\delta}{p}$, and $z$ is defined as $f_2(x)-d(f,f')$. And we know that in that case,
$$(y,z)\in[f_1(x)-d(f,f'),f_1(x)+d(f,f')]\times[f_2(x)-d(f,f'),f_2(x)+d(f,f')]$$
(the first part on the $y$ is the same argument from case 1, the second interval is from the value of $z$). So, all we have to check is that $py+(1-p)z\le f'(x)$ in order to conclude that $(y,z)\in\psi'(x)$. We know that $-\delta\le z_m-z$, so we can flip this a bit to get $z\le z_m+\delta$. Accordingly, from that, we get:
$$\begin{aligned}py+(1-p)z&\le p\left(y_m-\frac{2\delta}{p}\right)+(1-p)(z_m+\delta)=py_m+(1-p)z_m-2\delta+(1-p)\delta\\&\le f'(x_m)-\delta\le f'(x)\end{aligned}$$
And we have that $(y,z)\in\psi'(x)$, accordingly. The first inequality was definition unpacking and the inequality we just got, then the first equality is just breaking things up a bit, then the second inequality is observing that $1-p\le 1$ and $py_m+(1-p)z_m\le f'(x_m)$, and then $f'$ doesn't change much over such tiny distances.
Thus, when we picked a point $(y_m,z_m)\in\psi'(x_m)$ where $x_m$ is sufficiently close to $x$, and we're in case 2, we have that there is a point $(y,z)\in\psi'(x)$ with
$$d((y_m,z_m),(y,z))=|y_m-y|+|z_m-z|\le\left|y_m-y_m+\frac{2\delta}{p}\right|+\frac{2\delta}{1-p}+2\delta=\frac{2\delta}{p}+\frac{2\delta}{1-p}+2\delta$$
This is from the definitions of $y$ and $z$ in Case 2, and the fact that in case 2 we can derive $|z_m-z|\le\frac{2\delta}{1-p}+2\delta$.
Extremely similar arguments to case 2 dispatch case 3, with the corresponding $(y,z)$ lying in $\psi'(x)$ and
$$d((y_m,z_m),(y,z))\le\frac{2\delta}{p}+\frac{2\delta}{1-p}+2\delta$$
Finally, for case 4, we have:
$$y_m<f_1(x_m)-d(f,f')+\frac{2\delta}{p}+\delta\qquad z_m<f_2(x_m)-d(f,f')+\frac{2\delta}{1-p}+\delta$$
In this case, $y$ is defined as $f_1(x)-d(f,f')$, and $z$ is defined as $f_2(x)-d(f,f')$. Trivially, we have:
$$(y,z)\in[f_1(x)-d(f,f'),f_1(x)+d(f,f')]\times[f_2(x)-d(f,f'),f_2(x)+d(f,f')]$$
So, all we have to check is that $py+(1-p)z\le f'(x)$ in order to conclude that $(y,z)\in\psi'(x)$. To do this, we have:
$$\begin{aligned}py+(1-p)z&=p(f_1(x)-d(f,f'))+(1-p)(f_2(x)-d(f,f'))\\&=pf_1(x)+(1-p)f_2(x)-d(f,f')\le f(x)-d(f,f')\le f'(x)\end{aligned}$$
(because $pf_1+(1-p)f_2\le f$). And we have that $(y,z)\in\psi'(x)$, accordingly.
Thus, when we picked a point $(y_m,z_m)\in\psi'(x_m)$ where $x_m$ is sufficiently close to $x$, and we're in case 4, we have that there is a point $(y,z)\in\psi'(x)$ with
$$d((y_m,z_m),(y,z))=|y_m-y|+|z_m-z|\le\frac{2\delta}{p}+2\delta+\frac{2\delta}{1-p}+2\delta=\frac{2\delta}{p}+\frac{2\delta}{1-p}+4\delta$$
This is from the definitions of $y$ and $z$ in Case 4, and the fact that in case 4 we can derive $|y_m-y|\le\frac{2\delta}{p}+2\delta$ (and $|z_m-z|\le\frac{2\delta}{1-p}+2\delta$).
These 4 cases were exhaustive, so we now know that, given any x and sequence of points xm limiting to x, and any δ, there is a tail of sufficiently large m's where the distance from any point in ψ′(xm) to ψ′(x) is 2δp+2δ(1−p)+4δ or less. We can also flip xm and x and use our four cases (our argument is symmetric) to show that actually, this is a bound on the Hausdorff distance between ψ′(xm) and ψ′(x). δ was arbitrary, as was the sequence xm and the x, so this means that ψ′ is continuous in Hausdorff-distance.
Ok, we're a bit in the weeds here, how does that help? Well, we were trying to verify the compact almost-support property for the supremum. This requires, as part of it, getting a continuous function with some special properties. We're going to apply a selection function to get it, but we could only take care of the prerequisites that aren't lower-hemicontinuity. And to show lower-Hemicontinuity in general, we needed to take this detour through showing that the modified set-valued function is continuous in Hausdorff-distance. So let's pop back up the stack.
One level back up the stack, we were trying to show lower-hemicontinuity. It is the property that given any sequence xn which limits to x, and any (y,z)∈ψ(x), there is some subsequence xm and (ym,zm)∈ψ(xm) where (ym,zm) limits to (y,z). We dispatched the case where infinitely many elements of the sequence were in our C1ϵ∪C2ϵ, leaving two cases. There's the case where only finitely many elements of that sequence are in that compact set, but the limit point x lies in that set. There's also the case where the limit point x doesn't lie in that set.
Dealing with case 3, we have a sequence xn heading to x. Strip off all the xn that lie in the compact set, making your xm. And let (ym,zm) be whichever point in ψ(xm) is closest to (y,z). Now, by how they were defined, ψ(xm)=ψ′(xm), and ψ′ is continuous in Hausdorff-distance, so "take the closest point" is definitely going to get you the convergence you seek to your arbitrarily selected (y,z)∈ψ(x) point.
For case 2, where we're limiting to x from outside the compact set, all we need to show is that ψ(x)⊆ψ′(x) (we don't necessarily have equality because ψ and ψ′ start being different on that compact set), in order to get a sequence (ym,zm) converging to the (y,z)∈ψ(x) point. So, let's do this. Because x lies in C1ϵ∪C2ϵ, we have that ψ(x)=(f1(x),f2(x)). The conditions for (y,z) to be in ψ′(x) are: (y,z)∈[f1(x)−d(f,f′),f1(x)+d(f,f′)]×[f2(x)−d(f,f′),f2(x)+d(f,f′)] Which is obviously true for f1(x),f2(x), and: py+(1−p)z≤f′(x) Which is the case because: pf1(x)+(1−p)f2(x)≤f(x)=f′(x) By how f was made, and f′=f on that compact set.
Thus, we're done, we verified lower-hemicontinuity for ψ in all the cases, so we can invoke the Michael selection theorem and get a continuous selection f∗:X→R2 with three valuable properties. Let's abbreviate pr1(f∗(x)) as f′1, for notational convenience. It's projecting it to the first coordinate. f′2 is defined similarly.
Our first notable property is: x∈C1ϵ∪C2ϵ→f′1(x)=f1(x)∧f′2(x)=f2(x) (ie, projecting f∗ down to the two coordinates makes functions which perfectly mimic f1 and f2 on the compact set of interest)
Our second one is: d(f1,f′1)≤d(f,f′) And the same for d(f2,f′2).
And our third notable property is that: pf′1+(1−p)f′2≤f′ But why do these properties hold of our selection function? Well, when x lies in that compact set, ψ(x)=(f1(x),f2(x)), so our selection function is forced to have its projections mimic f1 and f2 on said compact set, taking care of the first one.
For our second property, we have: f∗(x)∈ψ(x)⊆[f1(x)−d(f,f′),f1(x)+d(f,f′)]×[f2(x)−d(f,f′),f2(x)+d(f,f′)] Accordingly, we know that the projections to the two coordinates can't be too far away from f1 and f2 respectively.
For our third property, we have: f∗(x)∈ψ(x)⊆{(y,z)|py+(1−p)z≤f′(x)} Accordingly, the projections to the two coordinates can't mix to exceed the function f′.
So, where to from here? Well, we have:
|sup(h1,h2)(f)−sup(h1,h2)(f′)|=sup(h1,h2)(f)−sup(h1,h2)(f′)
≃ph1(f1)+(1−p)h2(f2)−sup(h1,h2)(f′)
≤p(h1(f′1)+ϵd(f,f′))+(1−p)(h2(f′2)+ϵd(f,f′))−sup(h1,h2)(f′)
=ph1(f′1)+(1−p)h2(f′2)−sup(h1,h2)(f′)+ϵd(f,f′)≤ϵd(f,f′)
Here's why. We assumed at the very start that without loss of generality, we'd take f to be the one with higher expectation value. We found a p,f1,f2 that nearly replicated the expectation value of sup(h1,h2)(f). f′1 copies f1 on a compact ϵ-almost-support of h1, namely C1ϵ, and we also have d(f′1,f1)≤d(f′,f), so h1(f1)≤h1(f′1)+ϵd(f,f′), and similar for f′2 and f2. And finally, since pf′1+(1−p)f′2≤f′, that mix must have equal or lower value than sup(h1,h2)(f′). And we're done! f and f′ were arbitrary except that they agreed on C1ϵ∪C2ϵ, a compact set, and we got:
sup(h1,h2)(f)−sup(h1,h2)(f′)≤ϵd(f,f′)
Witnessing that said set is a compact ϵ-almost-support. ϵ was arbitrary, so sup(h1,h2) is compactly-almost-supported. This is the last condition we needed to check to see that it's an infradistribution.
Proposition 15: All three characterizations of the supremum given in Definition 13 are identical.
So, the first characterization we gave was: sup(h1,h2)(f):=supp,f1,f2:pf1+(1−p)f2≤fph1(f1)+(1−p)h2(f2) And the second characterization was the least infradistribution greater than h1,h2 in the information ordering.
And the third characterization was as the concave monotone hull of f↦sup(h1(f),h2(f)).
We will use sup1,sup2,sup3 for these three characterizations of the supremum of two infradistributions and show that they are equal.
Let's begin showing this.
sup2(h1,h2)(f)≥supζ∈ΔN,fi∈∏iCB(X):Eζfi≤f(sup2(h1,h2)(Eζ(fi)))
This occurs by monotonicity: any mix of functions which undershoots f must get an equal or lower score, because sup2(h1,h2) is an infradistribution.
≥supζ∈ΔN,fi∈∏iCB(X):Eζfi≤fEζ(sup2(h1,h2)(fi))
This is because of concavity of sup2(h1,h2), since it's an infradistribution. The value of the mix is as good or better than the mix of the values.
≥supζ∈ΔN,fi∈∏iCB(X):Eζfi≤fEζ(sup(h1(fi),h2(fi)))=sup3(h1,h2)(f)
This is because sup2(h1,h2)≥h1 (and same for h2), so making that swap decreases the value. Also, this quantity is the concave monotone hull of the supremum of h1,h2. Why? Well, sup(h1(f),h2(f)) is our first attempt at assessing the value of a function f. However, it isn't necessarily monotone. So, supf∗≤fsup(h1(f∗),h2(f∗)) is the monotone hull: we're saying that if there's a function below you that outscores you, then you should update the value of f to be at least that big. And then, to get the concave monotone hull, we replace the lower bound on f with a countable (or arbitrary finite) mix of functions, because any concave function should have the value of the mix be ≥ the mix of the values, so we have to bump the value of f up to at least the mix of the values to not violate concavity. Anyways, now that we know this is sup3(h1,h2)(f), we can go further to:
≥supp,f1,f2:pf1+(1−p)f2≤f(psup(h1(f1),h2(f1))+(1−p)sup(h1(f2),h2(f2)))
This is lower because now we're specializing to only certain sorts of probability distributions over N, those that are only supported on the first two values, so it's harder to attain suprema. And now,
≥supp,f1,f2:pf1+(1−p)f2≤f(ph1(f1)+(1−p)h2(f2))=sup1(h1,h2)(f)
We swapped out each inner supremum for a specific term in it in order to do this, and used our given definition of sup1. And then we can specialize p to 1 and f1 to f itself, to get
≥h1(f)
Similarly, we could specialize to p=0 and f2=f to get ≥h2(f).
So taking stock of what we have, sup2(h1,h2)(f)≥sup3(h1,h2)(f)≥sup1(h1,h2)(f)≥sup(h1(f),h2(f)) For all functions, so: sup2(h1,h2)≥sup3(h1,h2)≥sup1(h1,h2)≥h1 (and same for h2) We recall that in Proposition 14 we proved that sup1 always makes an infradistribution. Since sup1 is above both component infradistributions, and sup2 was defined as the least infradistribution that is above h1 and h2, we must have equality, and sup2(h1,h2)=sup3(h1,h2)=sup1(h1,h2)≥h1 (and same for h2) And we've shown the three definitions of the supremum are identical.
To recap, sup(H1,H2):=H1∩H2 Now, sup(H1,H2) can be turned into a concave monotone functional CB(X)→R, by LF-duality. Further, it's convex, closed, and upper-complete due to being the intersection of two convex closed upper-complete sets. Let's use h to refer to its corresponding functional. Then: h(f)=Esup(H1,H2)(f)=inf(m,b)∈H1∩H2m(f)+b ≥inf(m,b)∈H1m(f)+b=EH1(f)=h1(f) And the same applies to H2, and this applies to all functions, so h≥h1 (and same for h2).
We know from Proposition 15 that the least concave monotone functional above h1 and h2 is sup(h1,h2), so h≥sup(h1,h2)≥h1 (and same for h2). Call the set corresponding to sup(h1,h2) Hsup. Translating this information ordering back to sets, sup(H1,H2)⊆Hsup⊆H1, and the same for H2. Therefore, sup(H1,H2)⊆Hsup⊆H1∩H2=sup(H1,H2). So all the inclusions must be equalities, and in particular we have: Hsup=sup(H1,H2) Then we can go: Esup(H1,H2)(f)=sup(h1,h2)(f) =supp,f1,f2:pf1+(1−p)f2≤f(ph1(f1)+(1−p)h2(f2))=supp,f1,f2:pf1+(1−p)f2≤f(pEH1(f1)+(1−p)EH2(f2)) By sup(H1,H2) being equivalent to the infradistribution set induced by sup(h1,h2), expanding our definition of the sup, and translating back. And we're done!
Proposition 17: For any property in the table at the start of this section, sup(h1,h2) will fulfill the property if both components fulfill the property.
The way to show this is to use the alternate characterizations of supremum as intersection of the infradistribution sets, and the alternate characterizations of the various properties in terms of properties of minimal points.
We will make an observation used in all further proofs of properties. In order for H1 to have (λμ,b) in it, there must be a minimal point of the form (λμ,b1) with b1≤b below it. Similarly, for H2 to contain (λμ,b), there must be a minimal point of the form (λμ,b2) below it, with b2≤b.
Thus, for (λμ,b) to lie in (H1∩H2)min, (λμ,b1)∈Hmin1 and (λμ,b2)∈Hmin2 and b=sup(b1,b2). Part of this is because said point lies in H1 and H2, the other part is because (λμ,sup(b1,b2)) is the lowest possible point in H1∩H2 associated with a measure component of λμ, and it's the minimal. This observation will be used for all future sub-proofs in this proposition.
Homogenity: This is equivalent to "all minimal points have b=0", so if (λμ,b)∈(H1∩H2)min, then (λμ,0)∈Hmin1 (homogenity for H1), and same for Hmin2, so b=sup(0,0)=0.
1-Lipschitzness: This is equivalent to "all minimal points have λ≤1", so if (λμ,b)∈(H1∩H2)min, then (λμ,b1)∈Hmin1, and λ≤1 (1-Lipschitzness of H1), so λ≤1.
Cohomogenity: This is equivalent to "all minimal points have λ+b=1", so if (λμ,b)∈(H1∩H2)min, then (λμ,b1)∈Hmin1, and λ+b1=1 (cohomogenity of H1), and (λμ,b2)∈Hmin2, and λ+b2=1, so b1=b2. Then, λ+b=λ+sup(b1,b2)=λ+b1=1.
C-additivity: This is equivalent to "all minimal points have λ=1", so if (λμ,b)∈(H1∩H2)min, then (λμ,b1)∈Hmin1, and λ=1 (C-additivity of H1), so λ=1.
Crispness: This is equivalent to the conjunction of homogenity and C-additivity, both of which are preserved, so crispness is preserved as well.
Sharpness: Because all sharp infradistributions are crisp, (H1∩H2)min must be composed entirely of probability distributions if H1 and H2 are sharp. If any of the probability distributions in (H1∩H2)min aren't supported on C1 (the compact set associated with the sharp infradistribution H1), then they aren't in H1, which is impossible. Symmetric arguments apply to H2. Thus, (H1∩H2)min only has probability distributions supported on C1∩C2. If there was any probability distribution supported on that set that was missing from (H1∩H2)min, then it'd be present in H1 and H2, and thus present in H1∩H2, and minimal, so we have a contradiction. Therefore (H1∩H2)min consists of all probability distributions supported on C1∩C2 which is a compact set, so the supremum is sharp as well.
Proposition 18: If a family of infradistributions hi is directifiable, then supihi (defined as the functional corresponding to the set ⋂iHi) exists and is an infradistribution. Further, for all conditions listed in the table, if all the hi fulfill them, then supihi fulfills the same property.
A family of infradistributions being directifiable is equivalent to "for any collection of finitely many infradistributions, the supremum exists". We also know that the supremum is exactly equivalent to set intersection. So, we'll show that directifiability (any collection of finitely many infradistributions has a supremum) implies that the intersection of all the infradistribution sets has the exact properties of a set-form infradistribution.
We have six properties to check. Nonemptiness, normalization (the existence of a point (λμ,0), existence of a point (λμ,b) with λ+b=1 and nonexistence of points with λ+b<1), closure, convexity, upper-completion, and compact-projection (the measure components of the infradistribution are contained in a compact set of measures).
For closure, it's the intersection of closed sets, so it's closed. For convexity, it's the intersection of convex sets, so it's convex. For upper-completion, it's the intersection of upper-complete sets, so it's upper-complete. For compact-projection, the measure components of the countable intersection are contained within the countable intersection of the sets of measure components, which is contained in a compact set, so it fulfills that property too.
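This just leaves nonemptiness and normalization. We'll show normalization, which automatically implies nonemptiness. The nonexistence of points with λ+b<1 is clearly preserved under intersection, since any point of the intersection lies in every Hi. What could fail is the existence of the two required points.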
However, the compact-projection property means that for any infradistribution set Hi, the intersection of it with the surface of a-measures where λ+b=1 is compact, so we're intersecting a bunch of compact sets. Due to the existence of supremum infradistributions for each collection of finitely many infradistributions (directifiability), we have the nonempty finite intersection property needed to conclude that the intersection of compact sets is nonempty. The same argument applies to the existence of a point with b=0. The presence of those two points witnesses nonemptiness and normalization.
These are the last two conditions we needed to conclude the set represents an infradistribution, so the infinite supremum exists and is the infradistribution we need.
For preservation of the various properties, we can just reuse the arguments from Proposition 17 with only trivial modifications.
Proposition 10: Mixture, updating, and continuous pushforward preserve the properties indicated by the diagram, and always produce an infradistribution.
We'll start with showing that mixture, updating, and continuous pushforward are always infradistributions, and then turn to property verification.
We know from the last post that mixture, updating, and continuous pushforward preserve all infradistribution properties (although you need to be careful about whether mixture preserves Lipschitzness: you need the expected value of the Lipschitz constant to be finite), but we added the new one about compact almost-support, so that's the only part we need to re-verify.
To show that mixture has compact almost-support, remember that
(Eζhi)(f)=Eζ(hi(f))
Now, fix an $\epsilon$; we will craft a compact set that accounts for all but $\epsilon$ of why functions have the expectation values they do. There is some $n$ where $\sum_{i>n}\zeta_i\lambda^\odot_i<\frac{\epsilon}{2}$, where $\lambda^\odot_i$ is the Lipschitz constant of the infradistribution $h_i$. Then, let $C_\epsilon$ be $\bigcup_{i\le n}C_{i,\epsilon/2}$, the union of the compact $\frac{\epsilon}{2}$-almost-supports for the infradistributions $h_i$, $i\le n$. This is a finite union of compact sets, so it's compact.
Now, for $f$ and $f'$ that agree on $C_\epsilon$, we can go:
$$|(\mathbb{E}_\zeta h_i)(f)-(\mathbb{E}_\zeta h_i)(f')|=|\mathbb{E}_\zeta(h_i(f))-\mathbb{E}_\zeta(h_i(f'))|\le\sum_i\zeta_i|h_i(f)-h_i(f')|$$
$$=\sum_{i\le n}\zeta_i|h_i(f)-h_i(f')|+\sum_{i>n}\zeta_i|h_i(f)-h_i(f')|\le\sum_{i\le n}\zeta_i\frac{\epsilon}{2}d(f,f')+\sum_{i>n}\zeta_i\lambda^\odot_id(f,f')$$
$$=d(f,f')\left(\frac{\epsilon}{2}\sum_{i\le n}\zeta_i+\sum_{i>n}\zeta_i\lambda^\odot_i\right)<d(f,f')\left(\frac{\epsilon}{2}+\frac{\epsilon}{2}\right)=\epsilon d(f,f')$$
The first equality is reexpressing mixtures, and the first inequality is moving the expectation outside the absolute value, which doesn't decrease value; then we break up the expectation for the second equality. The second inequality has two parts. For $i>n$, the gap between $h_i(f)$ and $h_i(f')$ has a trivial upper bound from the Lipschitzness of $h_i$. For $i\le n$, $f$ and $f'$ agree on the union of the $\frac{\epsilon}{2}$-almost-supports for the $h_i$, $i\le n$, so each such infradistribution, by the definition of an almost-support, assigns these two functions not-very-different values. Then we just pull the gap between $f$ and $f'$ out, and use the fact that for the mixture to work, $\sum_i\zeta_i\lambda^\odot_i<\infty$, and we picked $n$ big enough for that last tail of the infinite sum to be small. Then we're done.
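The choice of n in this argument can be made concrete. Below is a minimal sketch, assuming a finite list of mixture weights and Lipschitz constants (the names `tail_cutoff` and `gap_bound` and the toy data are illustrative and not from the post). It finds the smallest n whose weighted tail drops below ϵ/2 and then evaluates the final bound from the chain of inequalities.

```python
from fractions import Fraction


def tail_cutoff(zeta, lam, eps):
    """Return the smallest index n with sum_{i>n} zeta[i]*lam[i] < eps/2."""
    weighted = [z * l for z, l in zip(zeta, lam)]
    tail = sum(weighted)
    for n, w in enumerate(weighted):
        tail -= w  # tail now equals the sum over indices i > n
        if tail < eps / 2:
            return n
    # With exact arithmetic and eps > 0 the final tail is 0, so this is not reached.
    raise ValueError("no cutoff found")


def gap_bound(zeta, lam, eps, n, dist):
    """Evaluate d(f,f') * (eps/2 * sum_{i<=n} zeta_i + sum_{i>n} zeta_i*lam_i)."""
    head = sum(zeta[: n + 1])
    tail = sum(z * l for z, l in zip(zeta[n + 1:], lam[n + 1:]))
    return dist * (eps / 2 * head + tail)


# Hypothetical toy data: geometric weights summing to 1, unit Lipschitz constants.
zeta = [Fraction(1, 2 ** (i + 1)) for i in range(19)] + [Fraction(1, 2 ** 19)]
lam = [Fraction(1)] * 20
eps = Fraction(1, 4)
n = tail_cutoff(zeta, lam, eps)
bound = gap_bound(zeta, lam, eps, n, Fraction(1))
```

Exact fractions are used so that the strict inequality against ϵ/2 is not blurred by floating-point rounding.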
Now, we will show compact almost-support for $h|^g_L$ assuming $h$ has compact almost-support. Fix an $\epsilon$. Your relevant set for $\text{support}(L)$ will be
$$C_{\epsilon P^g_h(L)/2}\cap\left\{x\,\middle|\,L(x)\ge\frac{\epsilon P^g_h(L)}{2\lambda^\odot}\right\}$$
Where the first term is a compact set that is an $\frac{\epsilon P^g_h(L)}{2}$-almost-support for $h$, and that last set is a sort of "this point must be likely enough". $\lambda^\odot$ will be the Lipschitz constant of the original $h$. Yes, this intersection may be empty.
Now, here's how things go. Let $f$ and $f'$ agree on that intersection (if it's the empty set, then they can be any two functions). We can go:
$$|(h|^g_L)(f)-(h|^g_L)(f')|=\left|\frac{h(f\star_Lg)-h(0\star_Lg)}{h(1\star_Lg)-h(0\star_Lg)}-\frac{h(f'\star_Lg)-h(0\star_Lg)}{h(1\star_Lg)-h(0\star_Lg)}\right|$$
$$=\frac{1}{h(1\star_Lg)-h(0\star_Lg)}|h(f\star_Lg)-h(0\star_Lg)-h(f'\star_Lg)+h(0\star_Lg)|$$
$$=\frac{1}{P^g_h(L)}|h(f\star_Lg)-h(f'\star_Lg)|=\frac{1}{P^g_h(L)}|h(Lf+(1-L)g)-h(Lf'+(1-L)g)|$$
So far, this is just a standard sequence of rewrites. The definition of the update, pulling the fraction out, using $P^g_h(L)$ to abbreviate the rescaling term, and unpacking what $\star_L$ means.
Now, let's see how different $Lf+(1-L)g$ and $Lf'+(1-L)g$ are on the set $C_{\epsilon P^g_h(L)/2}$. One of two things will occur. Our first possibility is that an $x$ in that compact set also has $L(x)\ge\frac{\epsilon P^g_h(L)}{2\lambda^\odot}$. Then
$$x\in C_{\epsilon P^g_h(L)/2}\cap\left\{x\,\middle|\,L(x)\ge\frac{\epsilon P^g_h(L)}{2\lambda^\odot}\right\}$$
and $f,f'$ were selected to be equal on that set, so the two functions will be identical on that point. Our second possibility is that $x$ in that compact set will have $L(x)<\frac{\epsilon P^g_h(L)}{2\lambda^\odot}$. In that case,
$$|L(x)f(x)+(1-L(x))g(x)-L(x)f'(x)-(1-L(x))g(x)|$$
$$=|L(x)f(x)-L(x)f'(x)|=L(x)|f(x)-f'(x)|\le\frac{\epsilon P^g_h(L)}{2\lambda^\odot}d(f,f')$$
Because $L(x)<\frac{\epsilon P^g_h(L)}{2\lambda^\odot}$ and $|f(x)-f'(x)|\le d(f,f')$.
Putting this together, $Lf+(1-L)g$ and $Lf'+(1-L)g$ are at most $\frac{\epsilon P^g_h(L)}{2\lambda^\odot}d(f,f')$ apart when restricted to the compact set $C_{\epsilon P^g_h(L)/2}$. By Lemma 2, we can then show that
$$|h(Lf+(1-L)g)-h(Lf'+(1-L)g)|\le\lambda^\odot\cdot\frac{\epsilon P^g_h(L)}{2\lambda^\odot}d(f,f')+\frac{\epsilon P^g_h(L)}{2}d(Lf+(1-L)g,Lf'+(1-L)g)$$
And, we also know that:
$$d(Lf+(1-L)g,Lf'+(1-L)g)=d(Lf,Lf')\le d(f,f')$$
Because $L\in[0,1]$. Making that substitution, we have:
$$|h(Lf+(1-L)g)-h(Lf'+(1-L)g)|\le\lambda^\odot\cdot\frac{\epsilon P^g_h(L)}{2\lambda^\odot}d(f,f')+\frac{\epsilon P^g_h(L)}{2}d(f,f')$$
$$=\frac{\epsilon P^g_h(L)}{2}d(f,f')+\frac{\epsilon P^g_h(L)}{2}d(f,f')=\epsilon P^g_h(L)d(f,f')$$
Backing up to earlier, we had established that
$$|(h|^g_L)(f)-(h|^g_L)(f')|=\frac{1}{P^g_h(L)}|h(Lf+(1-L)g)-h(Lf'+(1-L)g)|$$
and from shortly above, we established that
$$|h(Lf+(1-L)g)-h(Lf'+(1-L)g)|\le\epsilon P^g_h(L)d(f,f')$$
Putting these together,
$$|(h|^g_L)(f)-(h|^g_L)(f')|\le\epsilon d(f,f')$$
For any two functions $f$ and $f'$ which agree on
$$C_{\epsilon P^g_h(L)/2}\cap\left\{x\,\middle|\,L(x)\ge\frac{\epsilon P^g_h(L)}{2\lambda^\odot}\right\}$$
Witnessing that said set is an $\epsilon$-almost-support for $h|^g_L$.
All we need to finish up is to show that this is a compact set in support(L) equipped with the subspace topology. This can be done by observing that in the original space X it's a compact set, due to being the intersection of a compact set and a closed set. In the subspace topology, if we try to make an open cover of it, all the open sets that cover it in the subspace topology are the restrictions of open sets in the original topology, so we have an open cover of this set in the original topology, and we can make a finite subcover, so it's compact in the subspace topology as well.
Thus, for any ϵ, we can make a compact (in support(L)) ϵ-almost-support for h|gL, so h|gL has compact almost-support and we've verified the last condition for an update of an infradistribution to be an infradistribution.
Now for deterministic pushforward. Fix an ϵ, and let your appropriate set for g∗(h) be g(Cϵ) where Cϵ is a compact ϵ-almost-support for h. The continuous image of a compact set is compact, so that part is taken care of. We still need to check that it's an ϵ-almost-support for g∗(h). Let f,f′ be equal on this set. Then
|g∗(h)(f)−g∗(h)(f′)|=|h(f∘g)−h(f′∘g)|≤ϵd(f∘g,f′∘g)
=ϵsupx|f(g(x))−f′(g(x))|≤ϵsupy|f(y)−f′(y)|=ϵd(f,f′)
And we're done. This is because, for any point x∈Cϵ, feeding it through g makes a point in g(Cϵ), and feeding it through f and f′ produces identical results because they agree on g(Cϵ). Therefore, f∘g and f′∘g agree on Cϵ and thus can have values only ϵd(f∘g,f′∘g) apart, which is actually upper-bounded by ϵd(f,f′). g(Cϵ) is thus a compact ϵ-almost-support for g∗(h), and this can be done for any ϵ, so g∗(h) has compact almost-support.
These three operations always produce infradistributions (as we've shown by verifying the last condition), so we now turn to property preservation. Updating only has two properties to check, preserving homogenity when g=0 and cohomogenity when g=1, so let's get that knocked out.
Homogenity using homogenity for h
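$$(h|^0_L)(af)=\frac{h(af\star_L0)-h(0\star_L0)}{h(1\star_L0)-h(0\star_L0)}=\frac{h(Laf)-h(0)}{h(1\star_L0)-h(0\star_L0)}=\frac{ah(Lf)}{h(1\star_L0)-h(0\star_L0)}$$
$$=a\frac{h(Lf)-h(0)}{h(1\star_L0)-h(0\star_L0)}=a\frac{h(f\star_L0)-h(0\star_L0)}{h(1\star_L0)-h(0\star_L0)}=a(h|^0_L)(f)$$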
Cohomogenity using cohomogenity for h
$$(h|^1_L)(1+af)=\frac{h((1+af)\star_L1)-h(0\star_L1)}{h(1\star_L1)-h(0\star_L1)}=\frac{h(L+aLf+1-L)-h(1-L)}{h(1)-h(1-L)}$$
$$=\frac{h(1+aLf)-h(1-L)}{1-h(1-L)}=\frac{(1-a+ah(1+Lf))-h(1-L)}{1-h(1-L)}$$
$$=\frac{1-a+ah(1+Lf)-h(1-L)+ah(1-L)-ah(1-L)}{1-h(1-L)}$$
$$=\frac{1-h(1-L)}{1-h(1-L)}-\frac{a-ah(1-L)}{1-h(1-L)}+\frac{ah(1+Lf)-ah(1-L)}{1-h(1-L)}$$
$$=\frac{1-h(1-L)}{1-h(1-L)}-a\frac{1-h(1-L)}{1-h(1-L)}+a\frac{h(1+Lf)-h(1-L)}{1-h(1-L)}$$
$$=1-a+a\frac{h(1+Lf)-h(1-L)}{h(1)-h(1-L)}=1-a+a\frac{h(L+Lf+(1-L))-h(1-L)}{h(L+(1-L))-h(1-L)}$$
$$=1-a+a\frac{h((1+f)\star_L1)-h(0\star_L1)}{h(1\star_L1)-h(0\star_L1)}=1-a+a(h|^1_L)(1+f)$$
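Both update identities can be sanity-checked on a toy example. The following is a minimal sketch, assuming a three-point space and a crisp functional given by a minimum over two hand-picked probability distributions (which is both homogeneous and cohomogeneous). The helper names `crisp`, `glue`, and `update`, the data, and the restriction to a nonnegative scalar a are illustrative assumptions, not part of the post.

```python
from fractions import Fraction as F


def crisp(dists):
    """Functional f -> min over the given probability distributions of E[f]."""
    def h(f):
        return min(sum(p * v for p, v in zip(mu, f)) for mu in dists)
    return h


def glue(f, L, g):
    """Pointwise f *_L g = L*f + (1 - L)*g."""
    return [l * a + (1 - l) * b for a, l, b in zip(f, L, g)]


def update(h, L, g):
    """Return the updated functional h|^g_L as a Python function."""
    n = len(L)
    zero, one = [F(0)] * n, [F(1)] * n
    base = h(glue(zero, L, g))
    scale = h(glue(one, L, g)) - base  # the rescaling term, assumed nonzero
    return lambda f: (h(glue(f, L, g)) - base) / scale


# Hypothetical toy data on a three-point space.
h = crisp([[F(1, 2), F(1, 4), F(1, 4)], [F(1, 5), F(2, 5), F(2, 5)]])
L = [F(1), F(1, 2), F(1, 4)]
f = [F(3), F(-1), F(2)]
a = F(3, 2)

u0 = update(h, L, [0, 0, 0])  # g = 0: homogenity should be preserved
u1 = update(h, L, [1, 1, 1])  # g = 1: cohomogenity should be preserved
hom_lhs, hom_rhs = u0([a * v for v in f]), a * u0(f)
cohom_lhs = u1([1 + a * v for v in f])
cohom_rhs = 1 - a + a * u1([1 + v for v in f])
```

With exact fractions, both pairs agree exactly, matching the two derivations above for this one example; it is a spot check, not a proof.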
Now for mixtures, we'll verify homogenity, 1-Lipschitzness, cohomogenity, C-additivity, and crispness.
Homogenity:
(Eζhi)(af)=Eζ(hi(af))=Eζ(ahi(f))=aEζ(hi(f))=a(Eζhi)(f)
1-Lipschitz:
|(Eζhi)(f)−(Eζhi)(f′)|=|Eζ(hi(f))−Eζ(hi(f′))|
≤Eζ|hi(f)−hi(f′)|≤Eζd(f,f′)=d(f,f′)
Cohomogenity:
(Eζhi)(1+af)=Eζ(hi(1+af))=Eζ(1−a+ahi(1+f))
=1−a+aEζ(hi(1+f))=1−a+a(Eζhi)(1+f)
C-additivity:
(Eζhi)(c)=Eζ(hi(c))=Eζ(c)=c
Crispness: Observe that both homogenity and C-additivity are preserved, and crispness is equivalent to the conjunction of the two.
Now for deterministic pushforwards, we'll verify homogenity, 1-Lipschitzness, cohomogenity, C-additivity, crispness, and sharpness.
Homogenity:
(g∗(h))(af)=h((af)∘g)=h(a(f∘g))=ah(f∘g)=a(g∗(h))(f)
1-Lipschitzness:
|(g∗(h))(f)−(g∗(h))(f′)|=|h(f∘g)−h(f′∘g)|≤d(f∘g,f′∘g)≤d(f,f′)
Cohomogenity:
(g∗(h))(1+af)=h((1+af)∘g)=h(1+a(f∘g))=1−a+ah(1+(f∘g))
=1−a+ah((1+f)∘g)=1−a+a(g∗(h))(1+f)
C-additivity:
(g∗(h))(c)=h(c∘g)=h(c)=c
Crispness: Both homogenity and C-additivity are preserved, so crispness is preserved too.
Sharpness:
(g∗(h))(f)=h(f∘g)=infx∈Cf(g(x))=infy∈g(C)f(y)
And g(C) is the image of a compact set, so it's compact. And we're done!
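The sharpness computation can be illustrated on finite sets. Here is a minimal sketch, assuming a sharp functional is represented as a minimum over a finite support set; the names `sharp` and `pushforward` and the toy map are illustrative only.

```python
def sharp(support):
    """Sharp functional f -> min of f over a finite support set."""
    return lambda f: min(f(x) for x in support)


def pushforward(g, h):
    """g_*(h): evaluate h on the pullback of f along g."""
    return lambda f: h(lambda x: f(g(x)))


# Hypothetical toy data: X = Y = {0, ..., 4}, g squares mod 5.
C = {1, 2, 3, 4}
g = lambda x: (x * x) % 5
h = sharp(C)
pushed = pushforward(g, h)
image = {g(x) for x in C}          # g(C)
f = lambda y: (y - 3) ** 2         # a test function on Y
```

In this example the pushforward minimizes f over g(C) and not over all of Y, which is why its value differs from the unconstrained minimum of f.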
Proposition 11: The inf of two infradistributions is always an infradistribution, and inf preserves the infradistribution properties indicated by the diagram at the start of this section.
We'll first verify the infradistribution properties of the inf, and then show it preserves the indicated properties if both components have them.
We must check monotonicity, concavity, normalization, Lipschitzness, and compact almost-support. For monotonicity, if f′≥f, then
inf(h1,h2)(f′)=inf(h1(f′),h2(f′))≥inf(h1(f),h2(f))=inf(h1,h2)(f)
This was done by monotonicity for the components. For concavity,
inf(h1,h2)(pf+(1−p)f′)=inf(h1(pf+(1−p)f′),h2(pf+(1−p)f′))
≥inf(ph1(f)+(1−p)h1(f′),ph2(f)+(1−p)h2(f′))
≥inf(ph1(f),ph2(f))+inf((1−p)h1(f′),(1−p)h2(f′))
=pinf(h1(f),h2(f))+(1−p)inf(h1(f′),h2(f′))
=pinf(h1,h2)(f)+(1−p)inf(h1,h2)(f′)
The first ≥ happened because h1 and h2 are concave, the second is because inf(a+b,c+d)≥inf(a,c)+inf(b,d).
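The elementary inequality used in the second step can be spot-checked by brute force. This is a minimal sketch over a small grid of integers; the function name `inf_superadditive` is illustrative.

```python
from itertools import product


def inf_superadditive(values):
    """Check min(a+b, c+d) >= min(a,c) + min(b,d) on all 4-tuples from values."""
    return all(
        min(a + b, c + d) >= min(a, c) + min(b, d)
        for a, b, c, d in product(values, repeat=4)
    )


holds_on_grid = inf_superadditive(range(-3, 4))
# The inequality can be strict: here the left side is 1 and the right side is 0.
strict_example = min(0 + 1, 1 + 0) > min(0, 1) + min(1, 0)
```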
For normalization,
inf(h1,h2)(1)=inf(h1(1),h2(1))=inf(1,1)=1
And the same argument applies to 0, so the inf is normalized.
For Lipschitzness, the inf of two Lipschitz functions is Lipschitz.
That just leaves compact almost-support. Fix an arbitrary ϵ, and get a C1ϵ compact ϵ-almost-support for h1, and a C2ϵ for h2. We will show that C1ϵ∪C2ϵ is a compact ϵ-almost-support for inf(h1,h2). It's compact because it's a finite union of compact sets.
Now, let f and f′ agree on C1ϵ∪C2ϵ. We can go:
|inf(h1,h2)(f)−inf(h1,h2)(f′)|=|inf(h1(f),h2(f))−inf(h1(f′),h2(f′))|
There are four possible cases for evaluating this quantity. In case 1, h1(f)≤h2(f) and h1(f′)≤h2(f′). Then our above term turns into |h1(f)−h1(f′)|. However, since f and f′ agree on C1ϵ∪C2ϵ, they must agree on C1ϵ, and only have expectations ≤ϵd(f,f′) apart. Case 2 where h1(f)≥h2(f) and h1(f′)≥h2(f′) is symmetric and can be disposed of by a nearly identical argument, we just do it with h2 and C2ϵ.
Case 3 where h1(f)<h2(f) and h1(f′)>h2(f′) takes a slightly fancier argument. We can go:
−ϵd(f,f′)≤h1(f)−h1(f′)<h1(f)−h2(f′)<h2(f)−h2(f′)≤ϵd(f,f′)
The end inequalities are because f and f′ agree on the ϵ-almost-supports of h1 and h2, respectively, from agreeing on the union. The two inner inequalities are derived from the assumed inequalities in Case 3.
Thus,
|inf(h1(f),h2(f))−inf(h1(f′),h2(f′))|=|h1(f)−h2(f′)|<ϵd(f,f′)
Case 4 where the assumed starting inequalities go in the other direction is symmetric. So, no matter which infradistributions are lower in the two infs, we have
|inf(h1,h2)(f)−inf(h1,h2)(f′)|=|inf(h1(f),h2(f))−inf(h1(f′),h2(f′))|≤ϵd(f,f′)
And we're done, we made a compact almost-support for inf(h1,h2) assuming an arbitrary ϵ. So the inf of two infradistributions is an infradistribution.
Now to verify homogenity, 1-Lipschitzness, cohomogenity, C-additivity, crispness, and sharpness preservation.
Homogenity:
inf(h1,h2)(af)=inf(h1(af),h2(af))=inf(ah1(f),ah2(f))
=ainf(h1(f),h2(f))=ainf(h1,h2)(f)
1-Lipschitzness:
|inf(h1,h2)(f)−inf(h1,h2)(f′)|=|inf(h1(f),h2(f))−inf(h1(f′),h2(f′))|
Now we can split into four cases. In cases 1 and 2 where the infs turn into h1(f),h1(f′) (and same for h2 in case 2), we have:
|inf(h1(f),h2(f))−inf(h1(f′),h2(f′))|=|h1(f)−h1(f′)|≤d(f,f′)
(and same for h2), and we're done with those cases. In cases 3 and 4 where the infs turn into h1(f),h2(f′) (and vice-versa for case 4), we have:
−d(f,f′)≤h1(f)−h1(f′)<h1(f)−h2(f′)<h2(f)−h2(f′)≤d(f,f′)
Because h1 and h2 are 1-Lipschitz. Thus,
|inf(h1(f),h2(f))−inf(h1(f′),h2(f′))|=|h1(f)−h2(f′)|≤d(f,f′)
A symmetric argument works for case 4. So, no matter what,
|inf(h1(f),h2(f))−inf(h1(f′),h2(f′))|≤d(f,f′)
And we're done, the inf is 1-Lipschitz too.
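The 1-Lipschitz claim for the inf can be tested numerically. The following is a minimal sketch, assuming two sharp (hence 1-Lipschitz) functionals on a five-point space and integer-valued random test functions; the names `inf_of_sharp` and `check_lipschitz` and the seed are illustrative.

```python
import random


def sup_dist(f, g):
    """Sup-norm distance between two functions given as equal-length lists."""
    return max(abs(a - b) for a, b in zip(f, g))


def inf_of_sharp(C1, C2):
    """inf(h1, h2) where h_i(f) is the min of f over the index set C_i."""
    h1 = lambda f: min(f[x] for x in C1)
    h2 = lambda f: min(f[x] for x in C2)
    return lambda f: min(h1(f), h2(f))


def check_lipschitz(h, n_points, trials=1000, seed=0):
    """Return True if |h(f) - h(g)| <= d(f, g) on all sampled pairs."""
    rng = random.Random(seed)
    for _ in range(trials):
        f = [rng.randint(-10, 10) for _ in range(n_points)]
        g = [rng.randint(-10, 10) for _ in range(n_points)]
        if abs(h(f) - h(g)) > sup_dist(f, g):
            return False
    return True


h = inf_of_sharp({0, 1}, {2, 3})
ok = check_lipschitz(h, 5)
```

Random sampling can only fail to find a counterexample; the case analysis above is what actually establishes the bound.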
Cohomogenity:
inf(h1,h2)(1+af)=inf(h1(1+af),h2(1+af))
=inf(1−a+ah1(1+f),1−a+ah2(1+f))
=1−a+ainf(h1(1+f),h2(1+f))
=1−a+ainf(h1,h2)(1+f)
C-additivity:
inf(h1,h2)(c)=inf(h1(c),h2(c))=inf(c,c)=c
Crispness: Homogenity and C-additivity are both preserved, so crispness is preserved.
Sharpness:
inf(h1,h2)(f)=inf(h1(f),h2(f))=inf(infx∈C1f(x),infx∈C2f(x))=infx∈C1∪C2f(x)
And we're done.
Proposition 12: Einf(H1,H2)(f)=inf(EH1(f),EH2(f))
Einf(H1,H2)(f)=inf(m,b)∈inf(H1,H2)m(f)+b=inf(m,b)∈H1∪H2m(f)+b
=inf(inf(m,b)∈H1(m(f)+b),inf(m,b)∈H2(m(f)+b))
=inf(EH1(f),EH2(f))
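Proposition 12 can be seen in miniature on finite data. This is a minimal sketch, assuming each infradistribution set is represented by a finite list of generating a-measures (m, b) on a two-point space, so that the infimum over the set reduces to a minimum over the list; the name `expectation` and the data are illustrative.

```python
from fractions import Fraction as F


def expectation(H, f):
    """E_H(f) = min over a-measures (m, b) in H of m(f) + b."""
    return min(sum(w * v for w, v in zip(m, f)) + b for m, b in H)


# Hypothetical generator lists on a two-point space.
H1 = [((F(1, 2), F(1, 2)), F(0)), ((F(1), F(0)), F(1, 10))]
H2 = [((F(0), F(1)), F(0))]
f = (F(3), F(1))

union_value = expectation(H1 + H2, f)   # inf over H1 ∪ H2
componentwise = min(expectation(H1, f), expectation(H2, f))
```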
Proposition 13: If a family of infradistributions {hi}i∈I has a shared upper bound on the Lipschitz constant, and for all ϵ, there is a compact set Cϵ that is an ϵ-almost support for all hi, then infihi, defined as (infihi)(f):=infi(hi(f)), is an infradistribution. Further, for all conditions listed in the table, if all the hi fulfill them, then infihi fulfills the same property.
We'll first verify the infradistribution properties of the infinite inf, and then show it preserves the indicated properties if all components have them.
We must check monotonicity, concavity, normalization, Lipschitzness, and compact almost-support. For monotonicity, if f′≥f, then
(infihi)(f′)=infi(hi(f′))≥infi(hi(f))=(infihi)(f)
This was done by monotonicity for all components. For concavity,
(infihi)(pf+(1−p)f′)=infi(hi(pf+(1−p)f′))≥infi(phi(f)+(1−p)hi(f′))
≥infi(phi(f))+infi((1−p)hi(f′))=pinfi(hi(f))+(1−p)infi(hi(f′))
=p(infihi)(f)+(1−p)(infihi)(f′)
The first ≥ happened because all the hi are concave, the second is because infi(ai+bi)≥infi(ai)+infi(bi).
For normalization,
(infihi)(1)=infi(hi(1))=infi(1)=1
And the same argument applies to 0, so the inf is normalized.
For Lipschitzness, let λ⊙ be your uniform upper bound on the Lipschitz constants of the hi. Then,
|(infihi)(f)−(infihi)(f′)|=|infi(hi(f))−infi(hi(f′))|
And then, each hi only thinks those functions differ by λ⊙d(f,f′) or less, and the same property applies to the inf: pick an hi and hi′ that very nearly attain the two infima, and observe that if the infima were >λ⊙d(f,f′) apart (say, with the infimum for f being the lower one), then hi(f′) would appreciably undershoot hi′(f′), and in fact undershoot infi(hi(f′)), which is impossible. Thus,
|infi(hi(f))−infi(hi(f′))|≤λ⊙d(f,f′)
And we're done.
That just leaves compact almost-support. Fix an arbitrary ϵ. We know there is some Cϵ that is a compact ϵ-almost-support for all the hi. We will show that Cϵ is an ϵ-almost-support for infihi.
Let f and f′ agree on Cϵ. We can go:
|(infihi)(f)−(infihi)(f′)|=|infi(hi(f))−infi(hi(f′))|
Pick a hi and hi′ that very very very nearly attain the inf. Then we can approximately reexpress this quantity as:
|inf(hi(f),hi′(f))−inf(hi(f′),hi′(f′))|
We're approximately in a case where hi(f)≤hi′(f) and hi(f′)≥hi′(f′), so we can go:
−ϵd(f,f′)≤hi(f)−hi(f′)<hi(f)−hi′(f′)<hi′(f)−hi′(f′)≤ϵd(f,f′)
The end inequalities are because f and f′ agree on the ϵ-almost-support of hi and hi′. The two inner inequalities are derived from the assumed inequalities in our case. Thus,
|infi(hi(f))−infi(hi(f′))|≃|hi(f)−hi′(f′)|≤ϵd(f,f′)
And we're done, we made a compact almost-support for infihi assuming an arbitrary ϵ. So the inf of this family of infradistributions is an infradistribution.
Now to verify homogenity, 1-Lipschitzness, cohomogenity, C-additivity, crispness, and sharpness preservation.
Homogenity:
(infihi)(af)=infi(hi(af))=infi(ahi(f))
=ainfi(hi(f))=a(infihi)(f)
1-Lipschitzness: Same as the Lipschitz argument, everyone has a Lipschitz constant of 1, so the inf has the same Lipschitz constant.
Cohomogenity:
(infihi)(1+af)=infi(hi(1+af))
=infi(1−a+ahi(1+f))
=1−a+ainfi(hi(1+f))
=1−a+a(infihi)(1+f)
C-additivity:
(infihi)(c)=infi(hi(c))=infi(c)=c
Crispness: Homogenity and C-additivity are both preserved, so crispness is preserved.
Sharpness:
$$(\inf_ih_i)(f)=\inf_i(h_i(f))=\inf_i\left(\inf_{x\in C_i}f(x)\right)=\inf_{x\in\bigcup_iC_i}f(x)=\inf_{x\in\overline{\bigcup_iC_i}}f(x)$$
The last equality holds because $f$ is continuous. We do have to check whether or not the closure $\overline{\bigcup_iC_i}$ is compact, however. We'll start by showing that for an arbitrary $C_i$, any compact set $K$ where $C_i\not\subseteq K$ can't be an $\epsilon$-almost-support of $h_i$ for any $\epsilon<1$. The proof proceeds as follows:
Let $x^*$ be some point in $C_i$ but not in $K$. It must be some positive distance away from $K$, because $K$ is closed. Craft a continuous function $f_0$ defined on $K\cup\{x^*\}$, where $f_0$ is 1 on $K$ and 0 on $\{x^*\}$. Use the Tietze extension theorem to extend $f_0$ to all of $X$, with values in $[0,1]$. Then
$$|h_i(f_0)-h_i(1)|=\left|\inf_{x\in C_i}f_0(x)-\inf_{x\in C_i}1\right|=|f_0(x^*)-1|=|0-1|=1$$
However, $f_0$ and 1 agree on $K$, and $d(f_0,1)=1$, so $K$ can't be an $\epsilon$-almost-support for any $\epsilon<1$.
Thus, in order for there to be a compact set $C_\epsilon$ that's an $\epsilon$-almost-support for all $h_i$ (with $\epsilon<1$), it must be that $\forall i:C_i\subseteq C_\epsilon$. Then
$$\overline{\bigcup_iC_i}\subseteq C_\epsilon$$
because all the $C_i$ are in it and $C_\epsilon$ is closed. So, the closure of our union is a closed subset of a compact set and thus is compact, so $\inf_ih_i$ is minimizing over a compact set and thus is sharp.
Proposition 14: If sup(h1,h2)(0)=0 and sup(h1,h2)(1)=1, then the supremum is an infradistribution.
The supremum is defined as:
sup(h1,h2)(f)=supf1,f2,p:pf1+(1−p)f2≤fph1(f1)+(1−p)h2(f2)
We'll verify the infradistribution properties of the sup.
We must check monotonicity, concavity, normalization, Lipschitzness, and compact almost-support. For monotonicity, if f′≥f, then
sup(h1,h2)(f)=supf1,f2,p:pf1+(1−p)f2≤fph1(f1)+(1−p)h2(f2)
≤supf1,f2,p:pf1+(1−p)f2≤f′ph1(f1)+(1−p)h2(f2)=sup(h1,h2)(f′)
This was done by f′≥f so there's more options available. For concavity,
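$$q\sup(h_1,h_2)(f)+(1-q)\sup(h_1,h_2)(f')$$
$$=q\sup_{f_1,f_2,p:\,pf_1+(1-p)f_2\le f}\big(ph_1(f_1)+(1-p)h_2(f_2)\big)$$
$$+(1-q)\sup_{f'_1,f'_2,p':\,p'f'_1+(1-p')f'_2\le f'}\big(p'h_1(f'_1)+(1-p')h_2(f'_2)\big)$$
Pick your $f_1,f_2,f'_1,f'_2,p,p'$ that very very very nearly attain the suprema.
$$\simeq qph_1(f_1)+q(1-p)h_2(f_2)+(1-q)p'h_1(f'_1)+(1-q)(1-p')h_2(f'_2)$$
$$=(qp+(1-q)p')\left(\frac{qp}{qp+(1-q)p'}h_1(f_1)+\frac{(1-q)p'}{qp+(1-q)p'}h_1(f'_1)\right)$$
$$+(q(1-p)+(1-q)(1-p'))\left(\frac{q(1-p)}{q(1-p)+(1-q)(1-p')}h_2(f_2)+\frac{(1-q)(1-p')}{q(1-p)+(1-q)(1-p')}h_2(f'_2)\right)$$
$$\le(qp+(1-q)p')h_1\left(\frac{qp}{qp+(1-q)p'}f_1+\frac{(1-q)p'}{qp+(1-q)p'}f'_1\right)$$
$$+(q(1-p)+(1-q)(1-p'))h_2\left(\frac{q(1-p)}{q(1-p)+(1-q)(1-p')}f_2+\frac{(1-q)(1-p')}{q(1-p)+(1-q)(1-p')}f'_2\right)$$
The inequality is by concavity of $h_1$ and $h_2$. Also, we can verify that:
$$(qp+(1-q)p')\left(\frac{qp}{qp+(1-q)p'}f_1+\frac{(1-q)p'}{qp+(1-q)p'}f'_1\right)$$
$$+(q(1-p)+(1-q)(1-p'))\left(\frac{q(1-p)}{q(1-p)+(1-q)(1-p')}f_2+\frac{(1-q)(1-p')}{q(1-p)+(1-q)(1-p')}f'_2\right)$$
$$=qpf_1+(1-q)p'f'_1+q(1-p)f_2+(1-q)(1-p')f'_2$$
$$=q(pf_1+(1-p)f_2)+(1-q)(p'f'_1+(1-p')f'_2)\le qf+(1-q)f'$$
The two outer weights $qp+(1-q)p'$ and $q(1-p)+(1-q)(1-p')$ sum to 1. Therefore, it is a suitable parameter and pair of functions to lower bound $qf+(1-q)f'$. Accordingly
$$(qp+(1-q)p')h_1\left(\frac{qp}{qp+(1-q)p'}f_1+\frac{(1-q)p'}{qp+(1-q)p'}f'_1\right)$$
$$+(q(1-p)+(1-q)(1-p'))h_2\left(\frac{q(1-p)}{q(1-p)+(1-q)(1-p')}f_2+\frac{(1-q)(1-p')}{q(1-p)+(1-q)(1-p')}f'_2\right)$$
$$\le\sup_{p^*,f^*_1,f^*_2:\,p^*f^*_1+(1-p^*)f^*_2\le qf+(1-q)f'}\big(p^*h_1(f^*_1)+(1-p^*)h_2(f^*_2)\big)=\sup(h_1,h_2)(qf+(1-q)f')$$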
Putting all this together, and picking better and better approximations to the two suprema, we can conclude that:
qsup(h1,h2)(f)+(1−q)sup(h1,h2)(f′)≤sup(h1,h2)(qf+(1−q)f′)
And we have concavity.
For normalization, we're assuming it holds at the start.
Lipschitzness takes a slightly more involved argument. Pick two functions f and f′, and without loss of generality, assume sup(h1,h2)(f)≥sup(h1,h2)(f′). Now, what we can do is pick a p, f1 and f2 which approximately obtain the defining supremum for f, so we have:
sup(h1,h2)(f)≃ph1(f1)+(1−p)h2(f2)
Now, we can note two things. First,
p(f1−d(f,f′))+(1−p)(f2−d(f,f′))=(pf1+(1−p)f2)−d(f,f′)≤f−d(f,f′)≤f′
Therefore, the same p, and f1−d(f,f′), and f2−d(f,f′) are suitable things to lower-bound the value of sup(h1,h2)(f′). In particular, we have:
sup(h1,h2)(f′)=sup_{q,f′1,f′2: qf′1+(1−q)f′2≤f′} (qh1(f′1)+(1−q)h2(f′2))
≥ph1(f1−d(f,f′))+(1−p)h2(f2−d(f,f′))
Also, we have the result that:
ph1(f1)+(1−p)h2(f2)−ph1(f1−d(f,f′))−(1−p)h2(f2−d(f,f′))
=|(ph1(f1)+(1−p)h2(f2))−(ph1(f1−d(f,f′))+(1−p)h2(f2−d(f,f′)))|
≤|ph1(f1)−ph1(f1−d(f,f′))|+|(1−p)h2(f2)−(1−p)h2(f2−d(f,f′))|
=p|h1(f1)−h1(f1−d(f,f′))|+(1−p)|h2(f2)−h2(f2−d(f,f′))|
≤p(λ⊙1⋅d(f,f′))+(1−p)(λ⊙2⋅d(f,f′))≤max(λ⊙1,λ⊙2)d(f,f′)
Because of Lipschitzness of h1 and h2. Now we can begin showing our inequalities. So, we've shown that:
sup(h1,h2)(f′)≥ph1(f1−d(f,f′))+(1−p)h2(f2−d(f,f′))
Therefore,
ph1(f1−d(f,f′))+(1−p)h2(f2−d(f,f′))−sup(h1,h2)(f′)≤0
With this result, we can go:
max(λ⊙1,λ⊙2)d(f,f′)
≥max(λ⊙1,λ⊙2)d(f,f′)+ph1(f1−d(f,f′))+(1−p)h2(f2−d(f,f′))−sup(h1,h2)(f′)
Let's save this result for a bit later.
Also, we had:
ph1(f1)+(1−p)h2(f2)−ph1(f1−d(f,f′))−(1−p)h2(f2−d(f,f′))≤max(λ⊙1,λ⊙2)d(f,f′)
And we also picked p and f1 and f2 to approximately attain the supremum, so we know:
ph1(f1)+(1−p)h2(f2)≃sup(h1,h2)(f)
Therefore, we approximately have:
sup(h1,h2)(f)−ph1(f1−d(f,f′))−(1−p)h2(f2−d(f,f′))≤max(λ⊙1,λ⊙2)d(f,f′)
Reshuffling this around a bit, we have:
max(λ⊙1,λ⊙2)d(f,f′)+ph1(f1−d(f,f′))+(1−p)h2(f2−d(f,f′))≥sup(h1,h2)(f)
Using this with our saved result, we can get:
max(λ⊙1,λ⊙2)d(f,f′)
≥max(λ⊙1,λ⊙2)d(f,f′)+ph1(f1−d(f,f′))+(1−p)h2(f2−d(f,f′))−sup(h1,h2)(f′)
≥sup(h1,h2)(f)−sup(h1,h2)(f′)≥0
That last inequality was because we assumed at the start without loss of generality that f got an equal or higher expectation than f′.
Therefore, we have our result that, in general,
|sup(h1,h2)(f)−sup(h1,h2)(f′)|≤max(λ⊙1,λ⊙2)d(f,f′)
And thus, the supremum of two Lipschitz infradistributions is Lipschitz. That just leaves compact almost-support, which is quite tricky to show.
Fix an arbitrary ϵ, and take a compact ϵ-almost-support C1ϵ for h1, and a compact ϵ-almost-support C2ϵ for h2. We will show that C1ϵ∪C2ϵ is a compact ϵ-almost-support for sup(h1,h2). It's compact because it's a finite union of compact sets.
Now, let f and f′ agree on C1ϵ∪C2ϵ. Without loss of generality, assume that sup(h1,h2)(f)≥sup(h1,h2)(f′) (if not, flip f and f′). We'll show that they have similar expectations by showing that sup(h1,h2)(f)−sup(h1,h2)(f′) is below a small number (we already know that it's above 0 by our without-loss-of-generality assumption).
We can go:
sup(h1,h2)(f)=sup_{p,f1,f2: pf1+(1−p)f2≤f} (ph1(f1)+(1−p)h2(f2))≃ph1(f1)+(1−p)h2(f2)
Where we picked a particular p,f1,f2 spectacularly close to the highest possible value s.t. pf1+(1−p)f2≤f. In particular, if p is 0 or 1, we can ensure that f1 or f2 is f itself, by monotonicity of h1 or h2 respectively.
For successive arguments, we need p∈(0,1) so we have to address those endpoints. Assume p=1. Then, sup(h1,h2)(f)≃h1(f). Then, we have:
sup(h1,h2)(f)−sup(h1,h2)(f′)
≃h1(f)−sup(h1,h2)(f′)
≤h1(f′)+ϵd(f,f′)−sup(h1,h2)(f′)≤ϵd(f,f′)
The way this works is: first our substitution; then, since f and f′ are identical on C1ϵ∪C2ϵ, they are identical on C1ϵ, which is an ϵ-almost-support of h1, so we can upper-bound h1(f) with h1(f′)+ϵd(f,f′). And then, we just use that h1(f′)≤sup(h1,h2)(f′). If p=0, the exact same argument works, just with h2 and C2ϵ instead. That leaves the case where p∈(0,1), which requires far more involved arguments.
As a recap, we're assuming that sup(h1,h2)(f)≥sup(h1,h2)(f′), and that sup(h1,h2)(f)≃ph1(f1)+(1−p)h2(f2), and p∈(0,1). Now, we're going to pick out a continuous function with some special properties, so let the set-valued function ψ, which sends each x∈X to a subset of R2, be defined as follows. If x∈C1ϵ∪C2ϵ, then ψ(x)={(f1(x),f2(x))}. Otherwise, ψ(x) equals the intersection of:
[f1(x)−d(f,f′),f1(x)+d(f,f′)]×[f2(x)−d(f,f′),f2(x)+d(f,f′)]
and
{(y,z)|py+(1−p)z≤f′(x)}
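For reference, the definition just given can be collected into a single display. This is only a restatement; the names $R(x)$ (the rectangle) and $S(x)$ (the half-plane) are labels introduced here for convenience and do not appear in the original argument.

```latex
\[
\psi(x) =
\begin{cases}
\{(f_1(x),\, f_2(x))\} & \text{if } x \in C^1_\epsilon \cup C^2_\epsilon, \\[4pt]
R(x) \cap S(x) & \text{otherwise,}
\end{cases}
\]
\[
R(x) := [f_1(x) - d(f,f'),\; f_1(x) + d(f,f')] \times [f_2(x) - d(f,f'),\; f_2(x) + d(f,f')],
\]
\[
S(x) := \{(y,z) \in \mathbb{R}^2 \mid p\,y + (1-p)\,z \le f'(x)\}.
\]
```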
We'll find a continuous selection of this set-valued function, so let's start checking the properties needed to invoke the Michael selection theorem. We need that X is paracompact (all polish spaces are paracompact, check), that R2 is a Banach space (check), that for all x, ψ(x) is convex (it's either a single point or the intersection of a rectangle and a half-space, which is convex in both cases), closed (yup, it's either a point or the intersection of two closed sets, ie closed), nonempty, and lower-hemicontinuous.
Nonemptiness isn't too bad to show. It's nonempty for all points in our compact set of interest (the set consisting of a single point), and for x not in said set, (f1(x)−d(f,f′),f2(x)−d(f,f′)) witnesses the nonemptiness, because:
p(f1(x)−d(f,f′))+(1−p)(f2(x)−d(f,f′))=pf1(x)+(1−p)f2(x)−d(f,f′)
≤f(x)−d(f,f′)≤f′(x)
Lower-hemicontinuity is much more challenging to establish. Again, we have a sequence xn limiting to x, a point (y,z)∈ψ(x), and we must find a subsequence (ym,zm)∈ψ(xm) which limits to (y,z).
We can divide into three cases. In the first case, x lies in C1ϵ∪C2ϵ, and infinitely many members of the sequence lie in said set. In particular, since x lies in the compact set, the (y,z) pair associated with it must be (f1(x),f2(x)). Then we can isolate that particular subsequence that lies in the compact set, and have (ym,zm) be (f1(xm),f2(xm)), which, by continuity of f1 and f2, and the definition of ψ(xm) for xm in the compact set, lie in ψ(xm) and limit to (y,z) ie (f1(x),f2(x)).
In preparation for the second and third cases, we'll show that the function ψ′:X→R2 which just takes the second branch of the ψ function is continuous w.r.t. the Hausdorff-metric. Ie, for all x,
ψ′(x):=[f1(x)−d(f,f′),f1(x)+d(f,f′)]×[f2(x)−d(f,f′),f2(x)+d(f,f′)]
∩{(y,z)|py+(1−p)z≤f′(x)}
is continuous when the space of compact subsets of R2 is equipped with a Hausdorff distance.
Accordingly, let xm limit to x. Our task is to show that, no matter how tiny of a number you name, you can find a tail of the xm sequence where the Hausdorff distance between ψ′(xm) and ψ′(x) is that tiny.
Specifically, we'll show that for all δ, there is some m where all later xm have ψ′(xm) within 2δ/p+2δ/(1−p)+4δ Hausdorff distance of ψ′(x). Because p∈(0,1) and we can shrink δ to 0, this shows that the function ψ′ is continuous in Hausdorff-distance.
Because f1 and f2 and f′ are continuous functions, there's some very very large m where f1, f2, and f′ will only vary by δ from that point forward, regardless of which δ you pick. Pick some arbitrary (ym,zm)∈ψ′(xm). We'll show that it's close to a (y,z)∈ψ′(x), and the argument will only depend on distances, not position in sequence, so we can flip it to show the other half of Hausdorff-distance (all points in ψ′(x) are close to a point in ψ′(xm)).
We can divide into four possible cases. In cases 1 and 2, we have the following property holding.
ym≥f1(xm)−d(f,f′)+2δ/p+δ
With the negation for cases 3 and 4.
And in cases 1 and 3, we have:
zm≥f2(xm)−d(f,f′)+2δ/(1−p)+δ
With the negation for cases 2 and 4.
In cases 1 and 2, you can let your selected y point be ym−2δ/p. We have the result that y∈[f1(x)−d(f,f′),f1(x)+d(f,f′)], because:
f1(x)+d(f,f′)≥f1(xm)−δ+d(f,f′)≥ym−δ≥ym−2δ/p=y
≥f1(xm)−d(f,f′)+2δ/p+δ−2δ/p=f1(xm)−d(f,f′)+δ≥f1(x)−d(f,f′)
In order: the first inequality holds because f1 only varies by δ over such tiny distances (continuity of f1); the second because ym is the first coordinate of a point in ψ′(xm), so ym≤f1(xm)+d(f,f′); the third because p<1, so 2δ/p≥δ. The equality is our definition of y. The next inequality uses the lower bound on ym that defines cases 1 and 2, then there's just a cancellation, and finally f1 only varies by δ over such tiny distances.
You can use nearly identical arguments in cases 1 and 3 to get that, when you define z to be zm−2δ/(1−p), you have the result that z∈[f2(x)−d(f,f′),f2(x)+d(f,f′)].
Now, in cases 3 and 4, we can let y be: f1(x)−d(f,f′), and then we have:
−δ=f1(x)−δ−d(f,f′)−(f1(x)−d(f,f′))
=f1(x)−δ−d(f,f′)−y≤f1(xm)−d(f,f′)−y≤ym−y
≤f1(xm)−d(f,f′)+2δ/p+δ−y≤f1(x)+δ−d(f,f′)+2δ/p+δ−y
=f1(x)−d(f,f′)+2δ/p+2δ−y=f1(x)−d(f,f′)+2δ/p+2δ−(f1(x)−d(f,f′))
=2δ/p+2δ
The first equality just adds and subtracts the same terms, then the second one packs up the definition of y. The first inequality is because f1 only varies by δ over that distance, the second inequality is because ym∈ψ′(xm) so it's got the usual lower bound, then the next inequality after that is because we're in cases 3 and 4 so
ym<f1(xm)−d(f,f′)+2δ/p+δ
Then, it's just another "f1 doesn't change much over the tiny distance", moving the δ's together, unpacking y, and cancelling out. The net result is that we have:
|ym−y|≤2δ/p+2δ
You can use nearly identical arguments in cases 2 and 4 to get that, when you define z to be f2(x)−d(f,f′), you have the result that |zm−z|≤2δ/(1−p)+2δ.
At this point, we can resume our progress on the four cases and go "ok, in case 1, we have..."
ym≥f1(xm)−d(f,f′)+2δ/p+δ
zm≥f2(xm)−d(f,f′)+2δ/(1−p)+δ
And we know that those properties lead to y being defined as ym−2δ/p and z being defined as zm−2δ/(1−p). And we know that in that case,
(y,z)∈[f1(x)−d(f,f′),f1(x)+d(f,f′)]×[f2(x)−d(f,f′),f2(x)+d(f,f′)]
So, all we have to check is that py+(1−p)z≤f′(x) in order to conclude that (y,z)∈ψ′(x). Let's do that.
py+(1−p)z=p(ym−2δ/p)+(1−p)(zm−2δ/(1−p))=pym+(1−p)zm−4δ
≤f′(xm)−4δ≤f′(x)+δ−4δ=f′(x)−3δ<f′(x)
And we have that (y,z)∈ψ′(x), accordingly. The first equality was unpacking definitions, then the second was some cancellation, and then the first inequality was because (ym,zm)∈ψ′(xm) by assumption so we have pym+(1−p)zm≤f′(xm). The second inequality was because f′ doesn't change much over such tiny distances and then it's just trivial cleanup.
Thus, when we picked a point (ym,zm)∈ψ′(xm) where xm is sufficiently close to x, and we're in case 1, we have that there is a point (y,z)∈ψ′(x), and
d((ym,zm),(y,z))=|ym−y|+|zm−z|=|ym−ym+2δ/p|+|zm−zm+2δ/(1−p)|
=2δ/p+2δ/(1−p)
This is from the definitions of y and z in Case 1.
Now, let's address case 2, where
ym≥f1(xm)−d(f,f′)+2δ/p+δ
zm<f2(xm)−d(f,f′)+2δ/(1−p)+δ
In this case, y is defined as ym−2δ/p, and z is defined as f2(x)−d(f,f′).
And we know that in that case,
(y,z)∈[f1(x)−d(f,f′),f1(x)+d(f,f′)]×[f2(x)−d(f,f′),f2(x)+d(f,f′)]
(the first part on the y is the same argument from case 1, the second interval is from the value of z)
So, all we have to check is that py+(1−p)z≤f′(x) in order to conclude that (y,z)∈ψ′(x). We know that
−δ≤zm−z
So we can flip this a bit to get
z≤zm+δ
Accordingly, from that, we get:
py+(1−p)z≤p(ym−2δ/p)+(1−p)(zm+δ)=pym+(1−p)zm−2δ+(1−p)δ
≤f′(xm)−δ≤f′(x)
And we have that (y,z)∈ψ′(x), accordingly. The first inequality was definition unpacking and the inequality we just got, then the equality is just breaking things up a bit, then the second inequality uses pym+(1−p)zm≤f′(xm) together with 1−p≤1, and the last one is that f′ doesn't change much over such tiny distances.
Thus, when we picked a point (ym,zm)∈ψ′(xm) where xm is sufficiently close to x, and we're in case 2, we have that there is a point (y,z)∈ψ′(x), and
d((ym,zm),(y,z))=|ym−y|+|zm−z|
≤|ym−ym+2δ/p|+2δ/(1−p)+2δ=2δ/p+2δ/(1−p)+2δ
This is from the definitions of y and z in Case 2, and the fact that in case 2 we can derive |zm−z|≤2δ/(1−p)+2δ.
Extremely similar arguments to case 2 dispatch case 3 with a resolution of the corresponding (y,z) lying in ψ′(x) and
d((ym,zm),(y,z))≤2δ/p+2δ/(1−p)+2δ
Finally, for case 4, we have:
ym<f1(xm)−d(f,f′)+2δ/p+δ
zm<f2(xm)−d(f,f′)+2δ/(1−p)+δ
In this case, y is defined as f1(x)−d(f,f′), and z is defined as f2(x)−d(f,f′)
Trivially, we have:
(y,z)∈[f1(x)−d(f,f′),f1(x)+d(f,f′)]×[f2(x)−d(f,f′),f2(x)+d(f,f′)]
So, all we have to check is that py+(1−p)z≤f′(x) in order to conclude that (y,z)∈ψ′(x). This time it follows directly from the definitions of y and z:
py+(1−p)z=p(f1(x)−d(f,f′))+(1−p)(f2(x)−d(f,f′))
=pf1(x)+(1−p)f2(x)−d(f,f′)≤f(x)−d(f,f′)≤f′(x)
(because pf1+(1−p)f2≤f)
And we have that (y,z)∈ψ′(x), accordingly.
Thus, when we picked a point (ym,zm)∈ψ′(xm) where xm is sufficiently close to x, and we're in case 4, we have that there are points (y,z)∈ψ′(x), and
d((ym,zm),(y,z))=|ym−y|+|zm−z|≤2δ/p+2δ+2δ/(1−p)+2δ=2δ/p+2δ/(1−p)+4δ
This is from the definitions of y and z in Case 4, and the fact that in case 4 we can derive |ym−y|≤2δ/p+2δ and |zm−z|≤2δ/(1−p)+2δ.
These 4 cases were exhaustive, so we now know that, given any x and sequence of points xm limiting to x, and any δ, there is a tail of sufficiently large m's where the distance from any point in ψ′(xm) to ψ′(x) is 2δ/p+2δ/(1−p)+4δ or less. We can also flip xm and x and use our four cases (our argument is symmetric) to show that actually, this is a bound on the Hausdorff distance between ψ′(xm) and ψ′(x). δ was arbitrary, as was the sequence xm and the x, so this means that ψ′ is continuous in Hausdorff-distance.
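The four-case construction can be sanity-checked numerically. The sketch below is only an illustration under stated assumptions: it works at a single pair of points, with `a`, `b`, `c` standing for f1(x), f2(x), f′(x), `am`, `bm`, `cm` for the values at xm (each within δ of the values at x), and `D` for d(f,f′). All function and variable names are hypothetical, and the sampling ranges are arbitrary choices.

```python
import random


def in_psi_prime(y, z, p, D, a, b, c, tol=1e-9):
    """Membership test for psi'(x): inside the rectangle and under the half-plane."""
    in_box = (a - D - tol <= y <= a + D + tol) and (b - D - tol <= z <= b + D + tol)
    return in_box and p * y + (1 - p) * z <= c + tol


def nearby_point(ym, zm, p, D, delta, a, b, am, bm):
    """Four-case rule: shift a coordinate down if there is room,
    otherwise snap it to the lower edge of the rectangle at x."""
    y = ym - 2 * delta / p if ym >= am - D + 2 * delta / p + delta else a - D
    z = (zm - 2 * delta / (1 - p)
         if zm >= bm - D + 2 * delta / (1 - p) + delta else b - D)
    return y, z


def random_trial(rng):
    """Return (distance, bound, is_member) for one random configuration,
    or None if the sampled (ym, zm) is not in psi'(xm)."""
    p = rng.uniform(0.05, 0.95)
    D = rng.uniform(0.1, 2.0)
    delta = rng.uniform(0.001, 0.2)
    a, b = rng.uniform(-3, 3), rng.uniform(-3, 3)
    # The proof uses p*f1 + (1-p)*f2 - d(f,f') <= f', so enforce that at x.
    c = p * a + (1 - p) * b - D + rng.uniform(0, 2 * D)
    # Values at xm are within delta of the values at x.
    am, bm, cm = (v + rng.uniform(-delta, delta) for v in (a, b, c))
    ym = rng.uniform(am - D, am + D)
    zm = rng.uniform(bm - D, bm + D)
    if p * ym + (1 - p) * zm > cm:
        return None  # rejected: (ym, zm) is outside psi'(xm)
    y, z = nearby_point(ym, zm, p, D, delta, a, b, am, bm)
    dist = abs(ym - y) + abs(zm - z)
    bound = 2 * delta / p + 2 * delta / (1 - p) + 4 * delta
    return dist, bound, in_psi_prime(y, z, p, D, a, b, c)
```

Running `random_trial` many times with a seeded `random.Random` and checking that every accepted sample lands in ψ′(x) within the stated bound is a cheap way to catch sign or factor slips in the case analysis; it is of course not a substitute for the proof.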
Ok, we're a bit in the weeds here, how does that help? Well, we were trying to verify the compact almost-support property for the supremum. This requires, as part of it, getting a continuous function with some special properties. We're going to apply a selection theorem to get it, but so far we could only take care of the prerequisites that aren't lower-hemicontinuity. And to show lower-hemicontinuity in general, we needed to take this detour through showing that the modified set-valued function is continuous in Hausdorff-distance. So let's pop back up the stack.
One level back up the stack, we were trying to show lower-hemicontinuity. It is the property that given any sequence xn which limits to x, and any (y,z)∈ψ(x), there is some subsequence xm and (ym,zm)∈ψ(xm) where (ym,zm) limits to (y,z). We dispatched the first case, where infinitely many elements of the sequence were in our C1ϵ∪C2ϵ, leaving two cases. Case 2 is where only finitely many elements of that sequence are in that compact set, but the limit point x lies in that set. Case 3 is where the limit point x doesn't lie in that set.
Dealing with case 3, we have a sequence xn heading to x. Strip off all the xn that lie in the compact set, making your xm. And let (ym,zm) be whichever point in ψ(xm) is closest to (y,z). Now, by how they were defined, ψ(xm)=ψ′(xm), and likewise ψ(x)=ψ′(x) because x lies outside the compact set. ψ′ is continuous in Hausdorff-distance, so "take the closest point" is definitely going to get you the convergence you seek to your arbitrarily selected (y,z)∈ψ(x) point.
For case 2, where we're limiting to x from outside the compact set, all we need to show is that ψ(x)⊆ψ′(x) (we don't necessarily have equality because ψ and ψ′ start being different on that compact set). Once we have that, the same "take the closest point" argument produces a sequence (ym,zm) converging to the (y,z)∈ψ(x) point. So, let's do this. Because x lies in C1ϵ∪C2ϵ, we have that ψ(x)={(f1(x),f2(x))}.
The conditions for (y,z) to be in ψ′(x) are:
(y,z)∈[f1(x)−d(f,f′),f1(x)+d(f,f′)]×[f2(x)−d(f,f′),f2(x)+d(f,f′)]
Which is obviously true for f1(x),f2(x), and:
py+(1−p)z≤f′(x)
Which is the case because:
pf1(x)+(1−p)f2(x)≤f(x)=f′(x)
By how p, f1, and f2 were picked (pf1+(1−p)f2≤f), and because f′=f on that compact set.
Thus, we're done, we verified lower-hemicontinuity for ψ in all the cases, so we can invoke the Michael selection theorem and get a continuous selection f∗:X→R2 with three valuable properties. Let's abbreviate pr1(f∗(x)) as f′1, for notational convenience. It's projecting it to the first coordinate. f′2 is defined similarly.
Our first notable property is:
x∈C1ϵ∪C2ϵ→f′1(x)=f1(x)∧f′2(x)=f2(x)
(ie, projecting f∗ down to the two coordinates makes functions which perfectly mimic f1 and f2 on the compact set of interest)
Our second one is:
d(f1,f′1)≤d(f,f′)
And the same for d(f2,f′2).
And our third notable property is that:
pf′1+(1−p)f′2≤f′
But why do these properties hold of our selection function? Well, when x lies in that compact set, ψ(x)=(f1(x),f2(x)), so our selection function is forced to have its projections mimic f1 and f2 on said compact set, taking care of the first one.
For our second property, we have:
f∗(x)∈ψ(x)⊆[f1(x)−d(f,f′),f1(x)+d(f,f′)]×[f2(x)−d(f,f′),f2(x)+d(f,f′)]
Accordingly, we know that the projections to the two coordinates can't be too far away from f1 and f2 respectively.
For our third property, we have:
f∗(x)∈ψ(x)⊆{(y,z)|py+(1−p)z≤f′(x)}
Accordingly, the projections to the two coordinates can't mix to exceed the function f′.
So, where to from here? Well, we have:
|sup(h1,h2)(f)−sup(h1,h2)(f′)|=sup(h1,h2)(f)−sup(h1,h2)(f′)
≃ph1(f1)+(1−p)h2(f2)−sup(h1,h2)(f′)
≤p(h1(f′1)+ϵd(f,f′))+(1−p)(h2(f′2)+ϵd(f,f′))−sup(h1,h2)(f′)
=ph1(f′1)+(1−p)h2(f′2)−sup(h1,h2)(f′)+ϵd(f,f′)≤ϵd(f,f′)
Here's why. We assumed at the very start that without loss of generality, we'd take f to be the one with higher expectation value. We found a p,f1,f2 that nearly replicated the expectation value of sup(h1,h2)(f). f′1 copies f1 on a compact almost-support of h1, namely C1ϵ, and we also have d(f′1,f1)≤d(f′,f), and similar for f′2 and f2. And finally, since pf′1+(1−p)f′2≤f′, that mix must have lower value than sup(h1,h2)(f′). And we're done! f and f′ were arbitrary except that they agreed on C1ϵ∪C2ϵ, a compact set, and we got:
sup(h1,h2)(f)−sup(h1,h2)(f′)≤ϵd(f,f′)
Witnessing that said set is a compact ϵ-almost-support. ϵ was arbitrary, so sup(h1,h2) is compactly-almost-supported. This is the last condition needed to check to see that it's an infradistribution.
Proposition 15: All three characterizations of the supremum given in Definition 13 are identical.
So, the first characterization we gave was:
sup(h1,h2)(f):=sup_{p,f1,f2: pf1+(1−p)f2≤f} (ph1(f1)+(1−p)h2(f2))
And the second characterization was the least infradistribution greater than h1,h2 in the information ordering.
And the third characterization was as the concave monotone hull of f↦sup(h1(f),h2(f)).
We will use sup1,sup2,sup3 for these three characterizations of the supremum of two infradistributions and show that they are equal.
Let's begin showing this.
sup2(h1,h2)(f)≥sup_{ζ∈ΔN, fi∈∏iCB(X): Eζfi≤f} sup2(h1,h2)(Eζ(fi))
This occurs by monotonicity, any mix of functions which undershoots f must get a lower score because sup2(h1,h2) is an infradistribution.
≥sup_{ζ∈ΔN, fi∈∏iCB(X): Eζfi≤f} Eζ(sup2(h1,h2)(fi))
This is because of concavity of sup2(h1,h2), since it's an infradistribution. The value of the mix is as good or better than the mix of the values.
≥sup_{ζ∈ΔN, fi∈∏iCB(X): Eζfi≤f} Eζ(sup(h1(fi),h2(fi)))=sup3(h1,h2)(f)
This is because sup2(h1,h2)≥h1 (and same for h2), so making that swap decreases the value. Also, this quantity is the concave monotone hull of the supremum of h1,h2. Why? Well, sup(h1(f),h2(f)) is our first attempt at assessing the value of a function f. However, it isn't necessarily monotone. So, supf∗≤fsup(h1(f∗),h2(f∗)) is the monotone hull, we're saying that if there's a value below you that outscores you, then you should update the value of f to be big enough. And then, to get the concave monotone hull, we replace the lower bound on f with a countable/arbitrary finite mix of functions because any concave function should have the value of the mix be ≥ the mix of the values, so we have to bump the value of f∗ up to at least the mix of the values to not violate concavity. Anyways, now that we know this is sup3(h1,h2)(f), we can go further to:
≥sup_{p,f1,f2: pf1+(1−p)f2≤f} (p sup(h1(f1),h2(f1))+(1−p) sup(h1(f2),h2(f2)))
This is lower because now we're specializing to only certain sorts of probability distributions over N, those that are only supported on the first two values, so it's harder to attain suprema. And now,
≥sup_{p,f1,f2: pf1+(1−p)f2≤f} (ph1(f1)+(1−p)h2(f2))=sup1(h1,h2)(f)
We swapped out the supremum for a specific term in it in order to do this, and used our given definition of sup1. And then we can specialize p to 1 and f1 to f itself, to get
≥h1(f)
Similarly, we could specialize to p=0 and f2=f to get ≥h2(f). So taking stock of what we have,
sup2(h1,h2)(f)≥sup3(h1,h2)(f)≥sup1(h1,h2)(f)≥sup(h1(f),h2(f))
For all functions, so:
sup2(h1,h2)≥sup3(h1,h2)≥sup1(h1,h2)≥h1
(and same for h2). We recall that in Proposition 14 we proved that sup1 always makes an infradistribution. Since sup1 is an infradistribution above both h1 and h2, and sup2 was defined as the least infradistribution that is above h1 and h2, we get sup1(h1,h2)≥sup2(h1,h2). Combined with the chain of inequalities above, this forces equality, and
sup2(h1,h2)=sup3(h1,h2)=sup1(h1,h2)≥h1
(and same for h2) And we've shown the three definitions of the supremum are identical.
Proposition 16:
Esup(H1,H2)(f)=sup_{p,f1,f2: pf1+(1−p)f2≤f} (pEH1(f1)+(1−p)EH2(f2))
To recap,
sup(H1,H2):=H1∩H2
Now, sup(H1,H2) can be turned into a concave monotone functional CB(X)→R, by LF-duality. Further, it's convex, closed, and upper-complete due to being the intersection of two convex closed upper-complete sets. Let's use h to refer to its corresponding functional. Then:
h(f)=Esup(H1,H2)(f)=inf(m,b)∈H1∩H2m(f)+b
≥inf(m,b)∈H1m(f)+b=EH1(f)=h1(f)
And the same applies to H2, and this applies to all functions, so h≥h1 (and same for h2).
We know from Proposition 15 that the least concave monotone functional above h1 and h2 is sup(h1,h2), so h≥sup(h1,h2)≥h1 (and same for h2) Call the corresponding set of sup(h1,h2) as Hsup. Thus, translating this information ordering back to sets,
sup(H1,H2)⊆Hsup⊆H1
And same for H2. Therefore,
sup(H1,H2)⊆Hsup⊆H1∩H2=sup(H1,H2)
So all the inclusions must actually be equalities, and in particular we have:
Hsup=sup(H1,H2)
Then we can go:
Esup(H1,H2)(f)=sup(h1,h2)(f)
=sup_{p,f1,f2: pf1+(1−p)f2≤f} (ph1(f1)+(1−p)h2(f2))=sup_{p,f1,f2: pf1+(1−p)f2≤f} (pEH1(f1)+(1−p)EH2(f2))
By sup(H1,H2) being equivalent to the infradistribution set induced by sup(h1,h2), expanding our definition of the sup, and translating back. And we're done!
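A small worked example may make the statement concrete. The sketch below assumes a three-point space X={0,1,2} and two crisp infradistributions, each identified with the polytope of probability distributions it contains (upper completion is ignored, which is a simplification). The sets `K1`, `K2`, `K_BOTH` and the helper names are hypothetical choices for illustration.

```python
def expectation(vertices, f):
    """E_H(f) for a crisp set, computed as the minimum of mu(f) over the
    vertices of its polytope of probability distributions."""
    return min(sum(m * v for m, v in zip(mu, f)) for mu in vertices)


# Vertices of {mu : mu(0) >= 1/2} and {mu : mu(1) >= 1/2} on X = {0, 1, 2}.
K1 = [(1.0, 0.0, 0.0), (0.5, 0.5, 0.0), (0.5, 0.0, 0.5)]
K2 = [(0.0, 1.0, 0.0), (0.5, 0.5, 0.0), (0.0, 0.5, 0.5)]
# Their intersection is the single distribution (1/2, 1/2, 0).
K_BOTH = [(0.5, 0.5, 0.0)]


def feasible(p, f1, f2, f):
    """Check the constraint p*f1 + (1-p)*f2 <= f pointwise."""
    return all(p * a + (1 - p) * b <= c for a, b, c in zip(f1, f2, f))


def mix_value(p, f1, f2):
    """The quantity p*E_H1(f1) + (1-p)*E_H2(f2) inside the supremum."""
    return p * expectation(K1, f1) + (1 - p) * expectation(K2, f2)
```

For f=(1,1,0), each of `expectation(K1, f)` and `expectation(K2, f)` is 1/2, while the feasible choice p=1/2, f1=(2,0,0), f2=(0,2,0) gives `mix_value` equal to 1, which matches `expectation(K_BOTH, f)`. So in this toy case the supremum formula strictly beats the pointwise maximum and agrees with the expectation over the intersection.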
Proposition 17: For any property in the table at the start of this section, sup(h1,h2) will fulfill the property if both components fulfill the property.
The way to show this is to use the alternate characterizations of supremum as intersection of the infradistribution sets, and the alternate characterizations of the various properties in terms of properties of minimal points.
We will make an observation used in all further proofs of properties. In order for H1 to have (λμ,b) in it, there must be a minimal point of the form (λμ,b1) with b1≤b below it. Similarly, for H2 to contain (λμ,b), there must be a minimal point of the form (λμ,b2) below it, with b2≤b.
Thus, for (λμ,b) to lie in (H1∩H2)min, (λμ,b1)∈Hmin1 and (λμ,b2)∈Hmin2 and b=sup(b1,b2). Part of this is because said point lies in H1 and H2, the other part is because (λμ,sup(b1,b2)) is the lowest possible point in H1∩H2 associated with a measure component of λμ, and it's the minimal. This observation will be used for all future sub-proofs in this proposition.
Homogeneity: This is equivalent to "all minimal points have b=0", so if (λμ,b)∈(H1∩H2)min, then (λμ,0)∈Hmin1 (homogeneity for H1), and same for Hmin2, so b=sup(0,0)=0.
1-Lipschitzness: This is equivalent to "all minimal points have λ≤1", so if (λμ,b)∈(H1∩H2)min, then (λμ,b1)∈Hmin1, and λ≤1 (1-Lipschitzness of H1), so λ≤1.
Cohomogeneity: This is equivalent to "all minimal points have λ+b=1", so if (λμ,b)∈(H1∩H2)min, then (λμ,b1)∈Hmin1, and λ+b1=1 (cohomogeneity of H1), and (λμ,b2)∈Hmin2, and λ+b2=1, so b1=b2. Then, λ+b=λ+sup(b1,b2)=λ+b1=1.
C-additivity: This is equivalent to "all minimal points have λ=1", so if (λμ,b)∈(H1∩H2)min, then (λμ,b1)∈Hmin1, and λ=1 (C-additivity of H1), so λ=1.
Crispness: This is equivalent to the conjunction of homogeneity and C-additivity, both of which are preserved, so crispness is preserved as well.
Sharpness: Because all sharp infradistributions are crisp, (H1∩H2)min must be composed entirely of probability distributions if H1 and H2 are sharp. If any of the probability distributions in (H1∩H2)min aren't supported on C1 (the compact set associated with the sharp infradistribution H1), then they aren't in H1, which is impossible. Symmetric arguments apply to H2. Thus, (H1∩H2)min only has probability distributions supported on C1∩C2. If there was any probability distribution supported on that set that was missing from (H1∩H2)min, then it'd be present in H1 and H2, and thus present in H1∩H2, and minimal, so we have a contradiction. Therefore (H1∩H2)min consists of all probability distributions supported on C1∩C2 which is a compact set, so the supremum is sharp as well.
Proposition 18: If a family of infradistributions hi is directifiable, then supihi (defined as the functional corresponding to the set ⋂iHi) exists and is an infradistribution. Further, for all conditions listed in the table, if all the hi fulfill them, then supihi fulfills the same property.
A family of infradistributions being directifiable is equivalent to "for any collection of finitely many infradistributions, the supremum exists". We also know that the supremum is exactly equivalent to set intersection. So, we'll show that directifiability (any collection of finitely many infradistributions has a supremum) implies that the intersection of all the infradistribution sets has the exact properties of a set-form infradistribution.
We have six properties to check. Nonemptiness, normalization (the existence of a point (λμ,0), existence of a point (λμ,b) with λ+b=1 and nonexistence of points with λ+b<1), closure, convexity, upper-completion, and compact-projection (the measure components of the infradistribution are contained in a compact set of measures).
For closure, it's the intersection of closed sets, so it's closed. For convexity, it's the intersection of convex sets, so it's convex. For upper-completion, it's the intersection of upper-complete sets, so it's upper-complete. For compact-projection, the measure components of the countable intersection are contained within the countable intersection of the sets of measure components, which is contained in a compact set, so it fulfills that property too.
This just leaves nonemptiness and normalization. We'll show normalization, which automatically implies nonemptiness. The nonexistence of points with λ+b<1 is definitely preserved under intersection, because the intersection is a subset of each Hi and none of them contain such a point.
For the two existence conditions, the compact-projection property means that for any infradistribution set Hi, the intersection of it with the surface of a-measures where λ+b=1 is compact, so we're intersecting a bunch of compact sets. Due to the existence of supremum infradistributions for each collection of finitely many infradistributions (directifiability), we have the nonempty finite intersection property needed to conclude that the intersection of compact sets is nonempty. The same argument applies to the existence of a point with b=0. The presence of those two points witnesses nonemptiness and normalization.
These are the last two conditions we needed to conclude the set represents an infradistribution, so the infinite supremum exists and is the infradistribution we need.
For preservation of the various properties, we can just reuse the arguments from Proposition 17 with only trivial modifications.