# LBIT Proofs 1: Propositions 1-9

30 min read16th Dec 2020No comments

# Ω 4

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

It is advised to contact me if you wish to read this, the proofs aren't very edited.

Lemma 1: If  is a Polish space and  is a compact subset and , and , then there is a third bounded continuous function which fulfills the following three properties: First, . Second, . Third, .

To prove this, we will use the Michael selection theorem to craft a continuous function with these properties. Accordingly, let the set-valued function  be defined as: if , and if , then:

Assuming there was a continous function  where , it'd get us our desired results. This is because:

And then, because  is a selection from , we can make these quantities bigger by selecting from  in the supremum as well, to get:

And then we can use the definition of  in the two cases (the latter case is because  implies ) to get:

And then we use the fact that we abbreviated the supremum of the difference between  and  as  to get:

So, we'd have one of our three results, that . Our second desired result that  mimicks  on  is trivial by the definition of . And finally,

And then, we do the ususual transition from f'' to the \psi function that it's a selection of,

And then, because  is always  when , and  is always bounded in  for the latter part, this turns into:

So we have our desired result of . As we have established that all three results follow from finding a continuous selection of , we just have to do that now. For this, we will be using the Michael selection theorem. In order to invoke it, we need to check that  is paracompact (all Polish spaces are paracompact), that  is a Banach space (it is), and verify some conditions on the sets produced.

We need that the function  is nonempty for all . This is trivial by the definition of  for , but for the other case, we need to verify that

Is a nonempty set. This is easy because  witnesses nonemptiness. We need that the function  is convex for all . This is easy because it's the intersection of two convex sets when , which is the trickier case to check. We also need that the function  is closed for all , which is true because it's either a point or it's an intersection of two closed sets.

All that we're missing in order to invoke the Michael Selection theorem to get a continuous function that works is verifying lower hemicontinuity for the function .

Lower hemicontinuity is: If  limits to , and , there's some subsequence  and  where , and  limits to .

In order to show this, we can take three different cases.

The first possible case is where infinitely many of the  lie in , so the limit point  must lie in  as well. Then,  can be the subsequence which lies in , and  can be the subsequence , which is the only possible choice of value that lies in . Due to continuity of , this obviously converges to , which is the only possible choice of value for  because .

The second possible case is where only finitely many of the  lie in , but yet the limit point  lies in  as well, like limiting to the border of a closed set from outside the closed set. Let  be the subsequence where you get rid of all the x's in the approximating sequence that lie in . Notice that \, instead of being written as

Can be written as the single interval

And now, let  be defined as:

Due to continuity of all these functions, and  limiting to , the limiting  value is:

Now, because

and

Therefore,

Therefore,

So our limiting y value reduces to:

Also, by similar arguments, we have:

and

Therefore

So, our limiting  value reduces to merely , ie, our point selected from , and we've shown lower hemicontinuity in this case. That just leaves one last case left over, the case where .

Again, as before, in this case,  is the interval

Let  be how close the point  is to the bottom ( for the bottom, or the top, when ). For your sequence  limiting to , it's made by picking out all the  which lie in , only finitely many, and the  are given by:

Because of the continuity of the functions  (supremum of two continuous functions) and  (inf of two continuous functions), the  limit to:

Which is just . So, we have lower hemicontinuity in this last case.

And therefore, we have lower hemicontinuity overall for , and so  has a continuous selection function by the Michael Selection Theorem, and said selection function fulfills the requisite properties.

Lemma 2: For all  and , if a functional  has a set  as an -almost-support, and has , then
where  is the Lipschitz constant of .

Proof: Via Lemma 1, there's a function  with the properties that:

Now, we can go:

And we're done. The critical steps in the second inequality were due to Lipschitzness of , and the fact that  and  agree on  which is an -almost-support for , respectively.

Proposition 1: For a continuous function , the metric  completely metrizes the set  equipped with the subspace topology.

To do this, we'll need four parts. First, we need to show that  is even a metric. Second, we'll need to show that for all open balls in the  metric, you can fit an open ball in the original metric within it (so all the open sets induced by the  metric were present in the original subspace topology). Second, we'll need to show that for all open balls in the original metric that lie within the support of , we can fit an open ball induced by the  metric within it (so all the open sets in the original subspace topology can be induced by the  metric), and parts 2 and 3 let us show that the  metric induces the subspace topology on the support of . Finally, part 4 is showing that any Cauchy sequence in the  metric is Cauchy in the original complete metric, so we can't have any cases of missing limits points, and showing that all Cauchy sequences in the support of , according to the  metric, must have their limit point lying in the support of , so limits never lead outside of the support of . This then shows that  is a complete metrization of the support of  (it induces the same topology as the subspace topology), and all limit points lie in the same space,

So, to begin with, if  is your original Polish space, pick a complete metrization of the space. Then, ensure that the maximum distance is 1 (this can always be done while preserving the exact same Cauchy sequences, and not affecting the topology in any way, and it's still a metric), and call that .

Also, there's some continuous function  that you'll be updating. The set  is defined as , and is clearly an open subset of  because the preimage of  (an open set) must be open due to the continuity of .

Now, define

To show it's a metric, we first need symmetry (which is obvious, because  and  and absolute value of difference are all symmetric). For identity of indiscernibles, the forward direction, the fact that  is bounded below 1 proves that , always, so for  to be 0, then , so . For the reverse direction,  must be 0, because the equation for distance would reduce to .

That just leaves the triangle inequality, and here, we'll critically use the fact that the original metric was clipped so it's never above 1. Our goal is to show that

Without loss of generality, we can assume that  (otherwise flip  and ), so we can split into three cases, which are:

For the first case, we have:

For the second case, we can do the same exact argument, just replace that  in the second line with . The third case is the tricky one, we'll work backwards. Remember that our inequalities are:

Now, let's proceed.

Then we do some regrouping, to yield:

At point, we remember that  because we bounded our initial distance, and  because of the problem case we're in, to get:

And then, we just remember that  in this problem case, to get:

And we're done with the triangle inequality, so d|L is indeed a metric.

Well, is it a complete metric for ? Well, by looking at the definition of , and remembering that  is always 1 or more because the likelihood function is bounded in , we have that:

When  and  are in the support of , so any Cauchy sequence in  must also be Cauchy in . Now, either a Cauchy sequence has its limit point also lying in the support of , in which case we're good, or it has its limit point lying on the edge of the support of  (and outside the set), where . However, in that case, the sequence in  cannot be Cauchy, because  would diverge to infinity due to the continuity of  and the limit point being on the edge where , so it's impossible that you could have some finite point that'd be close to all the rest, the absolute value term in the distance would forbid Cauchyness. So, it's a complete metric for the support of .

All that remains is to show that it induces the same topology as the subspace topology that the open set  should have. Because the support of  is itself open, and the space  is metrized by the distance metric , any open set in the support of  can be written as the intersection of an open set in  and the open set , and so it's open in , and can be written as the union of tiny open balls in the original distance metric.

So, we'll proceed by showing that any open ball (w.r.t. the  metric) within the support of , can fit some open ball (w.r.t. the  metric) in it that engulfs the center point, and any open ball (w.r.t. the  metric) within the support of , can fit some open ball (w.r.t. the  metric) in it that engulfs the center point.

If we can do it, than because any open set in the subspace topology can be written as the union of a bunch of open balls centered at each point in the open set according to the  metric, and each of those open balls according to the  metric has their center point engulfed by a smaller open ball according to the  metric, we'd be able to build the open set in the subspace topology out of a union of open sets in the  induced topology.

And also, any open set in the -induced topology can be written as a union of infinitely many open balls centered at each relevant point, and because each of those open balls has their center point engulfed by a smaller open ball according to the  metric, so we can build any open set in the -inducd topology out of a union of open sets in the original topology.

The net result is that the  induced topology and the subspace topology have the same open sets. So, let's get working on showing that we can fit the two sorts of balls inside each other.

In one direction, if you have a ball of size  according to the metric , then because  always, a ball of size  according to the metric  centered at the same point will fit entirely within the original ball, so we have one half of our topology argument done.

In the other direction, if you have a ball of size  according to the metric , around some point , then there's some  distance around  where the function  only varies by .

At this point, we can then fit a ball of size  around the point  (according to the original  metric), and it will lie within the ball of size  around the point  according to the  metric. The reason for this is:

And then, because we selected our distance between  and  so that  and  only differ by  at most (because a ball of size  suffices to accomplish this), we have:

And then, because  and  is very small, we have

So, this small of a ball in the original distance metric suffices to slot entirely inside a ball of size  centered at  in the  metric.

And that's all we need!

Theorem 1: The set of infradistributions (set form) is isomorphic to the set of infradistributions (functional form). The  part of the isomorphism is given by , and the  part of the isomorphism is given by , where  and  is the convex conjugate of .

Our first order of business is establishing the isomorphism. Our first direction is  to  and back is  exactly. By upper completion, and reproved analogues of Proposition 2 and Theorem 1 from "Basic inframeasure theory", which an interested party can reprove if they want to see it, we can characterize  as

And then, our  can further be reexpressed as

Also, , so we can rewrite this as:

and, by the definition of the convex conjugate, and the space of finite signed measures being the dual space of , and  being a functional applied to an element, this is...

So, our original set  is identical to the convex-conjugate set, when we go from  to  back to a set of sa-measures.

Proof Phase 2: In the reverse direction for isomorphism, assume that  fulfills the conditions (we'll really only need continuity and concavity)
We want to show that
\mathbb{E}_{\{(m,b)|b\ge(h')^{*}(m)\}}(f)=h(f)
Let's begin.

Given an , we have a natural candidate for minimizing the , just set it equal to . So then we get

And this is just... , and, because  is continuous over , and concave, then  is continuous over , and convex, so . From that, we get

and we're done with isomorphism.

So, in our first direction, we're going to derive the conditions on the functional from the condition on the set, so we can assume nonemptiness, closure, convexity, upper completion, projected-compactness, and normalization, and derive monotonicity, concavity, normalization, Lipschitzness, and compact almost-support (CAS) from that.

For monotonicity, remember that all points in the infradistribution set are a-measures, so if , then

We could do that because all the measure components are actual measures.

For concavity,

And we're done with that. For normalization,

And

So we have normalization.

For Lipschitzness, we first observe that compact-projection (the minimal points, when projected down to their measure components, make a set with compact closure) enforces that there's an upper bound on the  value of a minimal point , because otherwise you could pick a sequence with unbounded , and it'd have no convergent subsequence of measures, which contradicts precompactness of the minimal points projected down to their measure components.

Then, we observe that points in  correspond perfectly to hyperplanes that lie above the graph of , and a minimal point is "you shift your hyperplane down as much as you can as much as you can until you can't shift it down any more without starting to cut into the function ". Further, for every function , you can make a hyperplane tangent to the function  at that point by the Hahn-Banach theorem, which must correspond to a minimal point.

Putting it together, the hypograph of  is exactly the region below all its tangent hyperplanes. And we know all the tangent hyperplanes correspond to minimal points, and their Lipschitz constants correspond to the  value of the minimal points. Which are bounded. So, Compact-Projection in  implies  is Lipschitz.

Finally, we'll want compact almost-support. A set of measures is compact iff the amount of measure is upper-bounded, and, for all , there is a compact set  where all the measures  have  measure outside of .

So, given that the set of measures corresponding to  is compact by the compact-projection property, we want to show that the functional  has compact almost-support. To do this, we'll observe that if  is the inf of a bunch of functions, and all functions think two different points are only a little ways apart in value, then  must think they're only a little distance apart in value. Keeping that in mind, we have:

And then, we can think of a minimal point as corresponding to a hyperplane  and , and  is the inf of all of them, so to bound the distance between these two values, we just need to assess the maximum size of the gap between those values over all minimal points/tangent hyperplanes. Thus, we can get:

And then, we can do some canceling and get:

And then, because f' was selected to be 0 on , which makes up all but  of the measure for all measures present in , we can upper-bound  by , so we have that

And so,  is a compact -almost-support for , and this argument works for all , so  is CAS, and that's the last condition we need. Thus, if  is an infradistribution (set form), the expectation functional h is an infradistribution (expectation form).
Now for the other direction, where we assume monotonicity, concavity, normalization, Lipschitzness, and CAS on an infradistribution (expectation form) and show that the induced form fulfills nonemptiness, convexity, closure, upper completion, projection-compactness, normalization, and being a set of a-measures.

Remember, our specification of the corresponding set was:

Where  is the function given by , and  is the convex conjugate of .

First, being a nonempty set of a-measures. Because there's an isomorphism linking points of the set and hyperplanes above the graph of , we just need to establish that no hyperplanes above the graph of  can slope down in the direction of a nonnegative function (as this certifies that the measure component must be an actual measure), and no hyperplanes above the graph of  can assign 0 a value below 0 (as this corresponds to the  term, and can be immediately shown by normalization).

What we do is go "assume there's a  where the linear functional corresponding to  isn't a measure, ie, there's some nonnegative function  where ". Well, because of monotonicity for  (one of the assumed properties), we have .... And, because all affine functionals are made by taking a linear functional and displacing it, ..., decreases at a linear rate, so eventually the hyperplane and  cross over, but  was assumed to be above  always, so we have a contradiction.

Therefore, all hyperplanes above  must have their linear functional component corresponding to an actual measure, ie, being an a-measure. And we get nonemptiness from the concavity of , so we can pick any function and use the Hahn-Banach theorem to make a tangent hyperplane to h that touches at that point, certifying nonemptiness.

By the way, the convex conjugate, , can be reexpressed as .

For closure and convexity: By monotonicity of  and normalization, , and  is continuous (Lipschitz) on , and concave, so  is proper, continuous on , and convex, so, by the Wikipedia page on "Closed Convex Function",  is a closed convex function, and then by the Wikipedia page on "Convex Conjugate" in the Properties section,  is convex and closed. From the Wikipedia page on "Closed Convex Function", this means that the epigraph of  is closed, and also the epigraph of a convex function is convex. This takes care of closure and convexity for our .

Time for upper-completeness. Assume that  lies in the epigraph. Our task now is to show that  lies in the epigraph. This is equivalent to showing that . Let's begin.

And we're done.

Normalization of the resulting set is easy. Going from  to a (maybe)-inframeasure  back to  is identity as established earlier, so all we have to do is show that a failure of normalization in a (maybe)-inframeasure makes the resulting  not normalized. Thus, if our  is normalized, and it makes an  that isn't normalized, then going back makes a non-normalized , which contradicts isomorphism. So, assume there's a failure of normalization in . Then , or , so either  or  and we get a failure of normalization for  which is impossible. So  must be normalized.

That just leaves compact-projection. We know that a set of measures is precompact iff there's a bound on their  values, and for all , there's a compact set  where all the measure components have  measure outside of that set.

First, we can observe that no hyperplane above  can have a Lipschitz constant above the maximal Lipschitz constant for the function , because if it increased more steeply in some direction, you could go in the other direction to decrease as steeply as possible, and  would be constrained to decrease strictly less steeply in that direction, so if you went far enough in that direction, your hyperplane and  would intersect, which is impossible. Thus, Lipschitzness of  enforces that there can be no point in the set  with too much measure, which gives us one half of compact-projection for .

For the other half, CAS for  ensures that for all , there is a compact set  where

What we'll do is establish that no hyperplane lying above  can have a slope more than  in the direction of a function that's in  and is 0 on . Let  be such a function fulfilling those properties that makes some hyperplane above h slope down too hard. Then

Because going in the direction of a negative function decreases your value, so the Gateaux derivative would be negative, and  is in , and we have CAS on .

Now, we can realize that as we travel from 0 to  to  to , our vector of travel is always in the direction of , which can't be too negative. Each additional  added drops the value of  by at most . However, each additional  added drops the value of  (our assumed functional that's sloping down too hard in the  direction) by more than that quantity, so eventually  will cross over to be lower than , so  can't correspond to an a-measure in , and we have a contradiction.

Therefore, regardless of the point in , its measure component must assign any function that's 0 on  and bounded in  a value of  at most. We can then realize that this can only happen to a measure that assigns  or less measure to the set  (otherwise you can back off from your continuous function to a discontinuous indicator function).

Thus, given our , we've found a compact set  where the measure component of all points in  assign  or less value to the outside of that set, and this can be done for all , certifying the last missing piece for compact-projection of  (because the projection is precompact iff the set of measures is bounded above in amount of measure present, and for all , there's a compact set  where all the measures assign  measure outside of that set.

And that's the last condition we need to conclude that the set form of an infradistribution (functional form) is an infradistribution (set form), and we're done.

Proposition 2: For any infradistribution  there is a unique closed set  which is the intersection of all supports for , and is a support itself.

Proof sketch: The proof splits into three parts. First, we show that if  is a support and  is an -almost support, then  is a -almost support, by picking two functions which agree on  and using them to construct a third function which has similar expectations to both of them, due to agreeing with the first function on  and agreeing with the second function on , so our arbitrary functions which agree on the intersection have similar expectation values. A support is precisely an -almost support for all , so actual supports are preserved under finite intersection.

Second, we show that supports are preserved under countable intersections, which critically relies on compact -almost supports for all . Roughly, that part of the proof proceeds by using our compact almost-support in order to make a sequence of compact almost-supports nested in each other, and showing they converge in Hausdorff distance, so two functions which agree on the intersection of all the compact sets are very close to agreeing on some finite almost-support and have similar expectation values, so the intersection of compact sets is an almost-support. The property of being an almost-support is hereditary upwards, and we can show our countable intersection of compact sets is a subset of the countable intersection we're interested in, so it's an almost-suppport. However, we can shrink  to 0 so it's a support.

Finally, we show that supports are preserved under arbitrary intersections by using strong Lindelofness of  (because it's Polish) to reduce the arbitrary intersection case to the countable intersection case which has already been solved.

For the finite intersection case, let  be a support and  be an -almost support. Our task is to show that  is an -almost support, by showing that two functions which agree on it can't be more than  apart in expectation value.

Pick any two functions  and  where . Let's define a  on  as follows. If , then . If , and , then . Our first task is to show that  is continuous.

If  limits to , we can split into three cases. Our first case is where , and . Then past a certain n,  will always not be in  (complement of a closed set is open), and so its values are dictated by , which is continuous, and we have continuity of  at that point. The second case where  and  is symmetric, but with the values being dictated by  instead. Our third case where  is the only one which takes some care. As  limits to , then both  and  (not the restrictions, the original functions!) limit to  (because  and  agree on ) so no matter how our  toggles back and forth between  and , we have continuity.

Now that we know  is continuous on , we can use the Tietze Extension Theorem to extend  to a continuous function on the entire space , because the union of two closed sets is closed, and we can ensure that the extension stays in  as well. (because  is copying either  or ), call this extension .

Now, because

and  is a support, then . However, since

(because  and  agree on ), then

The last inequality is because  was selected to be bounded in the relevant range, and the first inequality is because  and  are identical on  which is a -almost-support.

Thus,

That last inequality was because  and  agree on  which is a support, and for the latter, we already derived that inequality. Because  and  were arbitrary functions which agreed on , and  was an arbitrary support and  was an arbitrary -almost-support, we have that the intersection of any support and -almost-support is an -almost support. As a special case of this, the intersection of two supports is a support, so being a support is preserved under finite intersection.

Now for supports being preserved under countable intersection. Fix a countable family of supports, . We will show that  is a support.

First, by our "compact almost-support" property, for all , there's a compact set  where any two functions agreeing on  will only differ in expectation by . Fix an arbitrary  to use through the rest of the argument, which induces a compact set .

Let  be defined as , and