Proposition 1:

Given some compact metric space of options , if is a bounded function,

PROOF: To do this, we must show that anything in the first set is in the second, and vice-versa. So, assume is in the first set. Then for any , . However, if is not in the second set, then it has probability-mass on some points where there's a s.t. . Moving that new probability mass to points of higher utility, we'd construct a new probability distribution s.t. , which is impossible. So must be in the second set.

In the reverse direction, any probability distribution supported over the maximum of (let be an arbitrary point in it) has . Thus, for any , it's supported entirely over points with equal or lesser utility than this maximum value, so . Both subset inclusion directions have been proved, so the equality holds.

Theorem 1: Fundamental Theorem of Infrafunctions

There is a bijection between concave upper-semicontinuous functions of type , and closed convex upper-complete sets of continuous functions .

Proof: For the reverse direction, going from sets to functions, letting denote the set and denote the function, we have that With this definition being made, let's verify that the resulting function fulfills the indicated property. For verifying concavity, we have For verifying upper-semicontinuity, let limit to . Now, for any particular function , due to limiting to , we have So, for any where is an (arbitrarily close) approximation to , we can go And so upper-semicontinuity is established, that For the forwards direction, going from functions to sets, letting denote the function, the corresponding set of functions is We must show that the result is closed, convex, and upper-complete.

For closure, if limits to , and all the are in the set (so for all and ) and is arbitrary, then due to limiting to , we have that and this argument works for all , so is in the indicated set.

For convexity, our job is to show that if and are in the set, are in the set too. Fix an arbitrary . And convexity is shown. For upper-completeness, assume that according to the usual ordering on functions. Then

Now, all that remains is to show that the two translation directions compose to identity in both directions.

For function-to-set-to-function being identity, a sufficient property for this to hold is that, for every and , there exists a continuous function s.t. and

Why? Well, every function that lies in the set has the property that . Therefore, regardless of , we have that Or, rephrasing And, \emph{if} we had the property that for every and , there exists a continuous function s.t. and , that would mean that regardless of , you could find functions that would lie in , and where would lie arbitrarily close to . This would get equality, that Or, rephrasing, Or, rephrasing, Ie, going from function to set to function again is identity.

As for the other direction, in order for set-to-function-to-set to be identity, the inclusion direction is pretty easy to prove outright. For any in a set of functions, it'll exceed the inf of that set of functions, and so it'll end up being in the set generated by that inf.

The inclusion direction is harder. We'd need to show that given any closed convex upper-complete set of continuous bounded functions, , any function which is outside of that set has some where . The reason that showing that helps is it establishes that cannot be in the set induced by the inf of everything in . Therefore, the set induced by the inf function can't be a strict superset of , because any in the former set and not the latter one can be used to generate a witness that it's not in the former set.

So, our two missing results to prove to establish the entire isomorphism theorem are:

1: Establishing that for every and , there exists a continuous function s.t. and .

2: Establishing that for any closed convex upper-complete set of continuous bounded functions, , any function which is outside of that set has some where .

For both of these, we'll use the Hahn-Banach theorem. For the first one, let the set (a subset of ) be , and the set be the set consisting of the single point . The set is clearly convex, compact, and nonempty. That leaves showing that is closed, convex, and nonempty.

What we'll do is prove our desired result outright if the set is empty. It's the region under the graph of , which can only be empty if , in which case you could take any function and subtract a sufficiently large constant from it to get a function whose expectation w.r.t was below the you named.

So, we may safely assume that the set is nonempty. Closure works because if limits to and limits to , we have that by upper-semicontinuity. And convexity works because due to concavity of .

We must show that the two sets aren't disjoint, which is easy because the single point has . Thus the Hahn-Banach hyperplane separation theorem may be used to find an continuous affine hyperplane with on the top side of it, and on the top side of it. All continuous affine functions can be decomposed as a continuous linear functional (which can be interpreted as expectation w.r.t a bounded continuous function ), plus a constant. So, there's some bounded continuous function and constant number s.t. (ie, is below the hyperplane induced by the affine function induced by ), and where (ie, the point consisting of lies above the hyperplane induced by the affine function induced by ).

Now, specializing to the case where , and observing that for probability distributions, we can take to be our function of interest, and get that and also, . So, we've found a function with the indicated properties, outright proving the first of our two unsolved problems.

This leaves the second one. The set is closed and convex, and is just a single point, so we can invoke the Hahn-Banach theorem to get a separating linear functional. Some signed measure where .

We need to show that this signed measure is a measure. If there was any negative region in the signed measure, then there could be a continuous function that acted as a massive spike in the negative region of , and (for very large constants and some ) would be arbitrarily negative, while would lie in by upper-closure, and we'd have a contradiction.

So, is a measure. Multiplying it by a suitable nonnegative constant turns it into a probability distribution that still fulfills the property that , and we're done.

Since our two missing part of the proof were cleaned up, we've proven the overall theorem.

Theorem 2: Ball Theorem

For any , , , and , the infrafunction corresponding to (Knightian uncertainty over the ball of size centered at , w.r.t ), is where . Further, given a function , the optimal to pick is the distribution for some constants and .

Proof: The Lp norm of a function w.r.t. will be denoted as . The function associated with the worst-case amongst the ball centered at is We can rewrite this as plus a function that has an norm of or less. So we get the equivalent function Now, we'll show that where , by showing that it's both and that quantity. For one inequality direction, we can go And then apply Holder's inequality. And so, we have one inequality direction, that For the other inequality direction, we can let be defined as

Our first order of business is showing that the norm of is or less. So, compute it. Distribute the power in. Pull the , group and cancel out. So, this is a valid choice of function . So, we can compute (by pulling the multiplicative constants out) Now, since We can reexpress our old equation as So we have Since that's the other inequality direction, we have equality. Now, since we were last at we can proceed to And that addresses one part of the theorem, about the objective function we're minimizing.

Sadly, the last part, about the optimal , can only be done according to a physics level of rigor. Given two points and which the optimal assigns probability to, the increase in the objective function from adding a tiny little of probability mass to must be the same as the increase from adding a tiny little of probability mass to . This is because, otherwise, if got a higher score than , you could steal a tiny amount of probability mass from , reallocate it to , and you'd get a higher score.

So, we can ask, what is for the optimal ? We know that it will end up being equal for every which is in the support of the optimal . Well, we can write it as And cancel Now, since is the same regardless of which was picked, we can abbreviate it as , and write things as As previously explained, this quantity must be the same for all that are supported by the optimal . In fact, this quantity can be called . So, for every with support, we have that We can rearrange this And now we see that, since the left-hand side is always nonnegative, must also be nonnegative too. And for any which fulfilled the above equation, it can be rearranged to show that this must have its partial derivative in the direction being the maximal value of , as expected, and so would not be harmed by placing any probability mass on that . So, without loss of generality, we can assume that is supported over all which fulfill the above equation, and make it into an equation that's valid for \emph{all} by going Rearrange. Now, is a constant, so call it . And since it's valid over all , we can rephrase it as And now, since , we can rearrange this into , and so And so our equation rearranges into As desired.

Theorem 3: Dynamic Consistency

For any environment , finite history , off-history policy , and infrafunctions and , we have that} \emph{Or, restating in words, selecting the after-h policy by argmaxing for makes an overall policy that outscores the overall policy made by selecting the after-h policy to argmax for .

Proof: Use to abbreviate the optimal policy for , and interacting with the environment . Now, since is optimal for and interacting with , that means that for all other post-h policies , we have that Unpacking the definition of the updated infrafunction, since everything in the set associated with came from the set associated with , this turns into Reshuffling some expectations, this turns into Let be the policy that optimizes . Now, we were previously at but using the above inequality, we can proceed to And we're done, we've shown our desired inequality And the , , and everything else was arbitrary, so it's universally valid.

Proposition 2: Double Integral Inequality

If , let be an abbreviation for . Then for all , and and and , we have that

Here's why. Define the function as . Reverse Holder's inequality, because , says that, for all , Where . So, using that inequality, we can derive that And so we derive that Pull the constant out, and group this into a double integral. Swap the order of double integration, and unpack what is. Pull the constant partway out of the inner integral, and also collapse the two powers on the left-hand side. Now, since , solving for , we get that it is equal to . Similarly, . Make those substitutions. Unpack what is. Simplify Cancel Rewrite And we're done! We've shown that For some values of , like 0 or 1 or , these quantities will be ill-defined, but this can be handled by taking limits of the where this argument works.

Corollary 1: Lp-Averaging is Well-Defined

Given any distribution over a family of functions or infrafunctions, define the -average of this family (for ) as the function If all infrafunctions are , then the -average will be an infrafunction as well.

Proof: The main thing we need to verify is concavity of the resulting function. So, fix some and , and use as our mixing parameter. Then, Now, this can actually be interpreted as an ordinary integral outside of a p-integral. The outer integral is over the space of two points, and and . So applying our inequality from the previous theorem, we get And then apply concavity of each . Concavity has been proved.

Theorem 4: Crappy Optimizer Theorem

For any selection process where fulfills the four properties, will hold for some closed convex set of probability distributions . Conversely, the function for any closed convex set will fulfill the four properties of an optimization process.

It's already a theorem that any function which fulfills the three properties of concavity, monotonicity, and , has a representation as minimizing over a closed convex set of probability distributions, and that minimizing over a closed convex set of probability distributions will produce a functional which fulfills those four properties. It's somewhere in Less Basic Inframeasure Theory. Well, actually, I proved some stuff about ultradistributions and all the proofs were the same just flipping concave to convex, and minimization to maximization, in an extremely straightforward way. And I proved that those three properties (but with convex flipped to concave) were equivalent to minimizing over a closed convex set of probability distributions, so the same result should hold.

Very similar arguments prove that any function which fulfills the three properties of convexity, monotonicity, and being able to pull multiplicative and additive constants out, has a representation as maximizing over a closed convex set of probability distributions , and vice-versa.

So, that just leaves proving the three defining properties of an ultradistribution from the four properties assumed of the function , and vice-versa. As a recap, the four properties assumed were And the four defining properties of ultradistributions are Proof of U1 from P2 and P3: Proof of U2 from P3 and P4 and Proof of U3 from P1 and P2: Now for the reverse direction!

Proof of P1 from U3: Proof of P2 from U3: Proof of P3 from U1 and U3: Proof of P4 from U2 and U3 and And done!

Proposition 3:

The space of infrafunctions is a -algebra, with the function being defined as .

We just need to show two commuting diagrams. For the square one, fix an arbitrary and . Unpack the outer layer of flattening. Unpack the next layer of flattening, via the usual way that flattening works for infradistributions. Regroup this back up in a different way with how works. Write this as the beta reduction of the more complicated term Write this as a pushforward along an infrakernel. Write this as a flattening. Now, since this holds for all , that means that And since this holds for all , this means that This demonstrates the first commutative diagram of an algebra of a monad, the one that looks like a square. Now for the forward-and-back digram composing to identity.

We need that the usual embedding given by