Infrafunctions Proofs

Proposition 1:

Given some compact metric space of options $X$, if $U : X \to R$ is a bounded function, then $\{μ | \forall ν \in Δ X : μ (U) \geq ν (U)\} = Δ \{x | \forall y \in X : U (x) \geq U (y)\}$

Proof: To do this, we must show that anything in the first set is in the second, and vice-versa. So, assume $μ$ is in the first set. Then for any $ν$, $ν (U) \leq μ (U)$. However, if $μ$ is not in the second set, then it has probability-mass on some points $x$ where there's a $y$ s.t. $U (y) > U (x)$. Moving that probability mass to points of higher utility, we'd construct a new probability distribution $ν$ s.t. $ν (U) > μ (U)$, which is impossible. So $μ$ must be in the second set.

In the reverse direction, any probability distribution $μ$ supported over the maximum of $U$ (let $x$ be an arbitrary point in it) has $μ (U) = U (x)$ . Thus, for any $ν$ , it's supported entirely over points with equal or lesser utility than this maximum value, so $μ (U) \geq ν (U)$ . Both subset inclusion directions have been proved, so the equality holds.
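As a sanity check on Proposition 1, here is a minimal Python sketch for a finite $X$. The utilities in `U` and all helper names are made up for illustration. It checks that a distribution supported on the argmax points attains the maximum of $U$, and that randomly sampled distributions never beat it.

```python
import random

def expect(mu, U):
    """Expectation mu(U) of U under a distribution mu on a finite set."""
    return sum(m * u for m, u in zip(mu, U))

def random_dist(n, rng):
    """Sample a probability distribution on n points (all weights positive)."""
    w = [rng.random() + 1e-9 for _ in range(n)]
    total = sum(w)
    return [x / total for x in w]

rng = random.Random(0)
U = [0.3, 1.0, -2.0, 1.0, 0.5]   # hypothetical utilities on a 5-point X
best = max(U)
argmax_points = [i for i, u in enumerate(U) if u == best]

# Uniform distribution over the argmax points: an element of the second set.
mu_star = [1.0 / len(argmax_points) if i in argmax_points else 0.0
           for i in range(len(U))]

# No sampled distribution nu should achieve nu(U) > mu_star(U).
samples = [random_dist(len(U), rng) for _ in range(1000)]
never_beaten = all(expect(nu, U) <= expect(mu_star, U) + 1e-12 for nu in samples)
print(expect(mu_star, U), never_beaten)
```

Random sampling only spot-checks the inequality, of course; the proof above is what covers all of $Δ X$.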

Theorem 1: Fundamental Theorem of Infrafunctions

There is a bijection between concave upper-semicontinuous functions of type $Δ X \to R \cup \{- \infty\}$, and closed convex upper-complete sets of continuous functions $X \to R$.

Proof: For the reverse direction, going from sets to functions, letting $^F$ denote the set and $F$ denote the function, we have that

$F (μ) := \inf_{f \in^F} μ (f)$

With this definition made, let's verify that the resulting function fulfills the indicated properties. For verifying concavity, we have

$F (p μ + (1 - p) ν) = \inf_{f \in^F} (p μ + (1 - p) ν) (f) \geq p \inf_{f \in^F} μ (f) + (1 - p) \inf_{f \in^F} ν (f) = p F (μ) + (1 - p) F (ν)$

For verifying upper-semicontinuity, let $μ_{n}$ limit to $μ$. Now, for any particular function $f^{*}$, due to $μ_{n}$ limiting to $μ$, we have

$\limsup_{n \to \infty} μ_{n} (f^{*}) = μ (f^{*})$

So, for any $f^{*}$ in $^F$ where $μ (f^{*})$ is an (arbitrarily close) approximation to $\inf_{f \in^F} μ (f)$, we can go

$F (μ) = \inf_{f \in^F} μ (f) ≃ μ (f^{*}) = \limsup_{n \to \infty} μ_{n} (f^{*}) \geq \limsup_{n \to \infty} \inf_{f \in^F} μ_{n} (f) = \limsup_{n \to \infty} F (μ_{n})$

And so upper-semicontinuity is established, that

$F (μ) \geq \limsup_{n \to \infty} F (μ_{n})$

For the forwards direction, going from functions to sets, letting $F$ denote the function, the corresponding set of functions is

$\{f | \forall μ \in Δ X : μ (f) \geq F (μ)\}$

We must show that the result is closed, convex, and upper-complete.

For closure, if $f_{m}$ limits to $f$ , and all the $f_{m}$ are in the set (so $μ (f_{m}) \geq F (μ)$ for all $m$ and $μ$ ) and $μ$ is arbitrary, then due to $f_{m}$ limiting to $f$ , we have that $μ (f) = {lim}_{m \to \infty} μ (f_{m}) \geq F (μ)$ and this argument works for all $μ$ , so $f$ is in the indicated set.

For convexity, our job is to show that if $f$ and $g$ are in the set, then $p f + (1 - p) g$ is in the set too. Fix an arbitrary $μ$.

$μ (p f + (1 - p) g) = p μ (f) + (1 - p) μ (g) \geq p F (μ) + (1 - p) F (μ) = F (μ)$

And convexity is shown. For upper-completeness, assume that $f$ is in the set and that $g \geq f$ according to the usual ordering on functions. Then, for every $μ$, $μ (g) \geq μ (f) \geq F (μ)$, so $g$ is in the set as well.
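The set-to-function direction can be illustrated numerically. The following Python sketch works on a finite $X$, with a finite hypothetical family `fs` standing in for $^F$ (all names are chosen for illustration). It checks concavity of the worst-case expectation on random mixtures, and that any function lying above a member of the family still has expectation at least $F (μ)$.

```python
import random

def expect(mu, f):
    """Expectation mu(f) on a finite set."""
    return sum(m * x for m, x in zip(mu, f))

def random_dist(n, rng):
    """Sample a probability distribution on n points."""
    w = [rng.random() + 1e-9 for _ in range(n)]
    total = sum(w)
    return [x / total for x in w]

def F(mu, fs):
    """Function induced by the family fs: worst-case expectation."""
    return min(expect(mu, f) for f in fs)

def mix(p, mu, nu):
    """The mixture p*mu + (1-p)*nu."""
    return [p * a + (1 - p) * b for a, b in zip(mu, nu)]

rng = random.Random(1)
n = 4
fs = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(6)]

concave_ok = True
upper_ok = True
for _ in range(1000):
    mu, nu, p = random_dist(n, rng), random_dist(n, rng), rng.random()
    # Concavity: F(p mu + (1-p) nu) >= p F(mu) + (1-p) F(nu)
    if F(mix(p, mu, nu), fs) < p * F(mu, fs) + (1 - p) * F(nu, fs) - 1e-12:
        concave_ok = False
    # Upper-completeness: g >= f for some f in fs implies mu(g) >= F(mu)
    g = [x + rng.random() for x in fs[0]]
    if expect(mu, g) < F(mu, fs) - 1e-12:
        upper_ok = False
print(concave_ok, upper_ok)
```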

Now, all that remains is to show that the two translation directions compose to identity in both directions.

For function-to-set-to-function being identity, a sufficient property for this to hold is that, for every $μ^{'} \in Δ X$ and $x > F (μ^{'})$ , there exists a continuous function $f$ s.t. $\forall μ \in Δ X : μ (f) \geq F (μ)$ and $x \geq μ^{'} (f)$

Why? Well, every function $f$ that lies in the set $^F$ has the property that $\forall μ : μ (f) \geq F (μ)$. Therefore, taking the inf over all such functions, we have that

$\forall μ^{'} : \inf_{f : \forall μ, μ (f) \geq F (μ)} μ^{'} (f) \geq F (μ^{'})$

Or, rephrasing,

$\forall μ^{'} : \inf_{f \in^F} μ^{'} (f) \geq F (μ^{'})$

And, \emph{if} we had the property that for every $μ^{'} \in Δ X$ and $x > F (μ^{'})$, there exists a continuous function $f$ s.t. $\forall μ \in Δ X : μ (f) \geq F (μ)$ and $x \geq μ^{'} (f)$, that would mean that regardless of $μ^{'}$, you could find functions $f$ that would lie in $^F$, and where $μ^{'} (f)$ would lie arbitrarily close to $F (μ^{'})$. This would get equality, that

$\forall μ^{'} : \inf_{f : \forall μ, μ (f) \geq F (μ)} μ^{'} (f) = F (μ^{'})$

Or, rephrasing,

$\forall μ^{'} : \inf_{f \in^F} μ^{'} (f) = F (μ^{'})$

Or, rephrasing again, $(μ \mapsto \inf_{f \in^F} μ (f)) = F$. I.e., going from function to set to function again is identity.

As for the other direction, in order for set-to-function-to-set to be identity, the $\subseteq$ inclusion direction is pretty easy to prove outright. For any $f$ in a set of functions, it'll exceed the inf of that set of functions, and so it'll end up being in the set generated by that inf.

The $\supseteq$ inclusion direction is harder. We'd need to show that given any closed convex upper-complete set of continuous bounded functions, $^F$ , any function $g$ which is outside of that set has some $μ$ where $μ (g) < {inf}_{f \in^F} μ (f)$ . The reason that showing that helps is it establishes that $g$ cannot be in the set induced by the inf of everything in $^F$ . Therefore, the set induced by the inf function can't be a strict superset of $^F$ , because any $g$ in the former set and not the latter one can be used to generate a witness that it's not in the former set.

So, our two missing results to prove to establish the entire isomorphism theorem are:

1: Establishing that for every $μ^{'} \in Δ X$ and $x > F (μ^{'})$ , there exists a continuous function $f$ s.t. $\forall μ \in Δ X : μ (f) \geq F (μ)$ and $x \geq μ^{'} (f)$ .

2: Establishing that for any closed convex upper-complete set of continuous bounded functions, $^F$ , any function $g$ which is outside of that set has some $μ$ where $μ (g) < {inf}_{f \in^F} μ (f)$ .

For both of these, we'll use the Hahn-Banach theorem. For the first one, let the set $A$ (a subset of $M^{\pm} (X) \oplus R$) be $\{(y, μ) | y \leq F (μ)\}$, and the set $B$ be the set consisting of the single point $(x, μ^{'})$. The set $B$ is clearly convex, compact, and nonempty. That leaves showing that $A$ is closed, convex, and nonempty.

First, we'll prove our desired result outright in the case where the set $A$ is empty. $A$ is the region under the graph of $F$, which can only be empty if $F = - \infty$ everywhere. In that case every continuous function $f$ satisfies $μ (f) \geq F (μ)$ for all $μ$, so you could take any function $f$ and subtract a sufficiently large constant from it to get a function whose expectation w.r.t. $μ^{'}$ was below the $x$ you named.

So, we may safely assume that the set $A$ is nonempty. Closure works because if $y_{n}$ limits to $y$ and $μ_{n}$ limits to $μ$, we have that $y = \lim_{n \to \infty} y_{n} \leq \limsup_{n \to \infty} F (μ_{n}) \leq F (μ)$ by upper-semicontinuity. And convexity works because $p y + (1 - p) y^{'} \leq p F (μ) + (1 - p) F (μ^{'}) \leq F (p μ + (1 - p) μ^{'})$ due to concavity of $F$.

We must show that the two sets are disjoint, which is easy because the single point has $x > F (μ^{'})$. Thus the Hahn-Banach hyperplane separation theorem may be used to find a continuous affine hyperplane with $B$ on the top side of it, and $A$ on the bottom side of it. All continuous affine functions $M^{\pm} (X) \to R$ can be decomposed as a continuous linear functional $M^{\pm} (X) \to R$ (which can be interpreted as expectation w.r.t. a bounded continuous function $f$), plus a constant. So, there's some bounded continuous function $f$ and constant number $b$ s.t. $\forall (y, μ) \in A : y \leq μ (f) + b$ (i.e., $A$ is below the hyperplane induced by the affine function induced by $f, b$), and where $μ^{'} (f) + b \leq x$ (i.e., the single point $(x, μ^{'})$ of $B$ lies above the hyperplane induced by the affine function induced by $f, b$).

Now, specializing to the case where $y = F (μ)$ , and observing that $μ (f + b) = μ (f) + b$ for probability distributions, we can take $f + b$ to be our function of interest, and get that $\forall μ \in Δ X : μ (f + b) = μ (f) + b \geq F (μ)$ and also, $x \geq μ^{'} (f) + b = μ^{'} (f + b)$ . So, we've found a function with the indicated properties, outright proving the first of our two unsolved problems.

This leaves the second one. The set $^F$ is closed and convex, and $g$ is just a single point outside of it, so we can invoke the Hahn-Banach theorem to get a separating continuous linear functional, i.e., some signed measure $m$ where $m (g) < \inf_{f \in^F} m (f)$.

We need to show that this signed measure is a measure. If there was any negative region in the signed measure, then there could be a continuous function $s \geq 0$ that acted as a massive spike in the negative region of $m$, and $m (f + a s)$ (for very large constants $a$ and some $f \in^F$) would be arbitrarily negative, while $f + a s$ would lie in $^F$ by upper-completeness. That would force $\inf_{f \in^F} m (f) = - \infty$, contradicting $m (g) < \inf_{f \in^F} m (f)$.

So, $m$ is a measure, and it is nonzero because it separates $g$ from $^F$. Multiplying it by a suitable positive constant turns it into a probability distribution $μ$ that still fulfills the property that $μ (g) < \inf_{f \in^F} μ (f)$, and we're done.

Since our two missing parts of the proof were cleaned up, we've proven the overall theorem.

Theorem 2: $L_{p}$ Ball Theorem

For any $ϵ > 0$, $p \in [1, \infty]$, $f : X \to R$, and $ν : Δ X$, the infrafunction corresponding to $\{g | {(\int | f - g |^{p} d ν)}^{\frac{1}{p}} \leq ϵ\}$ (Knightian uncertainty over the $L_{p}$ ball of size $ϵ$ centered at $f$, w.r.t. $ν$), is $μ \mapsto μ (f) - ϵ \| \frac{d μ}{d ν} \|_{q}$ where $\frac{1}{p} + \frac{1}{q} = 1$. Further, given a function $f$, the optimal $μ$ to pick is the distribution $a \cdot ν \cdot \max (f - b, 0)^{p - 1}$ for some constants $a > 0$ and $b \leq \max (f)$.

Proof: The $L_{p}$ norm of a function w.r.t. $ν$ will be denoted as $\| g \|_{p}$. The function associated with the worst-case amongst the $L_{p}$ ball centered at $f$ is

$λ μ . \inf_{g : \| f - g \|_{p} \leq ϵ} μ (g)$

We can rewrite $g$ as $f$ plus a function $h$ that has an $L_{p}$ norm of $ϵ$ or less. So we get the equivalent function

$= λ μ . \inf_{h : \| h \|_{p} \leq ϵ} μ (f + h)$

$= λ μ . \inf_{h : \| h \|_{p} \leq ϵ} (μ (f) + μ (h))$

$= λ μ . μ (f) + \inf_{h : \| h \|_{p} \leq ϵ} μ (h)$

$= λ μ . μ (f) - \sup_{h : \| h \|_{p} \leq ϵ} μ (h)$

(The last step uses that the ball is symmetric under $h \mapsto - h$.) Now, we'll show that $\sup_{h : \| h \|_{p} \leq ϵ} μ (h) = ϵ \| \frac{d μ}{d ν} \|_{q}$ where $\frac{1}{p} + \frac{1}{q} = 1$, by showing that it's both $\leq$ and $\geq$ that quantity. For one inequality direction, we can go

$\sup_{h : \| h \|_{p} \leq ϵ} μ (h) = \sup_{h : \| h \|_{p} \leq ϵ} \int h d μ = \sup_{h : \| h \|_{p} \leq ϵ} \int h \frac{d μ}{d ν} d ν$

And then apply Hölder's inequality.

$\leq \sup_{h : \| h \|_{p} \leq ϵ} \| h \|_{p} \cdot \| \frac{d μ}{d ν} \|_{q} = ϵ \| \frac{d μ}{d ν} \|_{q}$

And so, we have one inequality direction, that

$\sup_{h : \| h \|_{p} \leq ϵ} μ (h) \leq ϵ \| \frac{d μ}{d ν} \|_{q}$

For the other inequality direction, we can let $h$ be defined as

$ϵ \cdot \frac{{(\frac{d μ}{d ν})}^{\frac{1}{p - 1}}}{{\left( \int {(\frac{d μ}{d ν})}^{\frac{p}{p - 1}} d ν \right)}^{\frac{1}{p}}}$

Our first order of business is showing that the $L_{p}$ norm of $h$ is $ϵ$ or less. So, compute it.

${\left( \int {\left( ϵ \cdot \frac{{(\frac{d μ}{d ν})}^{\frac{1}{p - 1}}}{{(\int {(\frac{d μ}{d ν})}^{\frac{p}{p - 1}} d ν)}^{\frac{1}{p}}} \right)}^{p} d ν \right)}^{\frac{1}{p}}$

Distribute the power in.

$= {\left( \int ϵ^{p} \cdot \frac{{(\frac{d μ}{d ν})}^{\frac{p}{p - 1}}}{\int {(\frac{d μ}{d ν})}^{\frac{p}{p - 1}} d ν} d ν \right)}^{\frac{1}{p}}$

Pull the $ϵ^{p}$ out, group, and cancel.

$= {(ϵ^{p})}^{\frac{1}{p}} = ϵ$

So, this is a valid choice of function $h$. So, we can compute (by pulling the multiplicative constants out)

$\sup_{h : \| h \|_{p} \leq ϵ} μ (h) \geq μ \left( ϵ \cdot \frac{{(\frac{d μ}{d ν})}^{\frac{1}{p - 1}}}{{(\int {(\frac{d μ}{d ν})}^{\frac{p}{p - 1}} d ν)}^{\frac{1}{p}}} \right) = ϵ \cdot {\left( \int {(\frac{d μ}{d ν})}^{\frac{p}{p - 1}} d ν \right)}^{\frac{- 1}{p}} \int {(\frac{d μ}{d ν})}^{\frac{1}{p - 1}} d μ$

Now, since

$\int {(\frac{d μ}{d ν})}^{\frac{1}{p - 1}} d μ = \int {(\frac{d μ}{d ν})}^{\frac{1}{p - 1}} \frac{d μ}{d ν} d ν = \int {(\frac{d μ}{d ν})}^{\frac{p}{p - 1}} d ν$

we can reexpress our old equation as

$= ϵ \cdot {\left( \int {(\frac{d μ}{d ν})}^{\frac{p}{p - 1}} d ν \right)}^{\frac{- 1}{p}} {\left( \int {(\frac{d μ}{d ν})}^{\frac{p}{p - 1}} d ν \right)}^{\frac{p}{p}} = ϵ \cdot {\left( \int {(\frac{d μ}{d ν})}^{\frac{p}{p - 1}} d ν \right)}^{\frac{p - 1}{p}}$

$= ϵ \cdot {(\int {(\frac{d μ}{d ν})}^{q} d ν)}^{\frac{1}{q}} = ϵ \| \frac{d μ}{d ν} \|_{q}$

So we have

$\sup_{h : \| h \|_{p} \leq ϵ} μ (h) \geq ϵ \| \frac{d μ}{d ν} \|_{q}$

Since that's the other inequality direction, we have equality. Now, since we were last at

$= λ μ . μ (f) - \sup_{h : \| h \|_{p} \leq ϵ} μ (h)$

we can proceed to

$= λ μ . μ (f) - ϵ \| \frac{d μ}{d ν} \|_{q}$

And that addresses one part of the theorem, about the objective function we're maximizing.
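The two inequality directions can be spot-checked on a finite space. This Python sketch uses made-up values for $p$, $ϵ$, $μ$ and $ν$ (and illustrative helper names). It builds the extremal $h$ from the proof, confirms its $L_{p}$ norm is $ϵ$ and that $μ (h) = ϵ \| \frac{d μ}{d ν} \|_{q}$, and checks that random feasible $h$ never exceed that bound.

```python
import random

def random_dist(n, rng):
    """Sample a probability distribution on n points (all weights positive)."""
    w = [rng.random() + 1e-9 for _ in range(n)]
    total = sum(w)
    return [x / total for x in w]

def lp_norm(g, nu, p):
    """L_p norm of g with respect to the distribution nu."""
    return sum(v * abs(x) ** p for x, v in zip(g, nu)) ** (1.0 / p)

rng = random.Random(2)
n, p, eps = 6, 3.0, 0.25            # hypothetical choices
q = p / (p - 1.0)                   # conjugate exponent, 1/p + 1/q = 1
nu, mu = random_dist(n, rng), random_dist(n, rng)
density = [m / v for m, v in zip(mu, nu)]       # d mu / d nu
bound = eps * lp_norm(density, nu, q)

# Extremal h from the proof: eps * density^(1/(p-1)) / (int density^(p/(p-1)) d nu)^(1/p)
denom = sum(v * r ** (p / (p - 1.0)) for r, v in zip(density, nu)) ** (1.0 / p)
h_star = [eps * r ** (1.0 / (p - 1.0)) / denom for r in density]
mu_h_star = sum(m * x for m, x in zip(mu, h_star))

def random_feasible_h():
    """A random h rescaled to sit on the boundary of the L_p ball."""
    h = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    scale = eps / lp_norm(h, nu, p)
    return [scale * x for x in h]

holder_ok = all(sum(m * x for m, x in zip(mu, random_feasible_h())) <= bound + 1e-12
                for _ in range(1000))
print(lp_norm(h_star, nu, p), mu_h_star, bound, holder_ok)
```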

Sadly, the last part, about the optimal $μ$ , can only be done according to a physics level of rigor. Given two points $x$ and $y$ which the optimal $μ$ assigns probability to, the increase in the objective function from adding a tiny little $δ$ of probability mass to $x$ must be the same as the increase from adding a tiny little $δ$ of probability mass to $y$ . This is because, otherwise, if $x$ got a higher score than $y$ , you could steal a tiny amount of probability mass from $y$ , reallocate it to $x$ , and you'd get a higher score.

So, we can ask, what is $\frac{\partial}{\partial x} (μ \mapsto μ (f) - ϵ \| \frac{d μ}{d ν} \|_{q})$ for the optimal $μ$? We know that it will end up being equal for every $x$ which is in the support of the optimal $μ$. Well, we can write it as

$\frac{\partial}{\partial x} (μ (f) - ϵ \| \frac{d μ}{d ν} \|_{q})$

$= \frac{\partial}{\partial x} \left( \int f d μ - ϵ {(\int {(\frac{d μ}{d ν})}^{q} d ν)}^{\frac{1}{q}} \right)$

$= \frac{\partial}{\partial x} (\int f d μ) - \frac{\partial}{\partial x} \left( ϵ {(\int {(\frac{d μ}{d ν})}^{q} d ν)}^{\frac{1}{q}} \right)$

$= f (x) - ϵ \cdot \frac{1}{q} {(\int {(\frac{d μ}{d ν})}^{q} d ν)}^{\frac{1}{q} - 1} \frac{\partial}{\partial x} (\int {(\frac{d μ}{d ν})}^{q} d ν)$

$= f (x) - ϵ \cdot \frac{1}{q} {(\int {(\frac{d μ}{d ν})}^{q} d ν)}^{\frac{1}{q} - 1} ν (x) \frac{\partial}{\partial x} ({(\frac{d μ}{d ν})}^{q})$

$= f (x) - ϵ \cdot \frac{1}{q} {(\int {(\frac{d μ}{d ν})}^{q} d ν)}^{\frac{1}{q} - 1} ν (x) \cdot \frac{1}{ν (x)^{q}} \frac{\partial}{\partial x} (d μ)^{q}$

$= f (x) - ϵ \cdot \frac{1}{q} {(\int {(\frac{d μ}{d ν})}^{q} d ν)}^{\frac{1}{q} - 1} ν (x) \cdot \frac{1}{ν (x)^{q}} \cdot q \cdot μ (x)^{q - 1}$

And cancel

$= f (x) - ϵ {(\int {(\frac{d μ}{d ν})}^{q} d ν)}^{\frac{1}{q} - 1} {(\frac{d μ}{d ν} (x))}^{q - 1}$

Now, since $ϵ {(\int {(\frac{d μ}{d ν})}^{q} d ν)}^{\frac{1}{q} - 1}$ is the same regardless of which $x$ was picked, we can abbreviate it as $α$, and write things as

$\frac{\partial}{\partial x} (μ (f) - ϵ \| \frac{d μ}{d ν} \|_{q}) = f (x) - α {(\frac{d μ}{d ν} (x))}^{q - 1}$

As previously explained, this quantity must be the same for all $x$ that are supported by the optimal $μ$. In fact, this quantity can be called $b$. So, for every $x$ with support, we have that

$b = f (x) - α {(\frac{d μ}{d ν} (x))}^{q - 1}$

We can rearrange this

$α {(\frac{d μ}{d ν} (x))}^{q - 1} = f (x) - b$

${(\frac{d μ}{d ν} (x))}^{q - 1} = \frac{f (x) - b}{α}$

$\frac{d μ}{d ν} (x) = {(\frac{f (x) - b}{α})}^{\frac{1}{q - 1}}$

And now we see that, since the left-hand side is always nonnegative, $f (x) - b$ must be nonnegative too. And for any $x$ which fulfilled the above equation, it can be rearranged to show that this $x$ must have its partial derivative in the $x$ direction equal to $b$, the maximal value, as expected, and so $μ$ would not be harmed by placing any probability mass on that $x$. So, without loss of generality, we can assume that $μ$ is supported over all $x$ which fulfill the above equation, and make it into an equation that's valid for \emph{all} $x$ by going

$\frac{d μ}{d ν} (x) = {(\frac{\max (f (x) - b, 0)}{α})}^{\frac{1}{q - 1}}$

Rearrange.

$μ (x) = ν (x) \cdot {(\frac{\max (f (x) - b, 0)}{α})}^{\frac{1}{q - 1}}$

Now, ${(\frac{1}{α})}^{\frac{1}{q - 1}}$ is a constant, so call it $a$. And since it's valid over all $x$, we can rephrase it as

$μ = a \cdot ν \cdot \max (f - b, 0)^{\frac{1}{q - 1}}$

And now, since $\frac{1}{p} + \frac{1}{q} = 1$, we can rearrange this into $q = \frac{p}{p - 1}$, and so

$\frac{1}{q - 1} = \frac{1}{\frac{p}{p - 1} - \frac{p - 1}{p - 1}} = \frac{1}{\frac{1}{p - 1}} = p - 1$

And so our equation rearranges into

$μ = a \cdot ν \cdot \max (f - b, 0)^{p - 1}$

As desired.
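Since this part is only at a physics level of rigor, a numerical spot-check may be reassuring. The Python sketch below runs the argument in reverse on a 5-point space: it picks a hypothetical $f$, $ν$, and threshold $b$, builds $μ = a \cdot ν \cdot \max (f - b, 0)^{p - 1}$, and then solves for the one $ϵ$ that makes this $(a, b)$ pair consistent with $a = {(1 / α)}^{\frac{1}{q - 1}}$. All concrete numbers and names are assumptions for illustration.

```python
import random

def random_dist(n, rng):
    """Sample a probability distribution on n points."""
    w = [rng.random() + 1e-9 for _ in range(n)]
    total = sum(w)
    return [x / total for x in w]

def objective(mu, f, nu, eps, q):
    """mu(f) - eps * ||d mu / d nu||_q on a finite space."""
    norm_q = sum(v * (m / v) ** q for m, v in zip(mu, nu)) ** (1.0 / q)
    return sum(m * x for m, x in zip(mu, f)) - eps * norm_q

p = 3.0
q = p / (p - 1.0)
f = [0.1, 0.9, 0.4, 1.3, 0.7]     # hypothetical utilities
nu = [0.2] * 5                    # hypothetical reference distribution
b = 0.5                           # hypothetical threshold, below max(f)

# mu = a * nu * max(f - b, 0)^(p - 1), with a fixed by normalization.
weights = [max(x - b, 0.0) ** (p - 1.0) for x in f]
a = 1.0 / sum(v * w for v, w in zip(nu, weights))
mu_opt = [a * v * w for v, w in zip(nu, weights)]

# Solve a = (1/alpha)^(1/(q-1)) and alpha = eps * (int (dmu/dnu)^q dnu)^(1/q - 1) for eps.
alpha = a ** (1.0 - q)
s = sum(v * (m / v) ** q for m, v in zip(mu_opt, nu))
eps = alpha * s ** (1.0 - 1.0 / q)

# Marginal value f(x) - alpha * (dmu/dnu(x))^(q-1) at each supported point.
marginals = [x - alpha * (m / v) ** (q - 1.0)
             for x, m, v in zip(f, mu_opt, nu) if m > 0]

rng = random.Random(3)
best = objective(mu_opt, f, nu, eps, q)
beaten = any(objective(random_dist(len(f), rng), f, nu, eps, q) > best + 1e-9
             for _ in range(2000))
print(marginals, eps, beaten)
```

The marginal values come out equal to $b$ on the support, and no sampled distribution scores higher. That is what one would expect, since the objective is concave in $μ$ and the first-order conditions hold.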

Theorem 3: Dynamic Consistency

For any environment $e$, finite history $h$, off-history policy $π_{\neg h}$, and infrafunctions $U$ and $V$, we have that $U ((π_{\neg h} ∙ {argmax}_{π^{*}} (U | e, h, π_{\neg h}) (π^{*} \cdot (e | h))) \cdot e) \geq U ((π_{\neg h} ∙ {argmax}_{π^{*}} V (π^{*} \cdot (e | h))) \cdot e)$ Or, restating in words, selecting the after-h policy by argmaxing for $U | e, h, π_{\neg h}$ makes an overall policy that outscores the overall policy made by selecting the after-h policy to argmax for $V$.

Proof:

$U ((π_{\neg h} ∙ {argmax}_{π^{*}} (U | e, h, π_{\neg h}) (π^{*} \cdot (e | h))) \cdot e)$

Use $π^{*}$ to abbreviate the optimal policy for $U | e, h, π_{\neg h}$, interacting with the environment $e | h$.

$= U ((π_{\neg h} ∙ π^{*}) \cdot e)$

$= \min_{U \in^U} E_{(π_{\neg h} ∙ π^{*}) \cdot e} [U]$

$= \min_{U \in^U} \left( E_{(π_{\neg h} ∙ π^{*}) \cdot e} [1_{\neg h} U] + E_{(π_{\neg h} ∙ π^{*}) \cdot e} [1_{h} U] \right)$

$= \min_{U \in^U} \left( E_{π_{\neg h} \cdot e} [1_{\neg h} U] + E_{π_{\neg h} \cdot e} [1_{h}] \cdot E_{π^{*} \cdot (e | h)} [U_{↓ h}] \right)$

Now, since $π^{*}$ is optimal for $U | e, h, π_{\neg h}$ and interacting with $e | h$, that means that for all other post-h policies $π^{'}$, we have that

$(U | e, h, π_{\neg h}) (π^{*} \cdot (e | h)) \geq (U | e, h, π_{\neg h}) (π^{'} \cdot (e | h))$

Unpacking the definition of the updated infrafunction, since everything in the set associated with $U | e, h, π_{\neg h}$ came from the set associated with $U$, this turns into

$\min_{U \in^U} E_{π^{*} \cdot (e | h)} [E_{π_{\neg h} \cdot e} [1_{h}] \cdot U_{↓ h} + E_{π_{\neg h} \cdot e} [1_{\neg h} U]] \geq \min_{U \in^U} E_{π^{'} \cdot (e | h)} [E_{π_{\neg h} \cdot e} [1_{h}] \cdot U_{↓ h} + E_{π_{\neg h} \cdot e} [1_{\neg h} U]]$

Reshuffling some expectations, this turns into

$\min_{U \in^U} \left( E_{π_{\neg h} \cdot e} [1_{\neg h} U] + E_{π_{\neg h} \cdot e} [1_{h}] E_{π^{*} \cdot (e | h)} [U_{↓ h}] \right) \geq \min_{U \in^U} \left( E_{π_{\neg h} \cdot e} [1_{\neg h} U] + E_{π_{\neg h} \cdot e} [1_{h}] E_{π^{'} \cdot (e | h)} [U_{↓ h}] \right)$

Let $π^{'}$ be the policy that optimizes $V (π^{'} \cdot (e | h))$. Now, we were previously at

$= \min_{U \in^U} \left( E_{π_{\neg h} \cdot e} [1_{\neg h} U] + E_{π_{\neg h} \cdot e} [1_{h}] \cdot E_{π^{*} \cdot (e | h)} [U_{↓ h}] \right)$

but using the above inequality, we can proceed to

$\geq \min_{U \in^U} \left( E_{π_{\neg h} \cdot e} [1_{\neg h} U] + E_{π_{\neg h} \cdot e} [1_{h}] \cdot E_{π^{'} \cdot (e | h)} [U_{↓ h}] \right)$

$= \min_{U \in^U} \left( E_{(π_{\neg h} ∙ π^{'}) \cdot e} [1_{\neg h} U] + E_{(π_{\neg h} ∙ π^{'}) \cdot e} [1_{h} U] \right)$

$= \min_{U \in^U} E_{(π_{\neg h} ∙ π^{'}) \cdot e} [U]$

$= U ((π_{\neg h} ∙ π^{'}) \cdot e)$

$= U ((π_{\neg h} ∙ {argmax}_{π^{'}} V (π^{'} \cdot (e | h))) \cdot e)$

And we're done, we've shown our desired inequality

$U ((π_{\neg h} ∙ {argmax}_{π^{*}} (U | e, h, π_{\neg h}) (π^{*} \cdot (e | h))) \cdot e) \geq U ((π_{\neg h} ∙ {argmax}_{π^{*}} V (π^{*} \cdot (e | h))) \cdot e)$

And the $U$, $V$, and everything else was arbitrary, so it's universally valid.

Proposition 2: $L_{p}$ Double Integral Inequality

If $f \geq 0$ , let $\oint^{p} f d μ$ be an abbreviation for ${(\int f^{p} d μ)}^{\frac{1}{p}}$ . Then for all $p \in [- \infty, 1]$ , and $μ : Δ X$ and $ν : Δ Y$ and $f : X \times Y \to R^{\geq 0}$ , we have that $\int \oint^{p} f (x, y) d ν d μ \leq \oint^{p} \int f (x, y) d μ d ν$

Here's why. Define the function $g : Y \to R^{\geq 0}$ as $g (y) = {(\int f (x, y) d μ)}^{p - 1}$. The reverse Hölder inequality, because $p \leq 1$, says that, for all $x$,

$\int f (x, y) g (y) d ν \geq \oint^{p} f (x, y) d ν \cdot \oint^{q} g (y) d ν$

where $\frac{1}{p} + \frac{1}{q} = 1$. So, using that inequality, we can derive that

$\frac{\int f (x, y) g (y) d ν}{\oint^{q} g (y) d ν} \geq \oint^{p} f (x, y) d ν$

And so we derive that

$\int \oint^{p} f (x, y) d ν d μ \leq \int \frac{\int f (x, y) g (y) d ν}{\oint^{q} g (y) d ν} d μ$

Pull the constant out, and group this into a double integral.

$= {(\oint^{q} g (y) d ν)}^{- 1} \cdot \int \int f (x, y) g (y) d ν d μ$

Swap the order of double integration, and unpack what $\oint^{q}$ is.

$= {\left( {(\int g (y)^{q} d ν)}^{\frac{1}{q}} \right)}^{- 1} \cdot \int \int f (x, y) g (y) d μ d ν$

Pull $g (y)$, which is constant w.r.t. $x$, out of the inner integral, and also collapse the two powers on the left-hand side.

$= {(\int g (y)^{q} d ν)}^{\frac{- 1}{q}} \cdot \int g (y) \int f (x, y) d μ d ν$

Now, since $\frac{1}{q} + \frac{1}{p} = 1$, solving for $q$, we get that it is equal to $\frac{p}{p - 1}$. Similarly, $\frac{- 1}{q} = \frac{1}{p} - 1$. Make those substitutions.

$= {(\int g (y)^{\frac{p}{p - 1}} d ν)}^{\frac{1}{p} - 1} \cdot \int g (y) \int f (x, y) d μ d ν$

Unpack what $g (y)$ is.

$= {\left( \int {({(\int f (x, y) d μ)}^{p - 1})}^{\frac{p}{p - 1}} d ν \right)}^{\frac{1}{p} - 1} \cdot \int {(\int f (x, y) d μ)}^{p - 1} (\int f (x, y) d μ) d ν$

Simplify

$= {(\int {(\int f (x, y) d μ)}^{p} d ν)}^{\frac{1}{p} - 1} \cdot \int {(\int f (x, y) d μ)}^{p} d ν$

Cancel

$= {(\int {(\int f (x, y) d μ)}^{p} d ν)}^{\frac{1}{p}}$

Rewrite

$= \oint^{p} \int f (x, y) d μ d ν$

And we're done! We've shown that

$\int \oint^{p} f (x, y) d ν d μ \leq \oint^{p} \int f (x, y) d μ d ν$

For some values of $p$, like 0 or 1 or $- \infty$, these quantities will be ill-defined, but this can be handled by taking limits of the $p$ where this argument works.
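The inequality is easy to spot-check on finite spaces. In the Python sketch below, `f[i][j]` stands for $f (x_{i}, y_{j})$, and the sizes, ranges, and values of $p$ are made-up choices (avoiding the ill-defined cases such as $p = 0$).

```python
import random

def random_dist(n, rng):
    """Sample a probability distribution on n points."""
    w = [rng.random() + 1e-9 for _ in range(n)]
    total = sum(w)
    return [x / total for x in w]

def p_mean(values, weights, p):
    """(sum_i w_i * v_i^p)^(1/p): the p-integral of positive values, p != 0."""
    return sum(w * v ** p for v, w in zip(values, weights)) ** (1.0 / p)

def both_sides(f, mu, nu, p):
    """Return (lhs, rhs) of the inequality for f[i][j] = f(x_i, y_j)."""
    # lhs: ordinary integral over x of the p-integral over y.
    lhs = sum(m * p_mean(row, nu, p) for row, m in zip(f, mu))
    # rhs: p-integral over y of the ordinary integral over x.
    inner = [sum(m * row[j] for row, m in zip(f, mu)) for j in range(len(nu))]
    rhs = p_mean(inner, nu, p)
    return lhs, rhs

rng = random.Random(4)
holds = True
for _ in range(500):
    mu, nu = random_dist(3, rng), random_dist(4, rng)
    f = [[rng.uniform(0.1, 2.0) for _ in range(4)] for _ in range(3)]
    for p in (0.5, -1.0, -3.0):
        lhs, rhs = both_sides(f, mu, nu, p)
        if lhs > rhs + 1e-12:
            holds = False
print(holds)
```

At $p = 1$ the two sides coincide, since both reduce to the ordinary double integral.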

Corollary 1: Lp-Averaging is Well-Defined

Given any distribution $ν$ over a family of functions or infrafunctions, define the $L_{p}$ -average of this family (for $p \in [- \infty, 1]$ ) as the function $μ \mapsto \oint^{p} F_{i} (μ) d ν$ If all infrafunctions $F_{i} (μ)$ are $\geq 0$ , then the $L_{p}$ -average will be an infrafunction as well.

Proof: The main thing we need to verify is concavity of the resulting function. So, fix some $μ_{1}$ and $μ_{2}$, and use $q$ as our mixing parameter. Then,

$q (\oint^{p} F_{i} d ν) (μ_{1}) + (1 - q) (\oint^{p} F_{i} d ν) (μ_{2})$

$= q \oint^{p} F_{i} (μ_{1}) d ν + (1 - q) \oint^{p} F_{i} (μ_{2}) d ν$

Now, this can actually be interpreted as an ordinary integral outside of a p-integral. The outer integral is over the space of two points (with weights $q$ and $1 - q$), and $f (1, F) = F (μ_{1})$ and $f (2, F) = F (μ_{2})$. So applying our inequality from Proposition 2, we get

$\leq \oint^{p} (q F_{i} (μ_{1}) + (1 - q) F_{i} (μ_{2})) d ν$

And then apply concavity of each $F_{i}$, along with the fact that $\oint^{p}$ is monotone.

$\leq \oint^{p} F_{i} (q μ_{1} + (1 - q) μ_{2}) d ν$

Concavity has been proved.

Theorem 4: Crappy Optimizer Theorem

For any selection process $s$ where $Q (s)$ fulfills the four properties, $\forall f : Q (s) (f) = {max}_{μ \in Ψ} μ (f)$ will hold for some closed convex set of probability distributions $Ψ$ . Conversely, the function $f \mapsto {max}_{μ \in Ψ} μ (f)$ for any closed convex set $Ψ$ will fulfill the four properties of an optimization process.

It's already a theorem that any function $ψ : (X \to R) \to R$ which fulfills the three properties of concavity, monotonicity, and $ψ (a f + c) = a ψ (f) + c$, has a representation as minimizing over a closed convex set of probability distributions, and that minimizing over a closed convex set of probability distributions will produce a functional which fulfills those three properties. It's somewhere in Less Basic Inframeasure Theory. Well, actually, I proved some stuff about ultradistributions and all the proofs were the same just flipping concave to convex, and minimization to maximization, in an extremely straightforward way. And I proved that those three properties (but with convex flipped to concave) were equivalent to minimizing over a closed convex set of probability distributions, so the same result should hold.

Very similar arguments prove that any function which fulfills the three properties of convexity, monotonicity, and being able to pull multiplicative and additive constants out, has a representation as maximizing over a closed convex set of probability distributions $Ψ$ , and vice-versa.

So, that just leaves proving the three defining properties of an ultradistribution from the four properties assumed of the function $Q (s)$, and vice-versa. As a recap, the four properties assumed were

$P 1 : \forall f, c \in R : Q (s) (f + c) = Q (s) (f) + c$

$P 2 : \forall f, a \geq 0 : Q (s) (a f) = a Q (s) (f)$

$P 3 : \forall f, g : Q (s) (f + g) \leq Q (s) (f) + Q (s) (g)$

$P 4 : \forall f : f \leq 0 \to Q (s) (f) \leq 0$

And the three defining properties of ultradistributions are

$U 1 : \forall f, g, p \in [0, 1] : ψ (p f + (1 - p) g) \leq p ψ (f) + (1 - p) ψ (g)$

$U 2 : \forall f, g : f \leq g \to ψ (f) \leq ψ (g)$

$U 3 : \forall f, c \in R, a \geq 0 : ψ (a f + c) = a ψ (f) + c$

Proof of U1 from P2 and P3:

$Q (s) (p f + (1 - p) g) \leq Q (s) (p f) + Q (s) ((1 - p) g) = p Q (s) (f) + (1 - p) Q (s) (g)$

Proof of U2 from P3 and P4 and $f \leq g$ (so that $f - g \leq 0$):

$Q (s) (f) = Q (s) ((f - g) + g) \leq Q (s) (f - g) + Q (s) (g) \leq Q (s) (g)$

Proof of U3 from P1 and P2:

$Q (s) (a f + c) = Q (s) (a f) + c = a Q (s) (f) + c$

Now for the reverse direction!

Proof of P1 from U3:

$ψ (f + c) = ψ (f) + c$

Proof of P2 from U3:

$ψ (a f) = a ψ (f)$

Proof of P3 from U1 and U3, for any $p \in (0, 1)$:

$ψ (f + g) = ψ (p \frac{1}{p} f + (1 - p) \frac{1}{1 - p} g) \leq p ψ (\frac{1}{p} f) + (1 - p) ψ (\frac{1}{1 - p} g) = ψ (f) + ψ (g)$

Proof of P4 from U2 and U3 and $f \leq 0$:

$ψ (f) \leq ψ (0) = ψ (0 \cdot 0) = 0 \cdot ψ (0) = 0$

And done!
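The converse half of the theorem (maximizing over a set of distributions fulfills P1 through P4) can be spot-checked numerically. In this Python sketch, `Psi` is a hypothetical finite list of distributions on a 4-point space; since expectation is linear, maximizing over the list is the same as maximizing over its convex hull. All names are illustrative.

```python
import random

def random_dist(n, rng):
    """Sample a probability distribution on n points."""
    w = [rng.random() + 1e-9 for _ in range(n)]
    total = sum(w)
    return [x / total for x in w]

def Q(f, Psi):
    """f -> max over mu in Psi of mu(f), for a finite list Psi."""
    return max(sum(m * x for m, x in zip(mu, f)) for mu in Psi)

rng = random.Random(5)
n = 4
Psi = [random_dist(n, rng) for _ in range(5)]   # extreme points of a hypothetical set

def check_properties(trials=500, tol=1e-9):
    """Test P1-P4 on random functions; return False on the first failure."""
    for _ in range(trials):
        f = [rng.uniform(-2.0, 2.0) for _ in range(n)]
        g = [rng.uniform(-2.0, 2.0) for _ in range(n)]
        c, a = rng.uniform(-3.0, 3.0), rng.uniform(0.0, 3.0)
        p1 = abs(Q([x + c for x in f], Psi) - (Q(f, Psi) + c)) < tol
        p2 = abs(Q([a * x for x in f], Psi) - a * Q(f, Psi)) < tol
        p3 = Q([x + y for x, y in zip(f, g)], Psi) <= Q(f, Psi) + Q(g, Psi) + tol
        p4 = Q([-abs(x) for x in f], Psi) <= tol     # a function that is <= 0
        if not (p1 and p2 and p3 and p4):
            return False
    return True

all_hold = check_properties()
print(all_hold)
```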

Proposition 3:

The space of infrafunctions is a $□$-algebra, with the function $\mathrm{flat}_{F X} : □ F X \to F X$ being defined as $λ ψ . λ μ . ψ (λ F . F (μ))$.

We just need to show two commuting diagrams. For the square one, fix an arbitrary $Ψ : □ □ F X$ and $μ \in Δ X$.

$\mathrm{flat}_{F X} (\mathrm{flat}_{□ F X} (Ψ)) (μ)$

Unpack the outer layer of flattening.

$= \mathrm{flat}_{□ F X} (Ψ) (λ F . F (μ))$

Unpack the next layer of flattening, via the usual way that flattening works for infradistributions.

$= Ψ (λ ψ . ψ (λ F . F (μ)))$

Regroup this back up in a different way with how $\mathrm{flat}_{F X}$ works.

$= Ψ (λ ψ . \mathrm{flat}_{F X} (ψ) (μ))$

Write this as the beta reduction of the more complicated term

$= Ψ (λ ψ . (λ F . F (μ)) (\mathrm{flat}_{F X} (ψ)))$

Write this as a pushforward along an infrakernel.

$= \mathrm{flat}_{F X *} (Ψ) (λ F . F (μ))$

Write this as a flattening.

$= \mathrm{flat}_{F X} (\mathrm{flat}_{F X *} (Ψ)) (μ)$

Now, since this holds for all $μ$, that means that

$\mathrm{flat}_{F X} (\mathrm{flat}_{□ F X} (Ψ)) = \mathrm{flat}_{F X} (\mathrm{flat}_{F X *} (Ψ))$

And since this holds for all $Ψ$, this means that

$\mathrm{flat}_{F X} \circ \mathrm{flat}_{□ F X} = \mathrm{flat}_{F X} \circ \mathrm{flat}_{F X *}$

This demonstrates the first commutative diagram of an algebra of a monad, the one that looks like a square. Now for the forward-and-back diagram composing to identity.

We need that the usual embedding $e : F X \to □ F X$ given by $e (F) = λ g . g (F)$ where $g : F X \to R$, has $\mathrm{flat}_{F X} \circ e = \mathrm{id}_{F X}$. Let's compute it. Fix an arbitrary $F : F X$ and a $μ : Δ X$. Then

$\mathrm{flat}_{F X} (e (F)) (μ)$

$= \mathrm{flat}_{F X} (λ g . g (F)) (μ)$

$= (λ g . g (F)) (λ G . G (μ))$

$= (λ G . G (μ)) (F)$

$= F (μ) = \mathrm{id}_{F X} (F) (μ)$

And this holds for all $μ$ and $F$, so composing flattening and the usual embedding produces the identity, which is the second property to declare something a $□$-algebra.
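Both diagrams are just lambda-term manipulations, so they can be transcribed almost verbatim into Python closures. This is only a sketch: it ignores all the continuity and concavity side conditions, and `F1`, `F2`, `psi1`, `psi2`, and `Psi` are made-up examples on a 2-point space.

```python
def flat(psi):
    """flat_{FX} : box FX -> FX, i.e. psi |-> (mu |-> psi(F |-> F(mu)))."""
    return lambda mu: psi(lambda F: F(mu))

def embed(F):
    """The usual embedding e : FX -> box FX, e(F) = (g |-> g(F))."""
    return lambda g: g(F)

def flat_box(Psi):
    """Infradistribution flattening box box FX -> box FX."""
    return lambda g: Psi(lambda psi: psi(g))

def push_flat(Psi):
    """Pushforward of Psi along flat: g |-> Psi(psi |-> g(flat(psi)))."""
    return lambda g: Psi(lambda psi: g(flat(psi)))

# Hypothetical infrafunctions on a 2-point space; mu is a pair of probabilities.
F1 = lambda mu: mu[0]
F2 = lambda mu: min(mu[0] + 0.5, 2.0 * mu[1])

# Hypothetical elements of box FX and box box FX, built from mins of evaluations.
psi1 = lambda g: min(g(F1), g(F2))
psi2 = embed(F2)
Psi = lambda G: min(G(psi1), G(psi2))

mu = (0.3, 0.7)
identity_lhs = flat(embed(F2))(mu)        # flat . e applied to F2
square_lhs = flat(flat_box(Psi))(mu)      # flat_{FX} . flat_{box FX}
square_rhs = flat(push_flat(Psi))(mu)     # flat_{FX} . flat_{FX *}
print(identity_lhs == F2(mu), square_lhs == square_rhs)
```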

Proposition 4:

All continuous infrakernels $k : X \to □ Y$ induce a function $F Y \to F X$ via $G \mapsto (μ \mapsto G (k_{*} (μ)))$

We've got two things to verify. Namely, concavity, and upper-semicontinuity, of $λ μ . G (k_{*} (μ))$ . Upper-semicontinuity is easy, because if $μ_{n}$ converges to $μ$ , then $k_{*} (μ_{n})$ converges to $k_{*} (μ)$ , and then upper-semicontinuity of $G$ takes over. For concavity, we have $G (k_{*} (p μ_{1} + (1 - p) μ_{2})) = G (p k_{*} (μ_{1}) + (1 - p) k_{*} (μ_{2})) \geq p G (k_{*} (μ_{1})) + (1 - p) G (k_{*} (μ_{2}))$ And it's proved.
