Infra-Exercises, Part 1

Diffractor; Jack Parker; Connall Garrod

There have been a few remarks about how my and Vanessa’s posts on Infra-Bayesianism seem interesting, but that it's quite a formidable body of work to chew through.

So anyone interested in learning Inframeasure theory to the level of proficiency where they are actually able to develop their own proofs about it and casually deploy it as a tool on other problems is probably not best served by the existing posts. You can only go so far by reading about something without ever applying it.

And so, to help people get a taste for the underlying math, I have constructed a sheet of exercises analogous to Scott's exercise sheets about fixpoints, to guide the reader through inventing most of the basic theory for themselves.

Said problem sheet was dramatically improved by Jack Parker, Connall Garrod, and Thomas Larsen as part of a SERI project, by including proof sketches, relevant definitions, and commentary, and generally making the whole exercise sheet dramatically more polished than it would otherwise be. Several others also contributed to testing out the exercises and providing solutions and feedback, most prominently, Viktoria Malyasova.

This first sheet focuses on introducing Inframeasures, and working through the Fundamental Theorem of Inframeasures. Admittedly, this is a bit distant from much of the later work (further problem sheets are upcoming), but it is a very important result to have a solid understanding of.

There is more information on how to approach the problems in the introduction of the document. We may potentially release more problem sheets in the future, depending on what the feedback to this one looks like. And if the polished versions are not created, I will publish the unpolished ones.

The link to the problems is right here.

2.3: I claim that the definition "m is the infimum of S := ∀x:x≤m⇔x≤S" is better: It should make for prettier proofs, and expresses "m eliminates ∀s∈S for purposes of testing ≤".

Have this marvelously pretty proof of 3b:
Theorem: {monotone, 1-Lipschitz, 0-increasing, concave} $: p Φ ⟹ p inf Φ$
Proof:

$ϕ$ is monotone iff $\forall x \leq y : ϕ x \leq ϕ y$ . $ϕ$ is 1-Lipschitz iff $\forall x, y : ϕ y - | y - x | \leq ϕ x$ . $ϕ$ is 0-increasing iff $0 \leq ϕ 0$ . $ϕ$ is concave iff $\forall C$ convex combination: $Σ ϕ C \leq ϕ Σ C$ . It thus suffices to show that for enough $α, β : α Φ \leq β Φ ⟹ α inf Φ \leq β inf Φ$ .

We'd like to say $α inf Φ \leq inf α Φ \leq inf β Φ \leq β inf Φ$ . The first needs precisely that $α$ be monotone, which all of the above are. The second is $inf$ being monotone at the premise. The third is equality when $β$ is $ϕ \mapsto ϕ x$ (which all of the above are), characterizing the very definition of $\leq$ on functions. $□$

Open: Does 3a characterize affine? Can the full set of p for which this goes through be succinctly described?

5c:

I had rejected this answer as unfair. This sort of trolling should be overseen personally.

In section 1: When the 'affine' word is commonly used, the mixture coeffecient (denoted there by ) can be any real number. When $0 \leq p \leq 1$ , then it's still an affine combination, but a more precise tern could be 'convex combination'.

However, as you work with function spaces, convexity in a function space is different then being a convex function, so maybe some new notation should be introduced.

pg 6 "there exist" -> "there exists"

pg 13 maybe specify that you mean a linear functional that cannot be written as an integral (I quickly jumped ahead after thinking of one where you don't need to take any integrals to evaluate it)

I've posted complete, Moore-method-style solutions over on my blog. Spoilers, natch. Please nitpick them to death.

Video link in the pdf doesn't work

Hi Max! That link will start working by the end of this weekend. Connall and I are putting the final touches on the video series, and once we're done the link will go live.

Nitpicks: Def 2.4 builds on the obvious generalisation of Def 2.3, not Def 2.3 itself. Also, Def. 2.3 is a little poorly worded, and should make clear that "~~not all sets~~ ^[1]all sets of reals which are bounded below have an infimum". I think it would be helpful for those with no background in proofs if you had an exercise to justify the use of "the" when talking about the supremum of a set. Or even link it to the idea of a lower bound at all.

More to come as I read through this.

EDIT: I think you probably want to point people to guides on how to prove things so that people without maths backgrounds know what task they're meant to be accomplishing. Otherwise they might go through this text and be hopelessly stuck or think they've proven something when they really haven't. Plus, having the ability to notice the difference between a proof and the intuition behind a proof is quite important for doing mathsy stuff, and Vanessa's agenda is amongst the more mathy alignment agendas.

^{^}
Ignore the struck through text.

not all sets all sets of reals which are bounded below have an infimum

Do you mean 'all sets of reals which are bounded below DO have an infimum'?

Problem 1:

The strategy here will be to show that g(x) = f(x) - f(0) is linear. First, f(px) = f(px + (1-p)0) = pf(x) + f(0) - pf(0) so f(px) - f(0) = p(f(x) - f(0)), so we know we can pull out constants between 0 and 1. Now if we had c > 1, we know that 1/c < 1, so f(1/c cx + (1 - 1/c) 0 ) = 1/c f(cx) + f(0) - 1/c f(0), so cf(x) = f(cx) + cf(0) - f(0), so f(cx) - f(0) = c(f(x) - f(0)).

Now, for additivity: 1/2 f(x+y) = f(1/2 (x+y)) = 1/2 f(x) + 1/2 f(y) so f(x+y) = f(x) + f(y)

Bonus: Hyperplane separation for finite dimensions:

First, lemma: there is a unique closest point x in S to v (v not in S).

First, notice that the function f(x) = d(x,v) is continuous, is always greater than 0, and there's always some ball of radius R around the origin so that outside that ball, the distance will always be greater than some value and blow up as we go to infinity (just choose R a bit larger than |v|). Now, the only way for f to not have a minimum on S is if there's an infinite sequence of values in S with decreasing f values, and since S is closed this sequence cannot be contained in any bounded set (in finite dimensions!) - but then it goes to infinity, when the distance was supposed to eventually be blowing up. Therefore there's a minimum.

Now, since S is convex, if there are two minimum distance points we can draw an isosceles triangle whose base goes between the minima, and then take the midpoint. The midpoint must be closer, a contradiction. (the vector algebra version of this geometry: shift so that the midpoint is 0, so that then the vectors to the two vertexes of the base are x and -x; and call the other vertex v. Now |v|^2 + |x|^2 + 2 <v,x> = |v+x|^2 = |v-x|^2 = |v|^2 + |x|^2 - 2 <v,x>, so <v,x> = 0, but then |v-x|^2 = |v|^2 + |x|^2 so |v|^2 is smaller than |v-x|^2).

Now, take the closest point x in S to v. We'd like to make our hyperplane between S and v, so hopefully S is in the 'other direction' of v.

Let's shift everything so that x is the 0 vector of our vector space. Now, let's take any w in S, and consider the projection v' = 1/|w|^2 <w,v> w. Note that |v'|^2 = |<w,v>|^2/|w|^2, and that if |<w,v>|/|w|^2 <= 1, then v' must be in S by convexity (as it's a mixture of 0 with w).

Assume that's the case - but then |v|^2 < |v - v'|^2 = |v|^2 + |v'|^2 - 2 <v, v'> so |v'|^2 > 2 <v, v'>, thus |<w, v>|^2 > 2 |<v, w>|^2, which is a contradiction.

Therefore, |<w,v>| > |w|^2. But then 0 < |v-w|^2 - |v|^2 = |w|^2 - 2<v,w> < |<w,v>| - 2 <v,w>. So if <v,w> is positive then 0 < (a negative number). Therefore <v,w> is negative.

But then, we are basically done! Just define f(w) = <w,v>/|v|^2. In the original coordinates, f(w) = <w-x, v-x>/|v - x|^2. Clearly f(v) = 1, and we just proved that it's negative for w in S. Tada, there's the linear functional we need.

Problem 2:

First, the set of points in the plane below f is closed (we know it's convex already since f is concave). This is because we can take any sequence with a limit point, and then plug the limiting x value into f to get the limiting value of the y coordinates (by continuity), which would then mean that the limit point is in the set.

Take any point x, and pick some epsilon. Now look at the vector v in the plane that's epsilon above the point (x,f(x)). Get a separating hyperplane through v (since f is concave, the set below it is convex). As long as the hyperplane isn't vertical, it will just be some affine line, for which there is an affine function whose graph is equal to it. Of course, it can't be vertical, because that would require a discontinuous function.

This is then always above the graph of f, and so the affine function dominates f. It must be equal to f(x) + epsilon at x, and so therefore the infimum of the affine functions at this point must be f(x).

Problem 3:

If f is 0 increasing, then any hyperplane separating the set of points below the graph of f will be above (0,f(0)), and so will be given by an affine line that is also 0-increasing. This affine line must be monotone up if f is, for otherwise the affine line would have negative slope - but then we could pick any x, and then go to the right until we reach a point x' such that the affine line is lower than f(x). But f(x) < f(x'), and so the affine line is lower than f(x'), a contradiction. Thus, any separating line has positive slope.

Now, note that for x>0 we have |f(0) - f(-x)| = f(0) - f(-x) <= |0- -x| = x, so that f(-x) >= f(0) - x. However, i the line is not 1-Lipschitz (which for lines just means that the line has slope of size bigger than 1), then the line is given by l(-x) = c - mx where c >= f(0), m>1. But then these two will cross over at c - f(0) = (m-1) x, that is, at x = (c - f(0))/(m - 1) > 0. Thus there's some point to the left of the origin where l(-x) < c - mx = f(0) - x <= f(-x), but then the affine line wouldn't dominate f.

Therefore all the dominating affine lines satisfy the conditions, so the infimum is f by the previous problem.

Problem 3b:

f clearly must be monotone, because every term in the infimum at the point x is smaller than the corresponding term in the infimum at the point x' since the lines are monotone.

It must be 0-increasing, since each term in the infimum at 0 is >= 0, and so the infimum is too.

1-Lipschitz: Take two points a, b. For all eps > 0 we have that there's an affine dominating line l(x) = c + mx (with 0<= m <= 1) such that l(a) - f(a) <= eps. Thus f(b) - f(a) = f(b) - l(a) + l(a) - f(a) <= l(b) - l(a) + l(a) -f(a) <= m (b-a) + eps so (f(b) - f(a))/(b-a) <= m + eps/(b-a) <= 1 + eps/(b-a). Note that except for the single epsilon, nothing else in that inequality depends on epsilon or the line. Thus we can make eps/(b-a) as small as we like, so that (f(b) - f(a))/(b-a) <= 1 + eps' for all eps' > 0 - but this is only possible if (f(b) - f(a))/(b-a) <= 1. Thus f is 1-Lipschitz

Concave: Suppose we had a, b, and then let c be the mixture c = (1-t)a + tb for 0<=t<=1. Then for all eps > 0 there's some dominating affine line l(x) = l(0) + mx such that l(c) - f(c) <= eps, and thus so l(0) + mc <= f(c) + eps. Then we have that l(0) + ma >= f(a) and l(0) + mb >= f(b), but then if we take a mixture we'd get (1-t) f(a) + tf(b) <= l(0) + mc <= f(c) + eps.

But then (1-t) f(a) + tf(b) <= f(c) + eps for all eps > 0, so (1-t) f(a) + tf(b) <= f(c).

Since this works for all mixtures c, we have that f is concave

Continuous: Every Lipschitz function is continuous.

Problem 4a,b: I am only doing part of 4a, because I do not believe I will learn much beyond the previous problems, and so am superceding the instructions.

Every affine dominating line l of a concave monotone 1-Lipschitz and 0-increasing functional p must be:

0: 0-increasing, of course

1: monotone, because otherwise we'd have a f,g such that f <= g, l(f) > l(g), and l(f) >= p(f). WLOG assume l is linear. But then if we walk (l(f) - l(g)) (l(f) - p(f)) further along the line through f and g we would then cross the functional - that is, let's look (a little further to get lower) at f + 2(l(f) - p(f))/(l(f)-l(g)) (g-f). If we apply l to this we get l(f) + 2p(f) - 2l(f) = 2p(f) - l(f) < 0. But since (l(f) - p(f))/(l(f) - l(g)) is positive, and g-f >= 0, we have that the vector we're adding to f is >= 0, so after adding f we'll get something that's >= f. But then the value of p should be higher, and is still supposed to be below l, a contradiction.

Problem 5a:

For any point, evaluate at that point. Or try evaluating at each of any set of n points, and then adding the results together.

Note: I don't like the way you've explained the definition of the expectation. That's definitely a weighted average, with weights given by dm. If you wanted to elaborate on the difference, you should've just said that you basically have dm play the role of f(y) dy - a calculus student can understand that, as they have seen df = f dx.

Problem 5b:

It's obviously linear. For continuity, suppose f,g in C(X) differ by at most delta in the sup norm. Then |\int f dm - \int g dm| = |\int (f - g) dm| = |\int (f - g) dm_+ - \int (f - g) dm_-| <= \int |f - g| dm_+ + \int |f - g| dm_- <= \delta * (m_+(X) + m_-(X)), and m_+(X) and m_-(X) must be finite, so we can take delta small enough to make this less than any arbitrary epsilon.

Problem 5c, the second one (you presumably made a typo and meant to call it 5d): I assume you meant to say "What extra conditions... must be added to get it to correspond to a (positive) measure, not just a signed measure?"

Positive measure: We want the \int f dm >= 0 for all f that are nonnegative. Therefore if g >= f, \int (g - f) dm >= 0, so \int g dm >= \int f dm, and thus we must have a monotone functional. Likewise if we have a monotone functional, then this will be satisfied.

Probability measure: We want the \int 1_X dm = 1. Suppose f, g differ by at most 1 in the sup norm. Then \int |f - g| dm <= \int 1 dm <= 1. But then we can just show that the functional is 1-lipschitz: \int |f - g dm <= sup |f - g| \int 1 dm = sup |f -g|.

However, we can have 1-lipschitz functionals that aren't probability distributions - for example, just assign 0 to everything! However, as long as any set has finite measure, we'll know that a 1-lipschitz functional will give 0 < m(X) <= 1, and then we can rescale to get a probability measure.

Problem 6:

Affine means we can split it into a linear part and a constant part. 0-increasing means the constant part is >= 0. Monotone means we have a (nonnegative) measure, and thus we have an a-measure.

Problem 7: I indeed see that this is just like the previous problems, and so won't rewrite it.

Finale: Oh hey, worst case selection! That's neat.

I have two questions that may be slightly off-topic and a minor remark:

Is a list of open and tractable problems related to Infra-Bayesianism somewhere available?
Do you plan to publish the results of the Infra-Bayesianism series in a peer-reviewed journal? I understand that there are certain downsides; mostly that it requires a lot of work, that the whole series may be too long for a journal article and that the peer review process takes much time. However, if your work is citeable, it could attract more researchers, who are able to contribute.
On page 22, you should include the condition a(bv) = (ab)v into the definition of a vector space.

2.3: I claim that the definition "m is the infimum of S := ∀x:x≤m⇔x≤S" is better: It should make for prettier proofs, and expresses "m eliminates ∀s∈S for purposes of testing ≤".

Have this marvelously pretty proof of 3b:
Theorem: {monotone, 1-Lipschitz, 0-increasing, concave} $: p Φ ⟹ p inf Φ$
Proof:

Open: Does 3a characterize affine? Can the full set of p for which this goes through be succinctly described?

5c:

I had rejected this answer as unfair. This sort of trolling should be overseen personally.

However, as you work with function spaces, convexity in a function space is different then being a convex function, so maybe some new notation should be introduced.

pg 6 "there exist" -> "there exists"

pg 13 maybe specify that you mean a linear functional that cannot be written as an integral (I quickly jumped ahead after thinking of one where you don't need to take any integrals to evaluate it)

I've posted complete, Moore-method-style solutions over on my blog. Spoilers, natch. Please nitpick them to death.

Video link in the pdf doesn't work

Hi Max! That link will start working by the end of this weekend. Connall and I are putting the final touches on the video series, and once we're done the link will go live.

64

Infra-Exercises, Part 1

64

Ω 17

64

Ω 17

64

Ω 17