Just some feedback: I'm probably about average in math skill here (or maybe below average; the most math I've done is calculus 10 years ago), and with some work I'm able to get through some of this. When I first looked at it I didn't understand anything, but after reading the Wikipedia article on the VNM utility theorem and the always helpful List of Mathematical Symbols I was able to get through most of Lemma 1. I was able to prove it to my satisfaction using the solver in Excel and can follow most of the proof up until "Thus, the result follows"; I don't see how it follows.
Are there any recommendations for slowly improving math skills other than just trying to work through things like this when time permits? Are people willing to host a Google Hangout where they walk through things such as this for those of us who are curious but have difficulty working it out all on our own? (I know I probably could work it all out given enough time, but it's hard to be motivated enough to make the time. When I first found the site, I didn't know about Bayes' theorem or any of the probability theory notation, but I saw its importance and so made sure to spend the time so I can follow it and work it out on my own when needed.)
I think it's a general problem in the way mathematics is taught (at least around here in Finland, and I'm basing this on a fairly small number of empirical observations) that the language of mathematics is not explained very well: what each symbol stands for and what the logical rules for using each symbol are, for example what it means when the symbol sigma stands for summation. So even if the students could use their math skills in principle, they end up stumbling in practice because they don't know how to interpret a statement that uses symbols they're not entirely familiar with. Another similar problem, in my opinion, is the lack of emphasis on understanding what's actually happening on the abstract level. Why does this work? How do you exploit this rule to arrive at a truthful and more revealing answer?
I'm not sure if this is any good but here's how I personally like to go about learning math - which I've not really done much.
1. Trying to understand what's going on at the abstract level. This involves questions like: Why does this work? What's the rule? What's the exception? Is there a fixed relation?
2. Understanding the computing part, the operation. What do you actually do to achieve the wanted outcome? Do you add numbers? How do you take the derivative of a function, for example?
3. Understanding the language of mathematics related to the concept. What are the symbols involved, for example with functions, derivatives, integrals? What are the rules of the language in particular? (For example, if you have the symbol Sigma and below it is i=n, what do the symbols below it stand for? What's the rule with the symbol?)
4. Doing a full operation, using the knowledge obtained so far to perform a computation. You start with some kind of data and end up in a final position where the data has been transformed or simplified with the help of the mathematics. (If at this point you still don't know what's going on, try to think backwards to 1.)
5. Application of the developed skill. At this point you understand the math skill in question well enough to have attached a sort of "handles" to it: you understand the concept well enough to recognize it in a different environment and to use it on a subset of data from a larger sample. For example, if you understand trigonometry very well, then handling sectors and related angles with circles is much easier. Both 1. and 5. involve sufficient understanding to be able to predict, to some extent, what kind of change the calculation or operation has caused to the initial position.
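To make the Sigma question in point 3 concrete, here is a minimal sketch that reads the summation symbol as a loop. The function name `sigma` and its parameters are made up for this illustration: the index starts at the value written below the symbol, stops at the value written above it, and the term next to the symbol is evaluated once per step.

```python
def sigma(term, lower, upper):
    """Sum term(i) for i = lower, lower + 1, ..., upper (both ends included)."""
    total = 0
    for i in range(lower, upper + 1):
        total += term(i)
    return total

# Sigma with "i = 1" below, "4" above, and the term i^2: 1 + 4 + 9 + 16
sum_of_squares = sigma(lambda i: i * i, 1, 4)
print(sum_of_squares)  # prints 30
```

Reading the symbol this way covers points 2 and 3 at once: the notation says which loop to run, and the loop is the operation.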
If you don't understand 1. well enough, then you just jump to 2. and 3., which go mostly hand in hand: to actually execute the operation you usually need to know both the language and the operation, but not necessarily. Then once you arrive at point 4. you can try to reason backwards from the answer and get the idea at 1. The abstract information is crucial for effectively applying the information when needed, as described in part 5. So: (Skip if you must) Step 1 - Steps 2 & 3 - Step 4 - (back to) Step 1 or Step 5.
I also think very brilliant students obtain the contents of these subtopics automatically when studying mathematics, but at least personally I'm not one of them and I think an analytical method like this comes in handy.
What does someone else think about this? In particular someone who already knows lots of math, does this make sense?
This is a mathematical appendix to my post "Why you must maximize expected utility", giving precise statements and proofs of some results about von Neumann-Morgenstern utility theory without the Axiom of Continuity. I wish I had the time to make this post more easily readable, giving more intuition; the ideas are rather straightforward and I hope they won't get lost in the line noise!
The work here is my own (though closely based on the standard proof of the VNM theorem), but I don't expect the results to be new.
*
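I represent preference relations as total preorders $\preceq$ on a simplex $\Delta_N := \{x \in \mathbb{R}^N : x_i \ge 0 \text{ for all } i,\ \sum_{i=1}^N x_i = 1\}$; define $\succeq$, $\prec$, $\succ$ and $\sim$ in the obvious ways (e.g., $x \sim y$ iff both $x \preceq y$ and $y \preceq x$, and $x \prec y$ iff $x \preceq y$ but not $y \preceq x$). Write $e^i$ for the $i$'th unit vector in $\mathbb{R}^N$.
In the following, I will always assume that $\preceq$ satisfies the independence axiom: that is, for all $x, y, z \in \Delta_N$ and $p \in (0,1]$, we have $x \prec y$ if and only if $px + (1-p)z \prec py + (1-p)z$. Note that the analogous statement with weak preferences follows from this: $x \preceq y$ holds iff $y \not\prec x$, which by independence is equivalent to $py + (1-p)z \not\prec px + (1-p)z$, which is just $px + (1-p)z \preceq py + (1-p)z$.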
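As a sanity check on what the independence axiom asks for, here is a small sketch, assuming the axiom has its standard form (x is strictly worse than y exactly when the mixture px + (1-p)z is strictly worse than py + (1-p)z, for p in (0, 1]). It tests the axiom on a preference induced by an expected-utility vector. The utility vector `u` and all function names are hypothetical, and exact fractions are used so that strict comparisons are not disturbed by rounding.

```python
from fractions import Fraction
import random

def mix(p, x, z):
    """Return the lottery p * x + (1 - p) * z."""
    return [p * xi + (1 - p) * zi for xi, zi in zip(x, z)]

def expected_utility(u, x):
    return sum(ui * xi for ui, xi in zip(u, x))

def strictly_worse(u, x, y):
    """x is strictly worse than y under the assumed utility vector u."""
    return expected_utility(u, x) < expected_utility(u, y)

def random_lottery(rng, n):
    """A random point of the simplex with rational coordinates."""
    weights = [Fraction(rng.randint(0, 10)) for _ in range(n)]
    if sum(weights) == 0:
        weights[0] = Fraction(1)
    total = sum(weights)
    return [w / total for w in weights]

def check_independence(u, trials=1000, seed=0):
    """Return True if no sampled (x, y, z, p) violates independence."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y, z = (random_lottery(rng, len(u)) for _ in range(3))
        p = Fraction(rng.randint(1, 10), 10)  # p ranges over (0, 1]
        before = strictly_worse(u, x, y)
        after = strictly_worse(u, mix(p, x, z), mix(p, y, z))
        if before != after:
            return False
    return True

u = [Fraction(0), Fraction(1), Fraction(3)]  # hypothetical utilities
independence_holds = check_independence(u)
```

This only shows that expected-utility preferences satisfy independence; it says nothing about the converse direction, which is what the results in this post are about.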
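Lemma 1 (more of a good thing is always better). If $x \prec y$ and $0 \le p < q \le 1$, then $(1-p)x + py \prec (1-q)x + qy$.
Proof. Let $r := q - p$. Then, $(1-p)x + py = rx + (1-r)z$ and $(1-q)x + qy = ry + (1-r)z$, where $z := \frac{(1-q)x + py}{1-r}$ (if $r = 1$, then $p = 0$ and $q = 1$, and the claim is just $x \prec y$). Thus, the result follows from independence applied to $x$, $y$, $z$, and $r$.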
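For readers who get stuck at "the result follows", here is the step written out. This is a sketch that assumes the lemma has the standard form: if x is strictly worse than y and 0 ≤ p < q ≤ 1, then the mixture with weight p on y is strictly worse than the mixture with weight q on y. The auxiliary point z is notation introduced for this sketch; it is a point of the simplex because its weights are non-negative and sum to one.

```latex
\begin{align*}
r &:= q - p \in (0, 1], \qquad
z := \frac{(1-q)\,x + p\,y}{1-r} \quad \text{(when } r < 1\text{)},\\
r x + (1-r) z &= (q-p)\,x + (1-q)\,x + p\,y = (1-p)\,x + p\,y,\\
r y + (1-r) z &= (q-p)\,y + (1-q)\,x + p\,y = (1-q)\,x + q\,y,\\
x \prec y &\iff r x + (1-r) z \prec r y + (1-r) z
  \quad \text{(independence with weight } r\text{)}.
\end{align*}
```

Both mixtures share the common part (1-r)z and differ only in whether the remaining weight r goes to x or to y, so independence transfers the strict preference between x and y to the two mixtures. When r = 1 we have p = 0 and q = 1, and the claim is the assumption itself.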
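Lemma 2. If $x \prec z$ and $x \preceq y \preceq z$, then there is a unique $p \in [0,1]$ such that $(1-q)x + qz \prec y$ for $q \in [0,p)$ and $y \prec (1-q)x + qz$ for $q \in (p,1]$.
Proof. Let $p$ be the supremum of all $q \in [0,1]$ such that $(1-q)x + qz \preceq y$ (note that by assumption, this condition holds for $q = 0$). Suppose that $q < p$. Then there is an $s \in (q,p]$ such that $(1-s)x + sz \preceq y$. By Lemma 1, we have $(1-q)x + qz \prec (1-s)x + sz$, and the first assertion follows.
Suppose now that $q > p$. Then by definition of $p$, we do not have $(1-q)x + qz \preceq y$, which means that we have $y \prec (1-q)x + qz$, which was the second assertion.
Finally, uniqueness is obvious, because if both $p$ and $p' > p$ satisfied the condition, we would have $y \prec (1-q)x + qz \prec y$ for $q \in (p, p')$.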
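To see why a threshold of this kind is weaker than continuity, here is a sketch built on a lexicographic preference, which satisfies independence but not the continuity axiom. It assumes Lemma 2 says that the mixtures (1-q)x + qz are strictly worse than y below some threshold p and strictly better above it. The two utility vectors, the three lotteries, and all function names are hypothetical choices for this illustration.

```python
from fractions import Fraction

def mix(q, x, z):
    """Return the lottery (1 - q) * x + q * z."""
    return [(1 - q) * xi + q * zi for xi, zi in zip(x, z)]

def lex_key(utilities, lottery):
    """Expected utility under each vector; tuples compare lexicographically."""
    return tuple(sum(u_i * l_i for u_i, l_i in zip(u, lottery))
                 for u in utilities)

def threshold(utilities, x, y, z):
    """Threshold p, assuming the first utility vector already ranks x
    strictly below z and y weakly between them."""
    first = utilities[0]
    ux, uy, uz = (sum(a * b for a, b in zip(first, v)) for v in (x, y, z))
    return Fraction(uy - ux, uz - ux)

# Three outcomes; the second utility vector only breaks ties in the first.
utilities = [[0, 1, 2], [0, 1, 0]]
x, y, z = [1, 0, 0], [0, 1, 0], [0, 0, 1]

p = threshold(utilities, x, y, z)
key_y = lex_key(utilities, y)
below = lex_key(utilities, mix(Fraction(1, 4), x, z)) < key_y
above = key_y < lex_key(utilities, mix(Fraction(3, 4), x, z))
at_threshold = lex_key(utilities, mix(p, x, z)) < key_y
```

In this example the threshold is 1/2, and the mixture at the threshold itself is still strictly worse than y, so no mixture of x and z is exactly as good as y. That is the situation the continuity axiom would rule out, while a threshold of the Lemma 2 kind still exists.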
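Definition 3. $x$ is much better than $y$, notation $x \succ\!\succ y$ or $y \prec\!\prec x$, if there are neighbourhoods $U$ of $x$ and $V$ of $y$ (in the relative topology of $\Delta_N$) such that we have $x' \succ y'$ for all $x' \in U$ and $y' \in V$. (In other words, the graph of $\succ\!\succ$ is the interior of the graph of $\succ$.) Write $x \precsim y$ or $y \succsim x$ when $\neg(x \succ\!\succ y)$ ($x$ is not much better than $y$), and $x \approx y$ ($x$ is about as good as $y$) when both $x \precsim y$ and $x \succsim y$.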
Theorem 4 (existence of a utility function). There is a  such that for all 
,
Unless  for all 
 and 
, there are 
 such that 
.
Proof. Let  be a worst and 
 a best outcome, i.e. let 
 be such that 
 for all 
. If 
, then 
 for all 
, and by repeated applications of independence we get 
 for all 
, and therefore 
 again for all 
, and we can simply choose 
.
Thus, suppose that . In this case, let 
 be such that for every 
, 
 equals the unique 
 provided by Lemma 2 applied to 
 and 
. Because of Lemma 1, 
. Let 
.
We first show that  implies 
. For every 
, we either have 
, in which case by Lemma 2 we have 
 for arbitrarily small 
, or we have 
, in which case we set 
 and find 
. Set 
. Now, by independence applied 
 times, we have 
; analogously, we obtain 
 for arbitrarily small 
. Thus, using 
 and Lemma 1, 
 and therefore 
 as claimed. Now note that if 
, then this continues to hold for 
 and 
 in a sufficiently small neighbourhood of 
 and 
, and therefore we have 
.
Now suppose that . Since we have 
 and 
, we can find points 
 and 
 arbitrarily close to 
 and 
 such that the inequality becomes strict (either the left-hand side is smaller than one and we can increase it, or the right-hand side is greater than zero and we can decrease it, or else the inequality is already strict). Then, 
 by the preceding paragraph. But this implies that 
, which completes the proof.
Corollary 5.  is a preference relation (i.e., a total preorder) that satisfies independence and the von Neumann-Morgenstern continuity axiom.
Proof. It is well-known (and straightforward to check) that this follows from the assertion of the theorem.
Corollary 6.  is unique up to affine transformations.
Proof. Since  is a VNM utility function for 
, this follows from the analogous result for that case.
Corollary 7. Unless  for all 
, for all 
 the set 
 has lower dimension than 
 (i.e., it is the intersection of 
 with a lower-dimensional subspace of 
).
Proof. First, note that the assumption implies that . Let 
 be given by 
, 
, and note that 
 is the intersection of the hyperplane 
 with the closed positive orthant 
. By the theorem, 
 is not parallel to 
, so the hyperplane 
 is not parallel to 
. It follows that 
 has dimension 
, and therefore 
 can have at most this dimension. (It can have smaller dimension or be the empty set if 
 only touches or lies entirely outside the positive orthant.)