
Not too long ago, I asked LessWrong which math topics to learn. Eventually, I want to ask what the prerequisites for each of those topics are and how I should go about learning them. This is a special case of that.

I'm rereading the sequences and Eliezer seems to love E.T. Jaynes. As part of my rationality self-study, I want to work my way through his Probability Theory: the Logic of Science. What math topics do I already need to understand to prepare myself for this? I learned calculus once upon a time, but not fantastically well, and I plan to start by reviewing that.

Also,

Despite Eliezer's praise of the "thousand-year-old vampire", is there a better book to learn probability theory?

Does anyone want to learn this (or the other math from my post above) with me? I'd love to have a partner or maybe even a work group. Location is no obstacle. [Two caveats: 1. I'm busy with stuff and may not be able to get into this for a few months. 2. I work hard, but I am incredibly slow at computation (such that on every math test I have ever taken, it took me at least 3 times as long as the second slowest person in the class to finish). You might find that I go too slow for you.]


Only tangentially related, sorry.

I am incredibly slow at computation (such that on every math test I have ever taken, it took me at least 3 times as long as the second slowest person in the class to finish).

I have tutored a fair amount, and my experience is that when someone says "I am incredibly slow at computation", the real issue is a lack of fluency. After you have simplified your first thousand trig expressions, when you look at expression 1001 you already see most of the answer; all you have to do is write out the steps.

My model of a learner is that a person is a sort of Markov chain, where you try to go from the NO-SKILL state through the LEARNING state into the HAVE-SKILL state, also known as fluency or mastery.

Learning goes like this:

NO SKILL ---- learning rate ----> LEARNING ---- internalization rate ----> HAVE SKILL

Forgetting goes like this:

HAVE SKILL ---- slow forgetting rate ---> NO SKILL <---- fast forgetting rate ---- LEARNING

People differ enormously in their learning and forgetting rates, which also vary by subject for each individual. Some learn a new skill quickly, others take a while. Some retain 90% of the skill from one session to the next, others barely 1%. The internalization rate is less variable. As long as you manage to stay mostly in the LEARNING state for some period of time, you eventually get to the HAVE SKILL state.

The slow forgetting rate is, well, slow for almost everyone, so it takes a long time to forget a well-mastered activity. You can probably still do long division (maybe after a 5-10 min ramp-up), even if you haven't done any in a decade and it was a real pain to learn the first time.
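To make the model above concrete, here is a minimal sketch of it as a three-state Markov chain, with one step per practice session. The function names and all the numeric rates are illustrative assumptions, not measurements; the only thing taken from the description is the shape of the chain (learning, internalization, fast forgetting from LEARNING, slow forgetting from HAVE SKILL).

```python
# Three-state learner model: a sketch with made-up per-session probabilities.
NO_SKILL, LEARNING, HAVE_SKILL = 0, 1, 2


def transition_matrix(learn, internalize, fast_forget, slow_forget):
    """Return the 3x3 row-stochastic matrix for one practice session."""
    if internalize + fast_forget > 1:
        raise ValueError("internalize + fast_forget must not exceed 1")
    return [
        # From NO SKILL: start learning, or stay put.
        [1 - learn, learn, 0.0],
        # From LEARNING: fall back, keep learning, or internalize.
        [fast_forget, 1 - internalize - fast_forget, internalize],
        # From HAVE SKILL: slowly forget, or keep the skill.
        [slow_forget, 0.0, 1 - slow_forget],
    ]


def step(dist, matrix):
    """Advance a probability distribution over states by one session."""
    return [sum(dist[i] * matrix[i][j] for i in range(3)) for j in range(3)]


def mastery_after(sessions, learn, internalize, fast_forget, slow_forget):
    """Probability of being in HAVE SKILL after `sessions`, starting at NO SKILL."""
    matrix = transition_matrix(learn, internalize, fast_forget, slow_forget)
    dist = [1.0, 0.0, 0.0]
    for _ in range(sessions):
        dist = step(dist, matrix)
    return dist[HAVE_SKILL]


if __name__ == "__main__":
    # Same learner, two hypothetical fast-forgetting rates.
    for fast_forget in (0.1, 0.6):
        p = mastery_after(50, learn=0.5, internalize=0.1,
                          fast_forget=fast_forget, slow_forget=0.01)
        print(f"fast_forget={fast_forget}: P(HAVE SKILL after 50 sessions) = {p:.3f}")
```

Under these assumed numbers, raising only the fast forgetting rate lowers the chance of having the skill after the same number of sessions, which is the sense in which more frequent repetition can matter more than raw speed.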

The apparent learning and forgetting rates also depend on already having skills similar to the one you are learning, much as a jigsaw puzzle gets easier once the neighboring pieces are in place.

Anyway, my point is that you have likely misdiagnosed yourself. The symptom "incredibly slow at computation" could be a manifestation of one or more of the following:

• being in the LEARNING state instead of the HAVE SKILL state
• having relatively low learning rate and/or fast forgetting rate for the usual amount/frequency of repetition
• trying to learn the skill in isolation, which slows down learning significantly

If you figure out which of these apply to you, odds are you will no longer consider yourself "slow at computation", but, say, "requiring more frequent repetitions than average to master a new computational skill".

This model is, of course, rather simplified, as everyone appears to have their limits which they hit eventually for an advanced enough skill, but hopefully helpful enough for the initial diagnosis.

I think it's also useful to clarify the relevant meaning of "fluency" in a technical topic, so that we can talk about fluency in smaller topics and work the ratchet of getting more and more stuff towards "have skill", without juggling too much at once in the "learning" state and forgetting things before they are fixed.

The relevant sense of fluency is not about speed or quality of results or difficulty of the problems that can be solved, even though these things come with fluency, but about skill at answering most simple questions and performing recurrent tasks that's mostly offloaded to System 1, that's intuitive and doesn't require too much attention to keep going. Solving simple problems has to become easy. Even if you can solve hard problems perfectly and quickly using a method novel to you that was just explained, that's not yet fluency, because you'd be leaning on attention and working memory. If you commit those same capabilities as System 1 skills, they would do most of the work for you while leaving enough attention to plan the process, and won't be forgotten in a year, allowing accumulation of vast expertise. Compare this practice with learning a topic just well enough to become capable of solving some problems (cramming for an exam is an important example of this failure mode).

Things one can become fluent at can be as small as seeing the structure and motivation for a solution to a particular (kind of) exercise or to a standard lemma. As skills add up, knowledge of how particular words translate becomes ability to speak a new language. This never happens if you keep relying on a combination of working memory and a dictionary.

Right, I agree. Being able to answer simple questions almost instantly, without even having to think them through consciously, is a good indication of System 1 at work, which is the goal (or maybe the definition) of learning a skill. Some programs and tests are deliberately structured in this way. The Kumon program is one.

To get a more introductory -- but still quite thorough, and more modern -- Bayesian perspective, I recommend John Kruschke's Doing Bayesian Data Analysis. Ignore the silly cover. The book is engagingly written and informative. As a side benefit, it will also teach you R, a very useful language for statistical computing. Definitely worth learning if you are at all interested in data analysis.

Also, you should learn some classical statistics before getting into Bayesian statistics. Jaynes won't really help with that. Kruschke will help a little, but not much. The freely available OpenIntro Statistics textbook is a very good introduction.

I recommend first reading OpenIntro, then Kruschke, then Jaynes.

Also, you should learn some classical statistics before getting into Bayesian statistics.

I'm of two minds about that. I did classical first, and found it painful. It was just wrong. Recipes without real justification. Jaynes was such a relief after that. He just made sense, step after step.

So I would have wished to have started with Jaynes.

But maybe it's good to learn the horrible way first, so that you really appreciate the right way?

Nah, that seems rather demented. Learn the right way first. Learn Jaynes. He covers the basic classical statistical methods anyway, and in a better fashion than classical statistics classes do. He just makes more sense.

Recipes without real justification.

This sounds more like a pedagogical issue than an inherent problem with classical statistics. I agree that Bayesianism is philosophically "cleaner", but classical statistics isn't just a hodge-podge of unjustified tools. If you're interested in a sophisticated justification of classical methods, this is a good place to start. I'm pretty sure you'll be unconvinced, but it should at least give you some idea of where frequentists are coming from.

Please do not make statements like

"Recipes without real justification. Jaynes was such a relief after that. He just made sense, step after step."

I am not a "classical statistician", but Harald Cramér's Mathematical Methods of Statistics (http://www.amazon.com/Mathematical-Methods-Statistics-Harald-Cram%C3%A9r/dp/0691005478) is still incredibly relevant. He is also famous for relevant results in insurance mathematics and risk theory. It wouldn't be too much of an overstatement to say he is the father of modern ruin theory, something that should be relevant to all people who care about tail risk.

Do you mean classical, as in the classic frequency view of Cramér? Cramér's view is still essential. What about logical frequency views such as Kyburg's? Is that 'classical'? Is the logical approach of Jaynes closer to the logical frequentist approach of Kyburg than it is to the approaches of other Bayesians?

Jaynes is a top-tier book, but it is false to say that it covers classical statistics better than Harald Cramér's book does.


I don't think buybuydandavis was saying that Jaynes covers classical statistics well, but that classical statistics isn't worth covering well and that Jaynes covers more useful things well.

For an introductory course on statistics (which uses the OpenIntro Statistics textbook), I strongly recommend Coursera's Data Analysis and Statistical Inference. Before I found this course, I tried Coursera's Statistics One and Udacity's Intro to Statistics, neither of which I recommend.

I agree with the Kruschke recommendation. I bought a copy of Doing Bayesian Data Analysis a couple of weeks ago and am working my way through it now. It is quite good. You'll need an understanding of undergraduate-level calculus and some background in basic probability to understand it, I think.

I think the first three chapters of Jaynes are really excellent, and the chapters on updating (4 and 6 I think?), symmetry properties, and maximum entropy methods are also highly useful.

You will need some calculus (on the advanced end, you might need to understand how to use Lagrange multipliers, but it's not strictly necessary), but I think that's it. If you are unfamiliar with other concepts like Shannon entropy, you may just need more practice with the material.
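As a rough illustration of where Lagrange multipliers show up in maximum entropy methods, here is the simplest case I know of, worked as a sketch rather than quoted from the book: maximize the Shannon entropy of a distribution over $n$ outcomes subject only to normalization. The symbols $p_i$, $n$, and $\lambda$ are my own notation.

```latex
\begin{aligned}
&\text{Maximize } H(p) = -\sum_{i=1}^{n} p_i \ln p_i
  \quad \text{subject to} \quad \sum_{i=1}^{n} p_i = 1. \\[4pt]
&\mathcal{L}(p, \lambda) = -\sum_{i=1}^{n} p_i \ln p_i
  + \lambda \left( \sum_{i=1}^{n} p_i - 1 \right) \\[4pt]
&\frac{\partial \mathcal{L}}{\partial p_i} = -\ln p_i - 1 + \lambda = 0
  \;\;\Longrightarrow\;\; p_i = e^{\lambda - 1} \text{ for every } i \\[4pt]
&\sum_{i=1}^{n} p_i = n\, e^{\lambda - 1} = 1
  \;\;\Longrightarrow\;\; p_i = \frac{1}{n}
\end{aligned}
```

With no constraint beyond normalization, the maximum entropy distribution is uniform. Adding further constraints (a fixed mean, say) adds one multiplier per constraint and follows the same steps.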

If you want to actually use this for data analysis, another modern textbook would be helpful (don't be afraid to read multiple books on the same thing).

A while back, I tried reading Jaynes carefully (i.e. working lots of derivations while reading). I'll share my thoughts, but since I stopped after two and a half chapters, YMMV if you read further.

(1) I felt like I was reading a physics textbook. I'm a recovering physics major, and the experience gave me a serious case of Griffiths deja vu. For example, Jaynes does things like:

• Play fast and loose with Taylor series expansions
• Give arguments based on intuition and/or symmetry
• Assume all functions are well-behaved
• Use concepts / notation from calculus that I've completely forgotten (or never learned)

If you've taken university level physics before, then you should feel somewhat at home reading the first few chapters of Jaynes. If not, I would recommend putting in a bit of extra effort to make sure that you understand EVERY step of important arguments / derivations.

(2) After several days of effort, I got to the point where... you could show that if an urn has 3 red balls and 7 black balls, then the probability of drawing a red ball is 3/10. Yay!

Ok, fine, to put it another way, by making a few VERY basic assumptions about reasoning under uncertainty, you can show that the laws of probability are uniquely determined.

If you think this is the coolest revelation ever, then you should definitely read Jaynes. On the other hand, if you'd rather learn how to win at poker, or analyze randomized algorithms, or do calculations about 3d random walks in a cylinder, or something, then Jaynes is probably not the right textbook for you at this time.
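For concreteness, the urn result from point (2) can be checked in a few lines. This is only a sketch of the end result, not of Jaynes' derivation: it assumes each ball is equally likely to be drawn, and the function names are made up for illustration.

```python
from fractions import Fraction
import random


def prob_red(red, black):
    """Exact probability of drawing red, assuming every ball is equally likely."""
    return Fraction(red, red + black)


def simulate_red_fraction(red, black, draws, seed=0):
    """Estimate the same probability by repeated draws with replacement."""
    rng = random.Random(seed)
    urn = ["red"] * red + ["black"] * black
    hits = sum(rng.choice(urn) == "red" for _ in range(draws))
    return hits / draws


if __name__ == "__main__":
    print("exact:", prob_red(3, 7))
    print("simulated:", simulate_red_fraction(3, 7, draws=100_000))
```

The simulated fraction should land close to the exact value; the interesting part of Jaynes is why the exact value has to be what it is.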

What math topics do I already need to understand to prepare myself for this?

Calculus, how to Taylor expand, how to carefully and patiently follow a long argument. I would recommend against going down a deep rabbit hole though (e.g. I would discourage trying to learn "all of multivariable calculus" before starting Jaynes).

Is there a better book to learn probability theory?

It depends; probability theory is a huge topic, and you can attack it from many different angles depending on your goals and interests (e.g. where you want to apply it, whether you're learning it as a prerequisite for another topic). That said, if what you're after is LessWrong / CFAR street cred, then I would probably stick with Jaynes ;-)

Here are some alternatives:

Thanks!

I agree that the first few chapters of Jaynes are illuminating, haven't tried to read further. Bayesian Data Analysis by Gelman feels much more practical at least for what I personally need (a reference book for statistical techniques).

The general prerequisites are actually spelled out in the introduction of Jaynes' Probability Theory. Emphasis mine.

The following material is addressed to readers who are already familiar with applied mathematics at the advanced undergraduate level or preferably higher; and with some field, such as physics, chemistry, biology, geology, medicine, economics, sociology, engineering, operations research, etc., where inference is needed. A previous acquaintance with probability and statistics is not necessary; indeed, a certain amount of innocence in this area may be desirable, because there will be less to unlearn.

familiar with applied mathematics at the advanced undergraduate level or preferably higher

I don't know what that means. Calculus? Analysis? Linear algebra? Matrices? Non-euclidean geometry?

A basic course on stochastics and probability theory via Kolmogorov's approach, with perhaps some real analysis. If you want to keep reading upper-level books, there's no reason to stop at Jaynes: there's also differential geometry as applied in information geometry.

familiar with applied mathematics at the advanced undergraduate level or preferably higher

In working through the text, I have found that my undergraduate engineering degree and mathematics minor would not have been sufficient to understand the details of Jaynes' arguments, follow the derivations, and solve the problems. I took some graduate courses in math and statistics, and more importantly I've picked up a smattering of many fields of math after my formal education, and these plus Google have sufficed.

Be advised that there are errors (typographical, mathematical, rhetorical) in the text that can be confusing if you try to follow Jaynes' arguments exactly. Furthermore, it is most definitely written in a blustering manner (to bully his colleagues and others who learned frequentist statistics) rather than in an educational manner (to teach someone statistics for the first time). So if you want to use the text to learn the subject matter, I strongly recommend you take the denser parts slowly and invent problems based on them for yourself to solve.

I find it impossible not to constantly sense in Jaynes' tone, and especially in his many digressions propounding his philosophies of various things, the same cantankerous old-man attitude that I encounter most often in cranks. The difference is that Jaynes is not a crackpot; whether by wisdom or luck, the subject matter that became his cranky obsession is exquisitely useful for remaining sane.

Good quote.

But I would have bolded

already familiar with applied mathematics ... where inference is needed

That's where Jaynes shines. Many mathematical subjects are treated axiomatically. Jaynes instead starts from the basic problem of representing uncertainty. Churning out the implications of axioms is a very different mindset than "I have data, what can I conclude from it?"

I think this is true as well.

A previous acquaintance with probability and statistics is not necessary; indeed, a certain amount of innocence in this area may be desirable, because there will be less to unlearn.

As part of your buildup to PTTLOS, you could start with some of the papers Jaynes wrote just before the book, which will give you his thoughts and motivations at the time:

http://bayes.wustl.edu/etj/node1.html

These two look good for providing context:
PROBABILITY THEORY AS LOGIC
A BACKWARD LOOK TO THE FUTURE - Skip to the Probability Theory sections, and read to the end.

I'm not sure what your background is; there are a number of books in philosophy of statistics that might be more accessible than Jaynes. I'm with the others - you should study some of the simpler foundations of statistics before diving into Bayesianism.

1) Luce and Raiffa's Games and Decisions (1989) introduces von Neumann and Morgenstern's axioms of objective utility theory and game theory; the book is geared towards serious social scientists.

2) Savage's The Foundations of Statistics (1954) is much harder than Luce and Raiffa, but it introduces an axiomatic system of subjective utility theory that complements objective utility theory well.

3) Fagin, Halpern, Moses and Vardi's Reasoning About Knowledge (2004) contains a reasonable introduction to statistics via the Kolmogorov axiomatization. It is suitable for technically minded philosophy students who have no background in measure theory, set theory or analysis.

4) Modern statistics builds heavily on calculus. You may want to brush up - I really like Bressoud's A Radical Approach to Real Analysis (2006). If you find yourself into the Kolmogorov axiomatization, you should also check out Bressoud's A Radical Approach to Lebesgue Measure Theory (2008).

These books should give you some perspective on the different traditions in philosophy of statistics and rational decision theory. After this Jaynes should make a lot of sense - he belongs to the "subjective" camp that Savage pioneered. However, it is important to understand when studying philosophy of statistics that there are multiple camps and they have different perspectives.

Despite other suggestions, I encourage you to read the whole book, not just the first chapters. It will give you a very solid understanding of theoretical probability theory, on things like "why the normal", group symmetry priors, nonconglomerability, A_p distributions, etc.

On the other hand, when you have read the first 6 chapters of Jaynes, go look at the first part of Sivia. You'll see Jaynes' concepts laid out with simpler examples and a high degree of clarity.

The second part of Sivia and the Gelman/Carlin/Stern/Rubin could be used to gain a deeper understanding of what modern probabilistic models actually are and how they are used in practice.
Be aware, though, that after Jaynes, tolerating the sloppy thinking that is endemic in the field will be much harder.

I am working through the Further Reading section of the Wikipedia page on Bayesian inference. PTTLOS is under the "Advanced" subsection, but there are plenty of titles in the "Elementary" subsection that probably constitute a good buildup.