I've been wondering how useful it is for the typical academically strong high schooler to learn math deeply. Here by "learn deeply" I mean "understanding the concepts and their interrelations" as opposed to learning narrow technical procedures exclusively.

My experience learning math deeply

When I started high school, I wasn't interested in math and I wasn't good at my math coursework. I even got a D in high school geometry, and had to repeat a semester of math.

I subsequently became interested in chemistry and thought that I might become a chemist, so I figured that I should learn math better. During my junior year of high school, I supplemented the classes that I was taking by studying calculus on my own and auditing a course on analytic geometry. I also took physics concurrently.

Through my studies, I started seeing the same concepts over and over again in different contexts, and I became versatile with them, capable of fluently applying them in conjunction with one another. This awakened a new sense of awareness in me, of the type that Bill Thurston described in his essay "Mathematics Education":

Mathematics is like a flight of fancy, but one in which the fanciful turns out to be real and to have been present all along. Doing mathematics has the feel of fanciful invention, but it is really a process of sharpening our perception so that we discover patterns that are everywhere around.

I understood the physical world, the human world, and myself in a way that I never had before. Reality seemed full of limitless possibilities. Those months were the happiest of my life to date.

More prosaically, my academic performance improved a lot, and I found it much easier to understand technical content (physics, economics, statistics, etc.) ever after.

So in my own case, learning math deeply had very high returns.

How generalizable is this?

I have an intuition that many other people would benefit a great deal from learning math deeply, but I know that I'm unusual, and I'm aware of the human tendency to implicitly assume that others are similar to us. So I would like to test my beliefs by soliciting feedback from others.

Some ways in which learning math deeply can help are:

  • Reduced need for memorization (while learning math). When you understand math deeply, you see how many different mathematical problems are special cases of a single more general problem, so that in order to remember how to do all of the problems, it suffices to remember the solution to that more general problem. This reduces the cognitive load of doing math relative to what it would be if one were considering each individual problem in isolation. When I taught calculus to freshmen at the University of Illinois, I got the impression that many of the students studied for tests by trying to memorize all of the homework problems individually. There were too many homework problems to memorize, so this didn't work very well. Had they learned the material on a deep level, they wouldn't have had this problem.
  • Ability to apply knowledge in novel contexts (that require mathematical reasoning). When you understand general mathematical principles, you can apply mathematical knowledge to tackle mathematical problems that you've never seen before. This contrasts with mathematical knowledge that's restricted to knowledge of how to solve specified problems.
  • Higher retention of (mathematical) material. Cognitive psychologists have found that students retain information better when they engage in "deep level processing" rather than "shallow level processing" (see the notes on Video 2 of Stephen Chew's "How to Get the Most Out of Studying" video series). Developing a deep understanding of math reduces the need to review mathematical material when one needs it for future units and courses (whether within math or adjacent to math). This cuts down on the amount of study time necessary to master later material.
  • Developing better general reasoning skills (across domains). Learning math deeply is closely connected with developing mathematical reasoning skills. Distilling general principles from special cases involves abstract reasoning. In the other direction, understanding general principles makes mathematical reasoning feel a lot less cumbersome, which incentivizes one to do more of it (relative to the counterfactual). Mathematical reasoning ability may be transferable to reasoning in other contexts, so that learning math deeply builds general reasoning skills.
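A toy example of the "many special cases of one general fact" point from the first bullet: instead of memorizing a separate differentiation fact for each power, one remembers the single power rule that all of them instantiate.

```latex
\frac{d}{dx}\, x^2 = 2x, \qquad
\frac{d}{dx}\, x^3 = 3x^2, \qquad
\frac{d}{dx}\, \sqrt{x} = \frac{1}{2\sqrt{x}}
\quad\text{are all instances of}\quad
\frac{d}{dx}\, x^n = n\, x^{n-1}.
```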

Some arguments against learning math deeply being useful are:

  • It may be too hard. Sometimes when I suggest that learning math deeply is helpful, people respond by saying that most people aren't capable of learning abstract concepts with enough ease for it to make sense for them to try to learn math deeply rather than just memorizing how to do specific problems. This is an ill-defined claim, but it can be made precise by specifying a population and a given level of mathematical abstraction.
  • The span of the payoff may be too short. For people who won't go on to take many math courses, the benefits of reduced future study time and higher retention might not be worth the upfront investment of learning math deeply.
  • Mathematical reasoning may not be very transferable. A counterpoint to the "developing better reasoning skills" point above: it's known that transfer of learning from one domain to another is often very low. So learning mathematical reasoning skills may not be an efficient way of developing reasoning skills that can be used in the context of one's career or personal life.

I'd be grateful to anyone who's able to expand on these three considerations, or to offer additional considerations against the utility of learning math deeply. I would also be interested in any anecdotal evidence about benefits (or lack thereof) that readers have received from learning math deeply.

79 comments

Reason to learn math deeply: by forcing you to master alternating quantifiers, it expands your ability to understand and handle complex arguments.

This falls, possibly, under your "developing better general reasoning skills", but I would stress it separately, because I think it's an especially transferable skill that you get from learning rigorous math. Humans find chains of alternating quantifiers (statements like "for every x, there exists y, such that for every z...") very difficult to process. Even at length 2, people without training often confuse the meanings of forall-exists and exists-forall. To get anywhere in rigorous math, a student needs to confidently handle chains of length 4-5 without confusion or undue mental strain. This is drilled into the student during the first 1-2 years of undergraduate rigorous math, starting most notably with the epsilon-delta formalism in analysis. The reason this formalism is notoriously difficult for many students to master is precisely that it trains and drills longer chains of quantifiers than the students have hitherto been exposed to.
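For concreteness, the epsilon-delta definition of a limit is already a chain of three alternating quantifiers (statements about continuity on a set or uniform convergence push the chain further):

```latex
\lim_{x \to a} f(x) = L
\quad\Longleftrightarrow\quad
\forall \varepsilon > 0 \;\; \exists \delta > 0 \;\; \forall x \;
\bigl( 0 < |x - a| < \delta \;\Rightarrow\; |f(x) - L| < \varepsilon \bigr).
```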

(other math-y subjects have their own analogues; for example, I think the chief rea...

I do agree with the part about the quantifiers. This is, at least in theory, one of the reasons that we are supposed to teach the epsilon-delta definition of limit in college calculus courses. I generally try to frame it as a game between the prover and the skeptic; see for instance the description here. One of the main difficulties that students have with the definition is staying clear on whose strategic interest lies in what: for instance, who should be the one picking the epsilon, and who should be the one picking the delta (the misconceptions on the same page highlight common mistakes that students make in this regard). Incidentally, this closely connects with the idea of steelmanning: in a limit proof, or any other proof that a definition involving quantifiers is satisfied, one needs to demonstrate a winning strategy against every move the opponent could possibly make, including the best one.

The first time I taught the epsilon-delta definition in a (non-honors) calculus class at the University of Chicago, even though I did use the game setup, almost nobody understood it. I've had considerably more success in later years, and it seems like students now get something like 30-50% of the underlying logic on average (judging by their performance on hard conceptual multiple-choice questions based on the definition).
Couldn't you develop the same skill more efficiently by just studying formal logic?
Probably? But the number of people who study formal logic to the required degree is dwarfed by the number of people who need this skill. Also, mathematical logic, studied properly, is hard. It forces you to conceptualize a clean-cut break between syntax and semantics, and then to learn to handle them separately and jointly. That's a skill many mathematicians don't have (to be fair, not because they couldn't acquire it, which they absolutely could, but because they never found it useful).

I have a personal story. Growing up I was a math whiz; I loved popular math books, and in particular logical puzzles of all kinds. I learned about Gödel's incompleteness theorems from Smullyan's books of logical riddles, for example. I was also fascinated by popular accounts of set theory and the independence of the Continuum Hypothesis. In my first year at college, I figured it was time to learn this stuff rigorously. So, independent of any courses, I just went to the math library and checked out the book by Paul Cohen where he sets out his proof of the independence of CH from scratch, including first-order logic and axiomatic set theory from first principles. I failed hard. It felt so weird; I just couldn't get through. Cohen begins by setting up rigorous definitions of what logical formulas and sentences are. I remember he used the term "w.f.f.-s" (well-formed formulas), which are defined by structural induction and so on. I could understand every word, but it was as if my mind went into overload after a few paragraphs. I couldn't process all these things together and understand what they meant.

Roll forward maybe a year or a year and a half, I don't remember. I'm past standard courses in linear algebra, analysis, and abstract algebra, plus a few more math-oriented CS courses (my major was CS). I have a course in logic coming up. Out of curiosity, I pick up the same book in the library and I am blown away: I can't understand what it was that stopped me before. Things just make sense; I read a chapter or two leisurely.
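For readers unfamiliar with the jargon, "defined by structural induction" just means that formulas are built up from atoms by a few fixed formation rules, and that functions on formulas recurse on that structure. The two functions below also illustrate the syntax/semantics split mentioned above: one recursion defines which expressions are well-formed, a separate one assigns them meanings. This is a minimal hypothetical sketch (a toy propositional language, not Cohen's system):

```python
# A toy propositional language whose formulas are defined by structural
# induction: an atom (a string) is a wff; if p and q are wffs, then so are
# ("not", p) and ("and", p, q). Nothing else is a wff.

def is_wff(f):
    """Syntax: check well-formedness by recursing on the structure."""
    if isinstance(f, str):                      # an atom, e.g. "A"
        return True
    if isinstance(f, tuple):
        if len(f) == 2 and f[0] == "not":
            return is_wff(f[1])
        if len(f) == 3 and f[0] == "and":
            return is_wff(f[1]) and is_wff(f[2])
    return False

def eval_wff(f, env):
    """Semantics: a truth value under an assignment, by the same induction."""
    if isinstance(f, str):
        return env[f]
    if f[0] == "not":
        return not eval_wff(f[1], env)
    return eval_wff(f[1], env) and eval_wff(f[2], env)

f = ("and", ("not", "A"), "B")                  # the formula (not A) and B
print(is_wff(f), eval_wff(f, {"A": False, "B": True}))   # True True
```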
I think this notion of "mathematical maturity" is hard to grasp for a beginning student. I had a very similar experience. The introduction to (the Russian edition of) Fomenko & Fuchs' "Homotopic topology" said that "later chapters require a higher level of mathematical culture". I thought that this was just a weasel-y way to say "they are not self-contained", and disliked this way of putting it as deceptive. Now, a few years later, I know fairly well what they meant (although, alas, I still have not read those "later chapters"). I wonder if there is a way to explain this phenomenon to those who have not experienced it themselves.
Interesting off-topic fact about Fomenko: I'd read his book on symplectic geometry, and then discovered he's a massive crackpot. That was a depressing day.
He is a massive crackpot in "pseudohistory", but he is also a decent mathematician. His book on symplectic geometry is probably fine, so unless you are generally depressed by the fact that mathematicians can be crackpots in other fields, I don't think you should be too depressed.
Your point 1 resonates with me. Learning math has steadily increased my effectiveness as a scientist/engineer/programmer. Sometimes just knowing that a mathematical concept exists, and roughly what it does, is enough to give you an edge in solving a problem: you can look up how to do it in detail when you need it. However, despite the fact that life continues to demonstrate to me the utility of knowing the math that I've learned, this has failed to translate into an impulse within me to actively learn more math. At pretty much any time in the past I've felt like I knew "enough" math, and yet I always see a great benefit when I learn more. You'd think this would sink in, and that I would start learning math for its own sake with the implicit expectation that it will very probably come in handy, but it hasn't.
Thanks for the thoughtful and insightful comment. I really appreciate it :)

Random thoughts:

  1. The decision that smart high school students should take calculus rather than statistics (in the U.S.) strikes me as pretty seriously misguided. Statistics has broader uses.

  2. I got through four semesters of engineering calculus; that was the clear limit of my abilities without engaging in the troublesome activity of "trying." I use virtually no calculus now, and would be fine if I forgot it all (and I'm nearly there). I think it gave me no or almost no advantages. One read-through of Scarne on Gambling (as a 12-year-old) gave me more benefit than the entirety of my calculus education.

  3. I ended up as the mathiest guy around in a non-math job. But it's really my facility with numbers that makes it; my wife (who has a master's degree in math) says what I am doing is arithmetic and not math, but very fast and accurate arithmetic skills strike me as very handy. (As a prosecutor, my facility with numbers comes as a surprise to expert witnesses. Sometimes, they are sad afterward.)

  4. Anecdotally, math education may make people crazy or attract crazy people disproportionately. I think that pursuit of any topic aligns your brain to think in a way conducive to that

...
I agree that basic probability and statistics is more practically useful than basic calculus, and should be taught at the high-school level or even earlier. Probability is fun and could usefully be introduced to elementary-school children, IMO. However, more advanced probability and stats stuff often requires calculus. I have a BS in math and many years of experience in software development (IOW, not much math since college). I am in a graduate program in computational biology, which involves more advanced statistical methods than I'd been exposed to before, including practical Bayesian techniques. Calculus is used quite a lot, even in the definition of basic probabilistic concepts such as expectation of a random variable. Anything involving continuous probability distributions is going to be a lot more straightforward if approached from a calculus perspective. I, too, had four semesters of calculus as an undergrad and had forgotten most of it, but I found it necessary to refresh intensely in order to do well.
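For instance, even the expectation of a continuous random variable is defined by an integral; for an exponentially distributed quantity (a standard textbook example) it reads:

```latex
\mathbb{E}[X] = \int_{-\infty}^{\infty} x\, f(x)\, dx,
\qquad\text{e.g.}\qquad
X \sim \mathrm{Exp}(\lambda):\;\;
\mathbb{E}[X] = \int_0^{\infty} x\, \lambda e^{-\lambda x}\, dx = \frac{1}{\lambda}.
```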
"Computational biology" sounds really cool. Or made up. But I'm betting heavily on "really cool." (Reads Wikipedia entry.) Outstanding! Anyway, I concede that you are right that calculus has uses in advanced statistics. Calculus does make some problems easier; I'd like calculus to be used as fuel for statistics rather than as almost pure signaling. I actually know people who ended up having real uses for some calculus, and I've tried to stay fluent in high school calculus partly for its rare use and partly for the small satisfaction of not losing the skill. And probably partly for reasons my brain has declined to inform me of. I nonetheless generally stand by my statement that we're wasting one hell of a lot of time teaching way too much calculus. So we basically agree on all of this; I appreciate your points.
It seems to me that making it mandatory for everyone to learn math beyond percents and simple fractions is even less useful than the old approach of making ancient Greek and Latin mandatory.
When I first read your comment, I thought, "that's not obvious to me". Then a few seconds later I realized: less useful given the opportunity cost of not learning the best possible alternatives. And while math is useful (so are Greek and Latin), there are much better alternatives for mandatory high-school education, basic programming for one.
Exactly. Not sure about programming being any better, though.
Calculus has value for signaling intelligence to colleges. I'm told that for professions that do use calculus (e.g. economists), real analysis plays more or less the same role: a rarely used signal of intelligence.

Some lessons that I've learned from attempting to solve hard and tricky math problems, which I've found can be applied to problem-solving in general: (a) Focus hard and listen to confusions; (b) Your tendency to give up occurs much before the point at which you should give up; (c) Don't get stuck on one approach, keep trying many different approaches and ideas; (d) Find simpler versions of your problem; (e) Don't beat yourself up over stupid mistakes; (f) Don't be embarrassed to get help.

But of course I don't mean to say that learning math is the only way or the best way to learn these techniques.

I agree that math can teach all these lessons. It's best if math is taught in a way that encourages effort and persistence.

One problem with putting too much time into learning math deeply is that math is much more precise than most things in life. When you're good at math, with work you can usually become completely clear about what a question is asking and when you've got the right answer. In the rest of life this isn't true.

So, I've found that many mathematicians avoid thinking hard about ordinary life: the questions are imprecise and the answers may not be right. To them, mathematics serves as a refuge from real life.

I became very aware of this when I tried getting mathematicians interested in the Azimuth Project. They are often sympathetic but feel unable to handle the problems involved.

So, I'd say math should be done in conjunction with other 'vaguer' activities.

Have you noticed any difference between pure mathematicians and theoretical physicists in this regard?
Thanks for pointing me toward the Azimuth Project. I used to follow your "this week" blog for a while, but I must have lost track of it a few years ago. Must have been before this showed up on the radar.
Yes, but it is genuinely the case that imprecision and low quality of answers indicate lower utility of an activity, or lower gains due to mathematical skill. Furthermore, what you are saying contradicts the existence of mathematicians who did contribute to philosophy (e.g. Gödel). edit: I mostly meant the stories of such; it seems to me that mathematicians who come up with important insights not so rarely try to apply them.
It doesn't; "many mathematicians avoid..." doesn't imply that all do.
Well, not existence per se, that was a very poor wording on my part, but the specific circumstances of their contribution. I think that whenever a mathematician has relevant novel insights, they not so rarely apply them to various relevant problems, including 'fuzzy' ones. Or, when they don't, applied mathematicians do. It's just that novel mathematical concepts are very difficult to generate in general, and even more difficult to generate starting from some broad problem statement.
I wanted to thank you for this. I read this post a few weeks ago, and while it was probably a matter of like two minutes for you to type it up, it was extremely valuable to me. Specifically a paraphrase of point B, "The point where you feel like you should give up is way before the point at which you should ACTUALLY give up" has become my new mantra in learning maths, and since I do math tutoring when the work's there, I'm passing this message on to my students as well. So, thank you very much for this advice.

(This discussion doesn't distinguish what could be called the rigorous and post-rigorous levels of skill, and so feels a little off (at least terminologically). At the rigorous level, which seems like what you are talking about, you know how the tools work, and can reassemble them to attack novel problems. At post-rigorous level, which seems like a better referent for "learning math deeply", you've sufficiently exercised intuitive mental models to offload most routine observations to System 1, freeing up conscious attention and allowing more ambitious intuitive inferences. Fluency as opposed to competence.)

Thanks Vladimir! Why does my post give the impression of talking about the rigorous level?
You are opposing "learning math deeply" to rote memorization of brittle special cases, but the threshold of being able to work with standard tools (e.g. for understanding the technical content of physics/statistics courses) is only the rigorous level. Moving further requires additional practice/motivation, once you are already capable of using the tools, and that is not separately discussed in the post.
One (wo)man's brittle special case is another's generalization. There are many different levels of abstraction. One can be at the rigorous level on some dimensions and at the post-rigorous level on others. In the other direction, many things that once required post-rigorous thinking are now sufficiently codified so that they now require only rigorous thinking. There's not a well-defined body of "standard tools."

I'd start with an anecdote from the local practice here, with regards to learning math shallowly vs. with an understanding from the ground up:

It is fairly common to derive supposed ultra low prevalences of geniuses in populations with lower mean IQs.

For example, an IQ of 160 or more is 5 SDs above a mean of 85, but only 4 SDs above a mean of 100, so the rarities are about 1/3,488,556 vs. 1/31,574: the population with a mean IQ of 100 has roughly 110 times the prevalence of such genius.
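The arithmetic here can be checked in a few lines of Python, using only the standard library (the normal survival function via `math.erfc`; the population parameters are the ones assumed in the comment):

```python
import math

def normal_sf(z):
    """Survival function P(Z > z) of the standard normal."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# An IQ of 160 under two populations, both with SD 15:
z_low  = (160 - 85) / 15    # 5 SDs above a mean of 85
z_high = (160 - 100) / 15   # 4 SDs above a mean of 100

p_low, p_high = normal_sf(z_low), normal_sf(z_high)
print(f"1 in {1 / p_low:,.0f} vs 1 in {1 / p_high:,.0f}")  # ~1 in 3,488,556 vs ~1 in 31,574
print(f"prevalence ratio: {p_high / p_low:.0f}")           # ~110
```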

This is not how it works; the higher means are a result of decreased prevalence of negative co...

I heard somewhere that IQ scores are normally distributed by definition, because they are calculated by projecting the measured rank onto the normal distribution with mean 100 and stddev 15. Can't seem to find a reference on Wikipedia though, so maybe that's not true.
IQ distributions are calibrated based on a reference sample, such that the reference sample has mean 100 and std 15 and follows a normal distribution. I believe the reference sample is generally British nationals or European Americans, so that interracial comparisons are sensible. That doesn't mean that the distribution of all test-takers follows a normal distribution with mean 100 and std 15.
Precisely. If you are looking at some third-world nation, well, there are all those kids who have various nutritional deficiencies, and their IQs are impaired. The mean is lowered considerably, but that's through the introduction of extra variables into the (approximate) sum. If you don't take that into account and assume that only the mean of the distribution has changed, you get entirely invalid results at the high range, due to how rapidly the normal distribution falls off far from the mean (as the exponential of a square).

For example, if you were to calculate the number of some rare geniuses in a reference population (say, 300 million with a mean of 100 and a standard deviation of 15), and in the world population assuming some lower mean and the same standard deviation, then for a sufficiently rare "genius" you'd get a smaller number of geniuses in the whole world than in that one reference population (which is ridiculous). edit: which you can see by noting that the density ratio exp(((x-c)^2 - (x-b)^2)/(2σ^2)), with c smaller than b, grows as x grows (i.e. the ratio of prevalences between two populations grows with distance from the mean).
The example I'd give here is India, where you have lots of mostly distinct ethnic groups, and so it's reasonable to expect that the true distribution is a mixture of Gaussians. Knowing the Indian average national IQ would totally mislead you on the number of Parsis with IQs of 120 or above, if all you knew about Parsis was that they lived in India. (It's not clear to me that malnourishment leads to multiple modes, rather than just decreasing the mean while probably increasing the variance, because I think damage due to malnourishment is linear, and it's probably the case that many different levels of severity of malnourishment are roughly equally well represented.)
In the limit, the mixture of Gaussians is a Gaussian. Theoretically, malnourishment (given that only a part of the population suffers from it) should lead to a negatively skewed distribution. And yes, with a lower mean and higher variance.
Nope. The sum of Gaussian random variables is a Gaussian random variable, but a mixture Gaussian model is a very different thing. (In particular, mixture Gaussians are useful for modeling because their components are easy to deal with, but if you have infinite mixtures you can faithfully represent an arbitrary distribution.) Yep, I should have mentioned that also.
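One way to see how different a mixture is from a single Gaussian is to compare tail probabilities. A minimal sketch, with subpopulation parameters that are purely hypothetical:

```python
import math

def normal_sf(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Hypothetical 50/50 mixture of N(85, 15^2) and N(115, 15^2).
# Its overall mean is 100 and its overall variance is
# 15^2 (within-component) + 15^2 (between-component) = 450.
mix_tail = 0.5 * normal_sf((160 - 85) / 15) + 0.5 * normal_sf((160 - 115) / 15)

# A single Gaussian matched to the mixture's mean and variance:
gauss_tail = normal_sf((160 - 100) / math.sqrt(450))

# The two tail probabilities differ by a factor of roughly 3.5 here, so
# reasoning from the aggregate mean and SD misestimates extreme prevalences.
print(mix_tail, gauss_tail)
```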
Yes, you are correct, I got confused between a sum and a mixture.
Not everyone's malnourished, though; a significant number of people are into diminishing returns, nutrition-wise. It's very nonlinear in the sense that as long as there's adequate nutrition, it plateaus: access to more nutrition does not improve anything.
Sorry, is your claim that IQ does not follow a normal distribution in the general population?
It seems likely to me that this is actually the case, especially when you look at the tails, which is what he was discussing. The existence of things like Down's syndrome means that the lower part of the tail certainly doesn't look like you would expect from a solely additive model, and that might also be true at the upper end of the distribution. (It's also much more likely to be the case if you want to use some other measure of intelligence which is scaled to be linear in predictive ability for some task, rather than designed to be a normal distribution.)
This should be straightforwardly testable by standard statistics. Given the empirical distribution of IQ scores and given the estimated measurement error (which depends on the score -- scores in the tails are much less accurate) one should be able to come up with a probability that the empirical distribution was drawn from a particular normal. Although I don't know if I'd want to include cases with clear brain damage (e.g. Downs) into the population for this purpose.
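As a toy version of such a test (all parameters hypothetical): simulate observed scores as true scores plus measurement noise, and compare an empirical tail frequency with the frequency the fitted normal model predicts.

```python
import math
import random

random.seed(0)

def normal_sf(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Hypothetical model: true score ~ N(100, 15^2), measurement noise ~ N(0, 5^2),
# so observed scores should be N(100, 15^2 + 5^2) if normality holds.
n = 200_000
obs = [random.gauss(100, 15) + random.gauss(0, 5) for _ in range(n)]

model_sd = math.sqrt(15 ** 2 + 5 ** 2)
cutoff = 100 + 2 * model_sd                      # check the 2-SD upper tail
empirical = sum(x > cutoff for x in obs) / n
predicted = normal_sf(2.0)

# Under the model the two numbers agree to within sampling error; a large
# discrepancy in real data would be evidence against normality in the tail.
print(empirical, predicted)
```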
Agreed. If you have a source for one of these, I would love to see it. I haven't been able to find any, but I also haven't put on my "I'm affiliated with a research university" hat and emailed people asking for their data, so it might be available. Agreed that this should be the case, but it's not clear to me how to estimate measurement error besides test-retest variability, which can be corrupted by learning effects unless you wait a significant time between tests. I think Project Talent only tested its subjects once, but unless you have something of that size which tests people during adulthood several times you're unlikely to get sufficient data to have a good estimate here.
That may require prohibitively large sample sizes, i.e. it may not be testable. With regards to measuring g, and high IQs, you need to keep in mind regression towards the mean, which becomes fairly huge at the high range, even for fairly strongly correlated variables. Another, more subtle issue is that proxies generally fare even worse far from the mean than you'd expect from regression alone. I.e., if you use grip strength as a proxy for how quickly someone runs a mile, that'll obviously work great for your average person, but at the very high range (professional athletes) you could obtain a negative correlation, because athletes with super-strong grip (weightlifters, maybe?) aren't very good runners, and very good runners do not have extreme grip strength. It's not very surprising that folks like Chris Langan are at very best mediocre crackpots rather than super-Einsteins.
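The size of the regression effect described here is easy to simulate. A sketch assuming, purely for illustration, a proxy that correlates 0.8 with the underlying trait:

```python
import math
import random

random.seed(1)

r = 0.8        # assumed correlation between the trait and its proxy
n = 500_000

# Standardized trait g and proxy s with corr(g, s) = r; both marginally N(0, 1).
samples = []
for _ in range(n):
    g = random.gauss(0, 1)
    s = r * g + math.sqrt(1 - r * r) * random.gauss(0, 1)
    samples.append((g, s))

# Select on an extreme proxy value (s > 3): the trait itself regresses toward
# the mean, since E[g | s] = r * s, well below the selection cutoff.
selected = [(g, s) for g, s in samples if s > 3]
mean_s = sum(s for _, s in selected) / len(selected)
mean_g = sum(g for g, _ in selected) / len(selected)
print(mean_s, mean_g)   # mean_g comes out near r * mean_s
```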
At least for certain populations the sample sizes should be pretty large. Also a smaller-than-desired sample size doesn't mean it's not testable, all it means is that your confidence in the outcome will be lower. Yes, I agree. The tails are a problem in general, estimation in the tails gets very fuzzy very quickly.
And it seems to me that having studied math, complete with boring exercises, could help somewhat with understanding of that... all too often you see people fail to even ballpark how much the necessary application of regression towards the mean affects the rarity.
Now that I've started to think about it, the estimation of the measurement error might be a problem. First we need to keep in mind the difference between precision and accuracy: re-tests will only help with precision, obviously. Moreover, given that we're trying to measure g, which happens to be unobservable, estimates of accuracy are somewhat iffy. Maybe it will help if you define g "originally", as the first principal component of a variety of IQ tests... On the other hand, I think our measurement-error estimates can afford to be guesstimates, and as long as they are in the ballpark we shouldn't have too many problems. As to the empirical datasets, I don't have time at the moment to go look for them, but didn't the US Army and such run large studies at some point? Theoretically the results should be in the public domain. We can also look at proxies (of the SAT/GRE/GMAT/LSAT etc. kind), but, of course, these are only imperfect proxies.
In any population other than the one for which the test has been normed to follow a normal distribution with mean of 100 and standard deviation of 15, yes, results need not be normally distributed or to have a standard deviation of 15. When discussing a population with a mean IQ other than 100, it is automatically implied that it is not the population that the test has been normed for.
Do you have any psychometric lit. pointers on cases where e.g. normal goodness of fit tests fail? Is this just standard knowledge in the field?
So, one of the known things is that standard deviation varies by race. For example, both the African American mean and variance are lower than the European American mean and variance. To the best of my knowledge, few people have actually applied goodness of fit tests to IQ score distributions to check normality.
I don't understand why this is relevant.
Hm. When I read the great-grandparent earlier, I got the impression it would be helpful to corroborate the claim in the great-great-grandparent. Rereading the great-grandparent now, it's not clear to me why I got that impression. (I may have been thinking that the "general population," as it contains distinct subpopulations, will be at best a mixture Gaussian rather than a Gaussian.)

I do agree that private_messaging's claim, that the ratio we see at the tails doesn't seem to follow what would be predicted by the normal distribution, hinges on the right tail being fatter than what the normal distribution predicts. (The mixture-Gaussian claim is irrelevant if you've split the general population up into subpopulations that are normally distributed, unless the low-IQ group itself contains subpopulations, so that it isn't normally distributed. There's some reason to believe this is true for African Americans, for example, if you don't separate out people by ancestry and recency of immigration.)

The data is sparse enough that I would not be surprised if this were the case, but I don't think anyone's directly investigated it, and a few of the investigations that hinge on the thickness of the tails (like Sex Differences in Mathematical Aptitude, which predicts female representation in elite math institutions by looking at the mean and variance of math SAT scores of large populations) seem to have worked well, which is evidence for normality.
Incidentally, is there even any empirical evidence that intelligence is normally distributed in any concrete sense?
I don't think any existing measure could be Gaussian with any accuracy at the tail ends, for two reasons: you need too large a sample size to norm the test out there, and in general the approximate Gaussian you get from many additive random factors deviates from a true Gaussian by huge factors at the tails. The bulk of a test's norming comes from average people. Ditto for correlations between IQ and anything else: the bulk of a reported correlation comes from near the mean.
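To make the tail problem concrete, here's a quick simulation sketch (the subpopulation parameters are made up for illustration, not real IQ norms): a mixture of two Gaussians, summarized by a single fitted mean and SD, deviates from the Gaussian prediction precisely at the tail.

```python
import math
import random
import statistics

random.seed(0)

# Hypothetical mixture (illustrative numbers only):
# 80% drawn from N(100, 15), 20% drawn from N(110, 20).
pop = ([random.gauss(100, 15) for _ in range(80_000)]
       + [random.gauss(110, 20) for _ in range(20_000)])

# Fit a single Gaussian the way a test norm would: overall mean and SD.
mu = statistics.fmean(pop)
sd = statistics.stdev(pop)

# Compare the observed fraction beyond mean + 3 SD with the Gaussian tail.
cutoff = mu + 3 * sd
observed = sum(x > cutoff for x in pop) / len(pop)
predicted = 0.5 * math.erfc(3 / math.sqrt(2))  # P(Z > 3) for a true Gaussian

print(f"above mean+3sd: observed {observed:.5f}, "
      f"Gaussian prediction {predicted:.5f}")
```

With these illustrative numbers, the observed tail fraction comes out at roughly three times the Gaussian prediction, even though the fit looks fine near the mean - which is the sense in which norming on average people tells you little about the tails.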

It's interesting that the comments on this post are split in terms of whether they interpret the focus to be on math or on deeply. It's also worth noting that the term "deeply" has many different connotations. Stephen Chew, whom you link to, is using deep learning in the sense of learning something by pondering its meaning and associations. But it's very much possible for an unsophisticated learner to learn something deeply in the Chewish sense without acquiring a conceptual understanding of it that has transferable value. For instance, one might "d... (read more)

I used a similar approach to learning chemistry at university level (undergraduate to PhD level, although my PhD drifted a bit from pure chemistry into computing and education). There were lots of situations where, to solve a problem, you needed the appropriate applied formula. Many (most?) students tried to memorise these formulae and the situations they applied in. I struggled to memorise them, so instead focused on how to derive the applied formula from a much smaller set of basic equations. Often there's a mental trick that makes it easier - e.g. to de... (read more)

I was very successful in my early mathematical education. I'd get As with ease, take exams early, enter mathematics competitions, etc. I had a deep understanding despite doing very little work because all the concepts seemed obvious.

I continued in exactly the same way, and my performance declined to the point where I was struggling to get Cs. I was now meeting concepts that were not intuitively obvious (e.g. limits, proofs, complex numbers), and because of my previous success I had not developed any techniques for gaining a deep understanding of them. I lost all sen... (read more)

A counterpoint to the "developing better reasoning skills" point above: it's known that transfer of learning from one domain to another is often very low.

In my anecdotal experience, math is the most transferable of all skills I've learnt.

Add physics to that.
One of the barriers I run into when I delve into physics is that I have a very rationalist approach to math: I hate terminology and want as little of it as possible in my reasoning. Physics has rather high barriers in that respect, in that academic physicists don't really like mathematical rigour and don't precisely specify, say, the abstract algebraic axioms of the structures they are using. But when I get to the point of being able to specify what structure is behind a physical theory, I can usually intuit it readily. Physics is domain knowledge compared to mathematical reasoning ability.
What does this mean? You only attack problems with high VoI?
I have a bad habit of stating things and then explaining them. I meant it is rationalist in that I:

* Hate terminology. Give me axioms, definitions and theorems; then we can discuss them in words later.
* Build up my intuitions, and especially weed out the useless ones. I don't really do proofs if it is not necessary, and sometimes I even skimp on the formal details, using my connectionist intelligence to its full potential.
* Try to explore as much as possible, and look for people to learn from. Proving things is a question of strategy, and many a Nobel laureate has had mentors who were Nobel laureates too.
How are those things particularly rationalist? Sounds to me you're just using the word in some inflationary sense.
The Human's Guide to Words sequence and the concept of "words should refer to something" pertain to the first item. The Quantum Mechanics sequence and the concept of "it all adds up to normality" pertain to the second. The third is based on an inversion of the idea behind the Sequences in general (that I need giants to stand on the shoulders of), and I forget exactly where it says that the most valuable skills in maths are non-verbal. These three points I have attempted to disprove, through reflexive gedankenexperiment and through discourse with more experienced CS and mathematics students, and I have found that this is difficult and long-winded and that the counterarguments are weak. I also recognize that maths has tremendous instrumental value in the work I plan to do in the future. All of this is basic bayesian skill, and I have met several people (CS, maths and physics students) who were doing things adverse to understanding maths, which could be fixed by implementing any of the above strategies.
The word "rational" does not mean "was discussed in the Sequences" and certainly doesn't mean "was analogous to something that was discussed in the Sequences". I relish the irony of your belief that "words should refer to something" when you readily inflate the meaning of "rational" and "bayesian". This indicates to me that you've assumed I'm criticizing the substance of your advice. This is a false assumption.
Great. Now you have really confused me. Do we agree that you can implement more or less winning strategies as a member of Homo sapiens, congruent with the utility-concept of 'making the world a better place', and that there is an absolute ranking criterion for how good said strategies are? Do we agree that a very common failure mode of Homo sapiens is statistical bias in their bayesian cognition, and that these biases have a clear causal origin in our evolutionary history? Do we agree that said biases hamper Homo sapiens' ability to implement winning strategies in the general case? Do we agree that the writings of Eliezer Yudkowsky and the content of this site as a whole describe ways to partially get around these built-in flaws? I am fairly confident that a close reading of my comments will find the interpretation of 'rational' to be synonymous with 'winning-strategy-implementation', and 'bayesian' to be synonymous with (in the case that it refers to a person) 'lesswrong-site-member/sequence-implementor/bayes-conspiracist' or (in the case that it refers to cognitive architectures) 'bayesian inference', and I am tempted to edit them as such.
I am nonplussed at your attempt to lull readers into agreeing with you by asking a lot of rhetorical questions. It'd have been less wrong to post just the last paragraph: The missing link in the argument here is how your examples are, in fact, winning strategies. You claimed some superficial resemblance to things in the sequences, and that you did better than some small sample of humans. I disapprove of this expanded definition of "bayesian" on the basis that it conflates honest mathematics with handwaving and specious analogies. For example, "it all adds up to normality" is merely a paraphrase of the correspondence principle in QM and does not have any particular legislative force outside that domain.
I'll concede the point, partially because I tire of this discourse.
If mathematical details matter, they should be specified (or be clear anyway - e.g. you don't define "real numbers" in a physics paper). Physics can require some domain knowledge, but knowledge alone is completely useless - you need the same general reasoning ability as in mathematics to do anything (both for experimental and for theoretical physics). In fact, many physics problems get solved by reducing them to mathematical problems (that is the physics part) and then solving those mathematical problems (still considered "solving the physical problem," but purely mathematics).
Logic even more so.

Up until university, where I am now, I had never actually had to think hard about the math problems I was presented with.

Last summer I had an epiphany in abstract algebra, and it has been hugely beneficial to see these structures everywhere in computer science. That is a lot of handy theorems you get for free.
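As a concrete (if toy) illustration of the kind of free theorem I mean: once you notice that an operation is a monoid - associative, with an identity element - you know that a sequential fold and a parallel-shaped fold must agree, with no further proof needed. A minimal Python sketch (my own example, not from any particular textbook):

```python
from functools import reduce

def balanced_fold(op, xs, identity):
    """Fold by splitting in half -- the shape a parallel reduction would use."""
    if not xs:
        return identity
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    return op(balanced_fold(op, xs[:mid], identity),
              balanced_fold(op, xs[mid:], identity))

# String concatenation is associative with identity "", so the sequential
# and balanced groupings must give the same answer -- a theorem you get
# "for free" from the monoid structure, and the reason parallel reductions
# (e.g. map-reduce) are valid at all.
concat = lambda a, b: a + b
words = ["ab", "str", "act ", "alge", "bra"]

sequential = reduce(concat, words, "")
balanced = balanced_fold(concat, words, "")
assert sequential == balanced
print(balanced)  # prints "abstract algebra"
```

The same check works for any monoid (sums, products, max, set union, matrix multiplication), which is exactly the "handy theorems for free" point.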

I think that high-level pattern-matching strategies are very valuable. Category theory, abstract algebra, etc.

I don't remember a period of my life where I didn't feel like I had a deep understanding of math, and so it's hard for me to separate out mathematical ability and cognitive ability.

I've also seen advice from a handful of places I respect to learn as much math as you can stand, because there often is transfer from mathematical topics to practical applications. This is much more true for engineers, physicists, and software developers than it is for people in other professions, but still suggests that the first negative consideration you raise is strong (unle... (read more)

I'd be interested in hearing more about your experience. A lot of smart people don't develop a deep understanding of math because that's not how the subject is taught and because they don't have the initiative to try to work things out themselves. With this in mind, to what do you attribute your success?
Hope this isn't too off-topic, but I wonder if you have any ideas about why that is. The main impediment to many far-mode thinkers learning hard (post-calculus) math is the drill and drudgery involved. If you're going to learn hard math, it seems you should, by all means, learn it deeply. That's not the obstacle. The obstacle is that to learn math deeply, you must first learn a lot of it by rote - at least the way it's taught. In the far-distant past, when I was in school, learning elementary calculus meant rote drilling on techniques of solving integrals. Is this still the case? Is it inevitable, or is it an artifact of how math is taught? The main reason "smart people" avoid math isn't that they want to avoid depth; rather, it's that math is, at least for some of them, drudgery. Math, more than any subject I know of, seems to require a very high level of sheer diligence before you can start thinking about it deeply. Is this inevitable?
I think that the point is that more people are capable of routine tasks than of conceptual understanding, and that educational institutions want lots of people to do well in math class on account of a desire for (the appearance of) egalitarianism. What time period was this? (No need to answer if you'd prefer not to :-) ) Some diligence is necessary, but not as much as it appears based on standard pedagogy. I wish that I could substantiate this in a few lines. If you say something about what math you know/remember, I might be able to point you to some helpful references.
Some degree of this is probably inevitable. Integration in particular has no general closed-form method (unlike differentiation), so there really is no one procedure you can apply to all problems; all you can do is remember a bag of tricks. For differentiation, by contrast, a few general rules let you differentiate all the elementary and trigonometric functions, and that's pretty much all you encounter in school.
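To illustrate the asymmetry: differentiation is a complete recursive algorithm - every expression form has a rule, so the recursion always bottoms out with an answer. Here is a toy differentiator over a hypothetical mini expression language (nested tuples, not a real computer algebra system); the point is that no comparably short recursion exists for antiderivatives at the school level.

```python
def d(expr):
    """Differentiate expr with respect to x.  Expressions are nested tuples:
    'x', a number, ('+', a, b), ('*', a, b), or ('sin', a) / ('cos', a)."""
    if expr == 'x':
        return 1
    if isinstance(expr, (int, float)):
        return 0
    op, *args = expr
    if op == '+':                       # sum rule
        return ('+', d(args[0]), d(args[1]))
    if op == '*':                       # product rule
        a, b = args
        return ('+', ('*', d(a), b), ('*', a, d(b)))
    if op == 'sin':                     # chain rule
        return ('*', ('cos', args[0]), d(args[0]))
    if op == 'cos':
        return ('*', ('*', -1, ('sin', args[0])), d(args[0]))
    raise ValueError(f"no rule for {op}")

# Product rule gives 1*sin(x) + x*(cos(x)*1), as a nested tuple:
print(d(('*', 'x', ('sin', 'x'))))
```

Adding a case per expression form is all it takes to extend this, whereas an `antiderivative()` function can't be built the same way: there is no rule that reduces the integral of a product or composition to integrals of its parts, which is why students end up memorizing substitutions and parts as a bag of tricks.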
Well, looking back, I have to attribute a lot of my perception of success to blindness, in the sense that 5-6 year old me thought he was a hot math talent because he knew about integers when the teacher was teaching the class about natural numbers. (I still remember raging against the claim that the right answer was "you can't subtract 3 from 2!" instead of "negative 1!") From what I can tell from looking at curricula online, that's ~5 years ahead of schedule, but I'd interpret that as the curriculum putting it late (though, on reflection, that could be Dunning-Kruger).

I remember jumping ahead of (well, deeper than- below?) the curriculum frequently, and suspect that it had different causes in different circumstances. Rapid calculation is probably just high g, but rapid perception of concepts and connections probably has something to do with an intuition or vision that I find difficult to articulate. I've also never been particularly good at explaining why I know what I know with regards to math- from refusing to step through the algebra when I could solve a problem in my head, to avoiding college classes which were primarily about proving that methods worked (i.e. calculus the second time around) rather than introducing new methods. I have, through deliberate practice, gotten better at writing proofs in the last year or two, but still regularly come across simple theorems where I say "I know X is true, but don't know how to show X is true."

I do think I would have been more successful in a Moore method environment, which is designed to teach a deep understanding of mathematics- it seems likely to me_now that me_past would have learned/wanted to care about rigor much earlier in that sort of environment, and would have kept pushing my math boundaries much more uniformly.

Deduction and analogy seem like largely different reasoning processes. I suspect that what you're describing is that by learning the notation and doing enough deductive arguments, the tasks begin to become intuitive, that is, they begin to become analogical and not deductive.

Deductive thinking is conscious, deliberative, and "slow." Analogical and intuitive thinking is unconscious, nondeliberative, and "fast." So you're probably right that by learning to relegate many mathematical tasks to analogical thinking, one increases their ef... (read more)

Perhaps this comes under "Reduced need for memorization" but when someone says "deeply" I assume they mean understanding the underlying principles - specifically understanding the limitations of the tools being used:

An extremely trivial example might be how often people in businesses communicate using measures of central tendency (mean) but almost never talk about spread (standard deviation). Yet the SD is as important as the mean.

Perhaps less trivial might be that analyses of small samples (N < 50) often use t-statistics. This... (read more)
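To make the mean-vs-spread point concrete, here's a minimal sketch (the sales figures are made up): two series with identical means and wildly different standard deviations would look the same in any report that only quotes averages.

```python
import statistics

# Hypothetical monthly sales for two products (illustrative numbers only):
# same average, very different risk profiles.
steady = [98, 101, 100, 99, 102, 100]
volatile = [60, 140, 100, 45, 155, 100]

for name, xs in [("steady", steady), ("volatile", volatile)]:
    print(f"{name}: mean = {statistics.fmean(xs):.1f}, "
          f"sd = {statistics.stdev(xs):.1f}")
```

Both means are exactly 100, but the sample SDs are roughly 1.4 versus 43 - a decision based on the mean alone treats these two products as interchangeable when they clearly are not.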

Learning math deeply is better than not taking math courses at all, which is better than memorizing some formulas for the exam and forgetting them afterwards.

The analogous statements are true of every other field.