# Foreword

This book has been reviewed pretty thoroughly already. Rather than restate each chapter, I'll be sharing insights: some book-specific, some general.

I am quite proud of my time-to-completion for this book - just over a week, working around a very strenuous courseload. I went from having to focus really hard to pick up new concepts to reading notation nearly as fluently as the English surrounding it. The chapters started making sense - it felt less random, and more like a coherent story wherein the protagonist slowly adds owers to their arsenal.

# Naïve Set Theory

## Functions

Functions are *just* **static sets** of ordered pairs . They are *not* dynamic indexing functions, they do *not* perform efficient lookup, please do *not* waste an hour of your life trying to figure out how you could do such a thing within the simple machinery afforded by set theory up to that point.

This is one of those things that Nate talked about - how skipping over just one word a few chapters prior can cause you to waste hours. During my confusion, I knew this was probably the case, but I still couldn't manage to immediately overcome my intuitions of what a function should be. This is one reason I'm open to working through the MIRI Research Guide with others.

## Families

Families are, ironically enough, just a special kind of function; don't let your intuition fool you - they aren't "groups of functions". A family belonging to maps each element of the index set to an element . For example, a family from to could be (thanks to Dacyn for helping me clarify my writing).

## Zorn's Lemma

I spent three hours staring at this proof. I understood what ZL meant. I grasped the relevant concepts. I read other versions of the proof. I still spent three long hours on this damn proof, and then I went to my classes. I don't know why I ended up figuring it out, but I suspect it was a combination of two factors: my brain worked through some things during the day, and I *really wanted it*. On the bus home, I mentally snapped and decided I was *going to understand the proof*. And I did.

I'm pleased to share my detailed proof outline of Zorn's Lemma, the product of many hours of ambient exasperation, rewritten in my own notation. Looking back, the proof in the book was pretty bad; it was neither succinct nor intuitive, but instead imposed a marais of mental variable tracking on the reader. I think mine is at least a little better, if not fully fleshed-out at all junctures.

## Proof Calibration

As someone without a formal degree in mathematics, it was important for me to monitor how I approached the exercises in the book. Whenever the author began a proof, I tried to generate a mental proof sketch before reading further. Sometimes, I thought the proof would be easy and short, but it would turn out that my approach wasn’t rigorous enough. This was valuable feedback for calibration, and I intend to continue this practice. I'm still worried that down the line and in the absence of teachers, I may believe that I've learnt the research guide with the necessary rigor, go to a MIRIx workshop, and realize I hadn't been holding myself to a sufficiently high standard. Suggestions for ameliorating this would be welcome.

# Forwards

## Anticipation

One factor which helped me succeed was that I ensured my morning reading was what I most looked forward to each day. I was excited to go to sleep, wake up early, prepare a delicious breakfast, and curl up near the fireplace with book and paper handy. Trivial inconveniences can be fatal - do whatever you must to ensure you properly respect and anticipate your study time.

## Defense with the Dark Arts

The most useful productivity-related advice I ever read was by Nate Soares (**Dark Arts warning**), and it relates to imbuing your instrumental goals with terminal values. Ever since having read that advice, every tedious assignment, every daily routine, every keystroke - they're all backed up by an intense desire to *do something* about the precarious situation in which humanity finds itself.

## Internal Light

If you don't know where to start, I think the internal fire has to be lit first - don't try to force yourself to do this (or anything else) because you should. Stop the guilt-based motivation, proudly stake out what you want, and transform your life into a dazzling assortment of activities and tasks imbued with your terminal values, your brightest visions for yourself and the future.

Awesome. In general I think posts about learning math are super good and probably most people should be learning more math, both for general edification and to improve their epistemic rationality in the direction of their ability to form gears models.

I'll also repeat an earlier offer: it would be insanely low effort for me to write posts explaining math to LWers who want math explained, so I'm very happy to take requests for such posts.

Not quite sure what this means, but I've seen people make similar mistakes before, e.g. occasionally I see people ask questions like "are the values of a function computed sequentially or all at once?" and the answer is "type error," more or less.

One way to think about it is that functions in mathematics are "Platonic lookup tables" (I prefer "Platonic" to "static" because in mathematics the distinction between static and dynamic is a matter of perspective and it's useful to switch between the two); given an input there's some Platonic fact about what the output is, and that's all a function is. In general most of pure mathematics is implicitly Platonist in some sense and you'll have an easier time with it coming at it from that angle.

My most basic math misunderstanding in school went something like this. Take an equation like "x+1=3". You can subtract 1 from by both sides and get "x=2". But why can you subtract 1 from both sides? Like, if two things are "equal", why can you subtract 1 from both of them and the results will still be "equal"? What does it mean for different things to be "equal" and why does it respect certain operations?

Luckily, my school had math teachers with a clue. They didn't dismiss my question or answer it outright. Instead they agreed it was a meaningful question that I should solve myself. Eventually I figured it out: expressions are "equal" if they point to the same platonic thing (say, a number within the set of real numbers). When you subtract 1 from both sides of an equation, you're subtracting 1 from the same platonic thing, so you end up with another platonic thing with two pointers pointing at it.

Shortly after that, emboldened, I approached my physics teacher with another basic confusion: are physical objects open or closed sets? If they are closed (include their border points), how do they touch? And if they are open (don't include their border points), how do they touch? My physics teacher was also pretty good and replied that we don't know, but it's not very relevant to touching, because what we call touching is actually interaction at a distance.

I've had many such confusions since, while learning all kinds of things. These confusions seem hard to predict and are rarely emphasized in textbooks, probably because they are personal: everyone has different sticking points and different magic words for overcoming them. That's part of the reason why explaining stuff to a specific person who can ask questions back feels more productive than writing for a broad audience.

Thanks for the generous offer! What kind of requests would you like to get? Specific questions? Certain subjects? etc.

In any case, I'm going to write a bunch of stuff I'd love to have explained more thoroughly, some general, some more specific, if you can explain any of them that would of course be amazing from my point of view. Most of these are just things that came up for me while (re)learning a bunch of math.

all the time, I still don't have an intuitive sense of what it does.Thanks!

I'll have a go at 4. (I don't know what is and isn't in your brain, so this may be entirely unhelpful. Even if so, it may help make it clearer what would be helpful.)

I think a better perspective on induction is that "regular induction" is just a special case of transfinite induction.

Ordinary induction over the natural numbersWhen you first learn about induction, it's usually presented something like this: "To prove that P(n) for all n, prove that P(0) and that P(n) => P(n+1)." But it's not that uncommon to have induction proofs that work a slightly different way -- e.g., going from P(n) to P(2n) and P(2n+1) -- and a better way of looking at it is this: "When proving that P(n), you are allowed to assume that P(m) for all m<n".

Any proof using "n->n+1" induction is already a proof using this more powerful induction principle. (Because assuming P(n-1) is just part of what it means to assume P(everything less than n).)

Why does this more powerful principle work? Because if P(n) fails then P(m) fails for some smaller m, and then P(l) fails for some still smaller l, and so on, and you get an endless sequence of smaller and smaller numbers -- but these are all positive integers, so the sequence can't go on for ever, contradiction.

So the key fact is that

there are no infinite decreasing sequences.I find it helpful to think about this in terms of how the natural numbers are (or can be)

constructed. 0 is a natural number; every natural number has a successor; and that's all of them. And in some sense this is why there are no infinite decreasing sequences: because each natural number is built by a finite increasing sequence.Other varieties of inductionNow, the same thing works in other contexts where you have a

kind of object(above, it's natural numbers) and anotion of comparison(above, it's the < relation) for whichthere are no infinite decreasing sequences.(You often want to think of < as meaning "is simpler than".)

These tend to be objects with a recursive construction similar to that of the natural numbers. For instance, binary trees. The empty tree is a binary tree; if L and R are binary trees then the tree consisting of a root node whose children are L and R are binary trees; and that's all of them. So now you can do induction over binary trees: try to prove P(T)

allowing yourself to assume P(T') whenever T' is a subtree of T. Or Lisp-style lists. The empty list is a list; if L is a list and x is a thing-that-isn't-a-list, then there's a list with x as head and L as tail; and that's all of them. Now you can do induction over lists: to prove P(L), allow yourself to assume that P(tail(L)).In these cases, you can turn this "structural induction" into ordinary induction over natural numbers: use the depth of the tree, or the length of the list. But there are other cases, and we'll see one now.

Transfinite induction over sets and ordinalsTake our objects to be sets. And take < to mean "is an element of, or an element of an element of, etc.".

Why are there no infinite decreasing sequences? Because there are no infinite chains that look like ⋯∈x∈y∈z, going on for ever on the left. Why's that? Depending on how you look at it, it's (1) because there's an axiom (the Axiom of Foundation) that forbids it, or (2) because sets are constructed recursively, starting from the empty set, and this recursive construction guarantees it. (If you try to turn that into an actual proof, you end up back with the Axiom of Foundation, which is what guarantees that you

canthink of sets as being constructed recursively.)Take our objects to be ordinals. And take < to be the usual ordinal comparison relation.

Why are there no infinite decreasing sequences? Because the ordinals are well-ordered by <. Why's that? Because (at least with the usual construction in ZF(C)) any set of ordinals is a subset of a larger ordinal, with the < relation

betweenordinals corresponding to the < relationwithinthat larger ordinal.(These examples

can'tbe translated into ordinary integer inductions: there are far more ordinals than integers, and far more "ranks" of sets than integers. But we still have the no-infinite-descending-chains property!)These are all the sameAll of these work the same way. You have a kind of object (natural numbers, trees, lists, sets, ordinals), a notion of being smaller/simpler (comparison of integers/ordinals; subtree; tail of list; element of set), and a theorem or an axiom that guarantees that you can't keep getting smaller/simpler for ever. And, in that context, anything that's

true of X whenever it's true of everything simpler than Xis true of all X. And that's what induction really is.It seems a little odd to appeal to no infinite decreasing sequences (which are much more complicated objects than integers) to justify induction. The fundamental intuition is that if we start out with things with a certain property, and only do operations that preserve the property, then we'll never end up with anything that doesn't have the property. (Of course in the set-theoretic case "operations" includes things like "take the set of all objects that we've constructed so far, even if it is infinite".)Without this intuition it's hard to see why it should be the case that there are no infinite decreasing sequences.

I guess for natural numbers, the other intuition is that if a property holds for some number, then there should be a "first time" that it holds -- just count upwards and check whether the property holds for each number, and stop once you get to the first one. Of course you can make this into an equivalent intuition about transfinite numbers, though it becomes a little less intuitive in that case.

I agree that that intuition is very helpful; that's the point of my paragraph beginning "I find it helpful to think ...". But what distinguishes things you can do induction on from things you can't is exactly the no-infinite-descending-sequences property (or the equivalent property that "any nonempty set has a smallest element", which is in some sense simpler but to my mind less intuitive).

Hmm, it seems like "why is X true?" can have two different meanings, basically "what is the reason that X is true?" versus "what is a convenient fact about X that guarantees that it is true?" I guess you were answering the second question and I was answering the first.

But actually with regards to your question, I think it is also true that whether you can view a collection as being generated by a process that creates new elements from old elements is also a distinguishing feature of whether you can do induction on the collection. If you want to make it into a formal statement in ZFC (and for the record I don't, since I don't think that ZFC includes the right formalism in which to make such a statement), you could say that you can do induction on a poset (S,<) if and only if there exists a set R⊆P(S)×S (where a pair (A,x)∈R is thought of as representing a rule of the form "if all elements of A are in our set, then add x to our set") such that (a) for all (A,x)∈R and y<x, there exists z∈A such that y≤z, and (b) S is the smallest set B⊆S with the property that for all (A,x)∈R, if A⊆B then x∈B.

I guess both points of view can be useful.

In the case of pure mathematics, I think "the reason X is true" is often ill-defined; there will be different ways of thinking about the things involved in X, and they will lead to different answers to "why is X true?".

(Is the prime number theorem true

becausethe zeta function has no zeros on Re(s)=1, or does the zeta function have no zeros there because of how the prime numbers are distributed? Or is PNT perhaps true because of Selberg's "fundamental formula", which can be used to give a proof of PNT that makes no mention of complex analysis or the zeta function -- or vice versa? My inclination is to say that those high-powered theorems we use to prove PNT are truebecausethe prime numbers are distributed as they are -- but we are then in the uncomfortable situation that all known ways ofprovingthat they're distributed that way go via these high-powered theorems, so X is true because Y is true but we can only prove X by proving Y first.)It seems to me that "because" language in pure mathematics is co-opting a bunch of intuitions that were formed by considering cause-and-effect in the physical world. Causation is a tricksy notion even in the physical world, and it's worse still in pure mathematics.

I agree (I think; it's easy to go astray in this stuff) that your "upward" induction principle is equivalent to the formulation in terms of well-orderings. For what it's worth, I think your principle works "because" we can take R = { (A, smallest thing not in A) } rather than thinking that the formulation in terms of well-orderings works "because" it guarantees we can think of making new elements from old; it's interesting how different things are more-intuitive to different people :-).

I'm not assuming that my poset is totally ordered, so there isn't always a "smallest thing not in A". For example think of the binary trees you mentioned above, with ancestry as the order relation.

I get the impression we do still have a substantive disagreement, so I'll taboo "because" and instead ask the question: how are the natural numbers built? Or in other words, what does it mean to say that something is a natural number? I would say that it just means that it is one of the objects in the collection you get if you start with 0, and each time you have n then you add n+1 to your collection. A book such as the one discussed in the OP, however, will give a different answer: a natural number is an element of the intersection of all sets that contain 0 and are closed under the operation n↦n+1. These two definitions are arguably equivalent -- but such an equivalence cannot be proven in a language like ZFC, since the only way ZFC has of formalizing the first definition is to beg the question and encode it according to the second definition.

In a sense all properties of a mathematical object are "because" of the way that it was built. So perhaps our different intuitions about "because" are based on different ideas of what it means to be a natural number? (Or perhaps not...)

Regarding the prime number theorem: I'm not an expert so I don't have an opinion on which way we should say that the causality goes in that particular case, but I do think a lot of the time it is useful to ask such questions, and that giving good answers (even if they aren't the type of thing that can be rigorous) can help one understand a subject better.

I think it's highly debatable whether the natural numbers are

builtat all. Arguably they're justthere(in some sense). One can construct particular "implementations" of the natural numbers, and there are many ways to do it; for instance, the usual way to do it in NF is a Frege-like construction: natural numbers are equivalence classes of finite sets under the relation "can be put in bijection with"; "finite" means "can't be put in bijection with any finite subset". (You can't do that in ZF(C) because there are too many finite sets, but perhaps you can do it in a system that adds proper classes, like NBG.)I don't have strong feelings about how the natural numbers "should" be built, or what they "really" are. I'm happy thinking of them as "sizes of finite sets", or as the things you get if you start at 0 and add 1 as often as you like (though there's a certain circularity about that definition), or even as finite sequences of bits (though, again, there's some circularity there). I don't think it's coincidence that these all lead to the same theorems, but I don't feel any particular need to pick one definition and say that the others are all somehow parasitic on it.

Incidentally, when Frege came to define the natural numbers (this was in 1884, a few years before the usual Peano axioms were formulated, and I think he was the first person to do anything of the kind) he did it by (1) defining cardinal numbers as equivalence classes of sets under the same-size relation, and then (2) saying that a natural number is anything you can get to 0 from by counting downwards. Make of that what you will.

The more specific the better, basically.

It's unclear to me what counts as a "geometric" interpretation here. The coordinate-free interpretation is that taking transposes is the coordinate-dependent version of taking the adjoint of a linear operator between inner product spaces. There's also an interpretation in terms of SVD which may be more explicitly geometric: taking transposes swaps the left and right singular vectors. (This is one way to think about the spectral theorem: it means a symmetric matrix is a matrix whose left and right singular vectors agree, which means they're eigenvectors. This isn't quite a proof because singular vectors aren't quite unique but it's pretty close.)

I wrote a blog post about SVD; not sure whether it will give you what you want but I like it.

As gjm says, ordinary induction is a special case of transfinite induction so it's not exactly a whole new kind of induction, just a (pretty mild, all things considered) generalization.

"Geometric" intuition is basically the way that the 3Blue1Brown YouTube channel would explain things. I'm not sure if you're aware of it, but their "Essence of Linear Algebra" goes through the broad high-level concepts of linear algebra and explains them, with a very visual/geometric intuition for things like basis change, inverses, determinants, etc.

Unfortunately, they never covered transpose :)

Also, I'll take a look at your blog post, thanks!

I second this; I too would appreciate a Qiaochu’s-eye view. Also, are there hidden dependencies in the MIRI reading list? I feel like some subjects would be most profitably studied before or after others - category theory is mentioned as helping you understand the underlying structures of mathematics and generalize your understanding - so does that mean read early, or read near completion of the list?

As Gram Stone says, most people think category-theory-first is a bad idea. I agree that it won't suit most people. My recommendation is to learn category theory

simultaneouslywith everything else, and also to learn as much of it as seems helpful for understanding whatever else you're learning, and no more; this is what I did. For example, I learned about adjoint functors as I was doing enough abstract algebra to run into interesting examples of adjoint functors, such as the induction and restriction functors on group representations. I never went through a category theory textbook (in general I mostly learn from blog posts, Wikipedia, the nLab, etc.) and so never learned things that didn't seem useful for something else.In general I'm a big fan of learning many fields simultaneously, so it's easier to see connections between them. The relevant dependencies don't parse into subjects for me; they're smaller conceptual chunks like "understand, in any of the places where it appears, the concept of currying, or else you won't understand a ton of things like how to pass between the two standard description of group actions, or why the double dual of a finite-dimensional vector space is naturally the same vector space again." (It took me a few hours of frustrated thinking to really grok this and once I did I was able to use it smoothly everywhere it appeared forever.)

I like

Conceptual Mathematicsa lot.Re: category-theory-first approaches; I find that most people think this is a bad idea because most people need to see concrete examples before category theory clicks for them, otherwise it's too general, but a few people feel differently and have published introductory textbooks on category theory that assume less background knowledge than the standard textbooks. If you're interested, you could try Awodey's

Category Theory(free), or Lawvere'sConceptual Mathematics. After getting some more basics under your belt, you could give either of those a shot, just in case you're the sort of person who learns faster by seeing the general rule before the concrete examples. (These people exist, but I think it's easy to fall into the trap of wishing that you were that sort of person and banging your head against the general rule when you really just need to pick up the concrete examples first. One should update if first-principles approaches are not working.)I'd also add to 2: an intuitive explanation of eigenvectors and their properties in general. They seem to pop up everywhere in Linear Algebra and when then do, people gesture towards a set of intuitions about them that I haven't managed to pick up.

The first secret is that eigenvectors are not really the point; the fundamental concept is eigenspaces, and the point of eigenvectors is to furnish bases for eigenspaces. Eigenspaces are unique, but you have to make choices to pick out a basis of eigenvectors.

An eigenspace for a linear operator T:V→V is a subspace W⊆V which is T-invariant (meaning that TW⊆W) and on which T acts as multiplication by some scalar λ (the "eigenvalue"). You would love to have such a thing if it existed, because multiplication by a scalar is super simple to understand (for example, it's super easy to compute powers of T acting on W since you just take powers of λ). But there's no reason a priori to expect that such things exist.

The miracle is that if V is finite-dimensional and we're working over an algebraically closed field such as C, then T always has a nontrivial eigenspace, and in fact most of the time V is the direct sum of the eigenspaces of T. (In general V is the direct sum of the generalized eigenspaces of T; this is a slightly weaker version of the existence of Jordan normal form.)

In other words, most of the time you can split V up into a bunch of directions such that T acts by scaling each of those directions (this intuition is complicated by the fact that we're working over a field like C but as a first pass this isn't bad). I want to underline that this is a miracle; there was no reason a priori that linear operators on finite-dimensional vector spaces should behave this simply, and in the infinite-dimensional case things are much more complicated.

I don't know if that addressed any of your confusions; as I said above, more specific questions are better.

On 3: when doing a mathematics degree, you will often get a fair bit of choice of what subjects to study in depth. So any list of subjects-you-need-to-know will need to be somewhat variable -- or else end up covering more than most people would do in their actual undergraduate studies. Could you say more about what areas you're interested in?

Theoretically, I'm most interested in things related to Data Science/Machine Learning/Deep Learning, as that's my day job. However, since this is my day job, it's also the area that I know the most about. So e.g. I've studied Linear Algebra, Probability, Statistics etc quiet a bit.

I'm mostly interested in rounding out my specific knowledge with everything else I need to know to have knowledge equivalent to a well-rounded math major.

In terms of what personally interests me the most, that'd be logic/foundations-of-mathematics type stuff, e.g. Set Theory.

Bingo. As a computer scientist, I have an incredible urge to figure out

howfunctions are built. Of course, I knew on some level that functions aren't "built" the same way in math as in CS... When I reached the Axiom of Choice and it proclaimed, "Let there be arbitrary choice functions" - that was myOopsmoment. Functions either exist, or they don't. Similarly, you don't "find" or "build" sets, they simply can and do exist, or they can't and don't (under the specified axioms, at least). You may need to show you can construct a set for a proof, but that's a little different.Obvious in retrospect, and a good data point for improving my ability to notice confusion.

I think if you read more textbooks you'll naturally get used to the correct level of rigour.

I really appreciate people writing down the things they were confused about, crisply. Thanks.

Re: proof calibration; there are a couple textbooks on proofwriting. I personally used Velleman's

How to Prove It,but another option is Hammack'sBook of Proof, which I haven't read but appears to cover the same material at approximately equal length. For comparison, Halmos introduces first-order logic on pages 6 and 7 ofNaive Set Theory, whereas Velleman spends about 60 pages on the same material.It doesn't fit my model of how mathematics works technically or socially that you can really get very confident but wrong about your math knowledge without a lot of self-deception. Exercises provide instant feedback. And according to Terence Tao's model, students don't spend most of their education learning whether or not a proof is valid

at all, so much as learning how to evaluate longer proofs more quickly without as much conscious thought.Part of that process is understanding formal things, part of it is understanding how mathematicians' specialized natural language are shorthand for formal things. E.g. my friend was confused when he read an exercise telling him to prove that a set was "the smallest set" with this property (and perhaps obviously the author didn't unpack this). What this means formally when expanded is "Prove that this set is a subset of every set with this property." AFAICT, there's no way to figure out what this means formally without someone telling you, or (this is unlikely) inventing the formal version yourself because you need it and realizing that 'smallest set' is good shorthand and this is probably what was meant. Textbooks are good for fixing this because the authors know that textbooks are where most students will learn how to talk like a mathematician without spelling everything out. I find ProofWiki very useful for having everything spelled out the way I would like it and consistently when I don't know what the author is trying to say.

Finally, I have a rationalist/adjacent friend who tutored me enough to get to the point where I could verify my own proofs; I haven't talked to them in a while, but I could try to get in touch and see if they would be interested in checking your proofs. Last time I talked to them, they expressed that the main bottleneck on the number of students they had was students' willingness to study.

Slight nitpick: it means "prove that this set is a subset of every set with this property,

andhas this property."This sort of thing is terrible; I learned most of it from the internet (MathOverflow, Wikipedia, the nLab, blogs), for what it's worth.

Thanks, that’s very helpful! I appreciate the offer; let me see how I feel after the next book.

Um. I→∏i∈IXi, with I being the set of two elements, will have two pairs of objects in its range, for a total of four.

Instead of "The range is a collection of objects, one from each