All of JeremyHahn's Comments + Replies

The value of learning mathematical proof

As a current Harvard math grad student, I think you should read many different easy books to learn a subject whenever possible, especially if you can find them for free. When you say you are mathematically able, it is unclear what level you are at. All of my favorite books for learning involve a huge number of exercises, and I recommend you do all of them instead of reading ahead.

For basic real analysis, my favorite book is Rosenlicht's Introduction to Analysis, but baby Rudin is pretty good too, and I recommend you flip back and forth between the two.

For l... (read more)

0 · Ben Pace · 7y
Thanks; I have pm-ed you for a follow-up.
The value of learning mathematical proof

Personally I think real analysis is an awkward way to learn mathematical proofs, and I agree discrete mathematics or elementary number theory is much better. I recommend picking up an Olympiad book for younger kids, like "Mathematical Circles, A Russian Experience."

Is Scott Alexander bad at math?

I'm sure not only "elite" mathematicians intuit the interest of problems like the unsolvability of the quintic. That one can prove a construction impossible, the very concept of an invariant, is startling to the uninitiated. So many classic problems of this nature are held up as paradigms of beauty--the Königsberg bridge problem, ruler and compass constructions of cube roots, the irrationality of sqrt(2), ...

The Truth About Mathematical Ability

I'm doing my math PhD at Harvard in the same area as Qiaochu. I was also heavily involved in artofproblemsolving and went to MathPath in 2003. I have hoped since 2003 that I could stake out a manifest destiny in mathematics research.

Qiaochu and I performed similarly in Olympiad competitions, had similar performances in the same undergraduate program, and were both attracted to this website. However, I get the sense that he is driven quite a bit by geometry, or is at least not actively averse to it. Despite being a homotopy theorist, I find geometry awkward and ... (read more)

You should note that even great mathematicians sometimes question their abilities. Michael Atiyah:

Lately I've been feeling particularly incompetent mathematically, to the point that I question how much of a future I have in the subject.

Impostor syndrome is really, really common in mathematics. Trust me; if there's a place in mathematics for me, then there's a place for you too.

Lately I've been feeling particularly incompetent mathematically, to the point that I question how much of a future I have in the subject. Therefore I quite often wonder what mathematical ability is all about, and I look forward to hearing if your perspective gels with my own.

More later, but just a brief remark – I think that one issue is that the top ~200 mathematicians are of such high intellectual caliber that they've plucked all of the low hanging fruit and that as a result mathematicians outside of that group have a really hard time doing research ... (read more)

Why is the A-Theory of Time Attractive?

I apologize for the snippy remark, which was a product of my general frustrations with life at the moment.

I was trying to strongly stress the difference between (1) an abstract R-torsor (B-theory), and (2) R viewed as an R-torsor (your patch on A-theory).

Any R-torsor is isomorphic to R viewed as an R-torsor, but that isomorphism is not unique. My understanding is that physicists view such distinctions as useless pedantry, but mathematicians are for better or worse trained to respect them. I do not view an abstract R-torsor as having a basis that can be changed.
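For readers who want the non-uniqueness spelled out, here is a brief sketch of the standard definition (my notation, not from the thread):

```latex
% An R-torsor is a set T with a free, transitive action of (R, +):
\[
+ \colon T \times \mathbb{R} \to T, \qquad
\forall\, t, t' \in T \;\; \exists!\, r \in \mathbb{R} : \; t' = t + r .
\]
% Every choice of base point t_0 trivializes T:
\[
\phi_{t_0} \colon \mathbb{R} \xrightarrow{\ \sim\ } T, \qquad \phi_{t_0}(r) = t_0 + r ,
\]
% and distinct base points give distinct isomorphisms, so the
% isomorphisms R -> T are in bijection with the points of T:
% one for each choice of origin, none of them canonical.
```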

Indeed it wouldn't. A function space defined on an R-torsor would have a basis which you could change.
Why is the A-Theory of Time Attractive?

You might be able to do it with some abstract nonsense. I think general machinery will prove that in categories such as that defined in the top answer of

there are terminal objects. I don't have time to really think it through though.

Why is the A-Theory of Time Attractive?

I have always heard the affine line defined as an R-torsor, and never seen an alternative characterization. I don't know the alternative axiomatization you are referring to. I would be interested to hear it and see if it does not secretly rely on a very similar and simpler axiomatization of (R,+) itself.

I do know how to characterize the affine line as a topological space without reference to the real numbers.

Torsors seem interesting from the point of view of Occam's razor because they have less structure but take more words to define.

This is what I was referring to. The axioms of ordered geometry, especially Dedekind's axiom, give you the topology of the affine line without a distinguished 0, without distinguishing a direction as "positive", and without the additive structure. However, in all the ways I know of to construct a structure satisfying these axioms, you first have to construct the rationals as an ordered field, and the result of course is just the reals, so I don't know of a constructive way to get at the affine line without constructing the reals with all of their additional field structure.
Why is the A-Theory of Time Attractive?

I think that the distinction may be clarified by the mathematical notion of an affine line. I sense that you do not know much modern mathematics, but let me try to clarify the difference between affine and linear space.

The A-theorists are thinking in terms of a linear space, that is an oriented vector space. To them time is splayed out on a real number line, which has an origin (the present) and an orientation (a preferred future direction).

The B-theorists are thinking in terms of an affine line. An affine line is somewhat like the A-theorist's real lin... (read more)

... from what do you get this impression, and in what way is it relevant? Yes, there are many parts of modern mathematics I am not familiar with. However, nothing that had come up so far was defined in the last 100 years, let alone the last 50. I have a PhD in physics. I know what an affine space is.

If you were thrown off by my use of basis changes to effect translations, which would signal ignorance since vector addition is not equivalent to change of basis: I did clarify that I was in a function space defined over time, and in the case of function spaces defined over vector fields, translations of the argument of the function are indeed changes of basis.

In physics, we set the origin to be whatever. All the time. This is because we need to do actual arithmetic with actual numbers, and number systems with no 0 are needlessly cumbersome to use. Moving 0 around all the time in context-dependent and arbitrary ways completely defuses the 'harm' of A-theory, as far as I can tell.
I think that this analogy is accurate and reveals that A-theorists are attributing additional structure to time, and therefore that they take a hit from Occam's razor. However, to be fair, I think that an A-theorist would dispute your analogy. They would deny that time "is" splayed out on a number line, because there is no standpoint from which all of time is anything. Parts of time were one way, and other parts of time will be other ways, but the only part of time that is anything is the present moment. (I'm again using A-theorist as code for presentist.)

By the way, off-topic, but: This is true if affine space is defined as a torsor for the reals as an additive group, but you can also axiomatize the affine line without reference to the reals. It's not clear to me whether this means that you can construct the affine line in some reasonable sense without reference to the reals. Do you know?
Mathematics as a lossy compression algorithm gone wild

I guess I always took the phrase "unreasonable effectiveness" to refer to the "coincidence" you mention in your reply. I'm not really sure you've gone far toward explaining this coincidence in your article. Just what is it that you think mathematicians have "pure curiosity" about? What does it mean to "perfect a tool for its own sake," and why do those perfections sometimes wind up having practical further use? As a pure mathematician, I never think about applying a tool to the real world, but I do think I'm working towards a very compressed understanding of tool making.

Unreasonable effectiveness tends to refer to the observation that the same mathematical tools, like, say, mathematical analysis, end up useful for modeling very disparate phenomena. In a more basic form it is "why do mathematical ideas help us understand the world so well?". The answer suggested in the OP is that the question is a tautology: math is a meta-model built by human minds, not a collection of abstract objects which humans discover in their pursuit of better models of the world. The JPEG analogy is that asking why math is unreasonably effective in constructing disparate (lossy) models is like asking why the JPEG algorithm is unreasonably effective in lossy compression of disparate images.

The "coincidence" part referred to something else: that pursuing math research for its own sake may occasionally work out to be useful for modeling the physical world, number theory and encryption being the standard example. When I talk to someone who works in pure math, they usually describe the motivation for what they do in almost artistic terms, not caring whether what they do can be useful for anything, so "tool making" does not seem like the right term.
Mathematics as a lossy compression algorithm gone wild

So what does "gone wild" mean? Your paragraph about this is not very charitable to the pure mathematician.

Say that mathematics is about generating compressed models of the world. How do we generate these models? Surely we will want to study (compress) our most powerful compression heuristics. Is that not what pure math is?

I agree I was too brief there. The original motivation for math was to help figure out the physical world. At some point (multiple points, really, starting with Euclid), perfecting the tools for their own sake became just as much of a motivation. This is not a judgement but an observation. Yes, sometimes "pure math" yields unexpected benefits, but this is more of a coincidence than the reason people do it (despite what the grant applications might say). The main reason is pure curiosity without any care for eventual applicability to understanding the physical world. Pretending otherwise would be disingenuous. Theoretical physics is not very different in that regard. To quote Feynman
Rationality Quotes April 2014

Computer scientists seem much more ready to adopt the language of homotopy type theory than homotopy theorists at the moment. It should be noted that there are many competing new languages for expressing the insights garnered by infinity groupoids. Though Voevodsky's language is the only one that has any connection to computers, the competing language of quasi-categories is more popular.

Rationality Quotes April 2014

It is misleading to attribute that book solely to Voevodsky.

Yes. But it's forgivably misleading to attribute it non-exclusively to him, in a thread of comments which was started about him.
Results from MIRI's December workshop

Imagine that we encounter a truly iid random sequence of 90% likely propositions Q(0),Q(1),Q(2),... Perhaps they are merely pseudorandom but impossibly complicated to reason about, or perhaps they represent some random external output that an agent observes. After observing a very large number of these Q(i), one might expect to place high probability on something like "About 90% of the next 10^100 Q(j) I haven't observed yet will be true," but there is unlikely to be any simple rule that describes the already observed Q(i). Do you think that the next 10^100 Q(j) will all individually be believed 90% likely to be true, or will the simpler to describe Q(j) receive closer to 50% probability?

We can show that the FOL prior is not too different from the algorithmic prior, so it can't perform too badly for problems where algorithmic induction does well. Partial theories which imply probabilities close to .9 but do not specify exact predictions will eventually have high probability; for example, a theory might specify that Q(x) is derived from an unspecified F(x) and G(x) (treated as random sources) getting OR'd together, making probabilities roughly .75; variations of this would bring things closer to .9. This may still assign simpler Q(j) probabilities closer to 50%.
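The .75 figure from OR'ing two unspecified random sources can be checked by direct enumeration (a minimal sketch of just the arithmetic; the FOL machinery itself is not modeled here):

```python
from fractions import Fraction
from itertools import product

# Treat F(x), G(x), ... as independent fair random bits and define
# Q(x) as their OR, as in the partial theory described above.
def or_probability(n_sources):
    """Probability that an OR of n independent fair bits is true."""
    total = Fraction(0)
    for bits in product((0, 1), repeat=n_sources):
        if any(bits):
            total += Fraction(1, 2 ** n_sources)
    return total

print(or_probability(2))  # 3/4
print(or_probability(3))  # 7/8 -- more sources push the value toward .9
```

OR'ing more sources gives 1 - (1/2)^n, which is one way "variations of this" approach .9.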
Results from MIRI's December workshop

The key here is that you are using finite S. What do you do if S is infinite? More concretely, is your schema convergent if you grow your finite S by adding more and more statements? I believe we touch on such worries in the writeup.

Results from MIRI's December workshop

Sorry if there is something fishy in the writeup :(. I could believe it, given how rushed I was writing it.

Suppose we consider not just a,~a,b,~b, and c,~c, but also statements q="exactly one of a,b,c is true" and ~q. Suppose now that we uniformly pick a truth value for a, then for q, then a logically consistent but otherwise random value for b, and finally a logically consistent but otherwise random value for c. Such an asymmetric situation could occur if b and c have high mu but a and q have small mu. In the worlds where we believe q, b and c... (read more)
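The asymmetry in that sampling scheme can be verified by exact enumeration (a sketch; I've read "logically consistent but otherwise random" as uniform over the consistent values, which I take to be the intended meaning):

```python
from fractions import Fraction

def exactly_one(a, b, c):
    """q: exactly one of a, b, c is true."""
    return (a + b + c) == 1

# Sampling process: pick a uniformly, then q uniformly, then a
# logically consistent but otherwise uniform b (one that extends to
# some consistent c), with c forced or uniform afterwards.
p_b_true = Fraction(0)
for a in (0, 1):
    for qv in (0, 1):
        bs = [b for b in (0, 1)
              if any(exactly_one(a, b, c) == bool(qv) for c in (0, 1))]
        for b in bs:
            if b:
                p_b_true += Fraction(1, 4) * Fraction(1, len(bs))

print(p_b_true)  # 3/8: b ends up less likely than a, which gets 1/2
```

The order of sampling matters: conditioning on a and q first forces b to be false in the (a true, q true) branch, which drags b's marginal below 1/2.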

This was surprisingly clarifying for me. I'm still not sure whether I find it too concerning. We assign higher probability to simpler theories so long as they obey the constraint that Q(n) is true for about 90% of n below 10^100. The theory which randomly assigns Q(n) to true/false for 10^99 simple values of n (and then gets forced to assign Q as true for most of the remaining n below 10^100) is just one form of theory which may be generated by the process, and not a really simple one. The theory that Q(n) represents "not divisible by 10" is going to be much more probable than all these theories. In other words, the write-up estimates Q(n) based on a narrow subset of the theories which assign probability mass. I don't really think it's representative...
Results from MIRI's December workshop

Hi Manfred, the link is in the theme 1 section of the original post. Let me know if it clarifies our worries about simplicity priors.

It looks like what I was talking about and what you were talking about were different things. I was talking about having a complexity-based prior over the behavior of our predicate P(x) on the basis of actually knowing about the complexity of P. The bad behavior you wrote up, though, seems like a consequence of using a shoddy method for updating on outside information.

If we just want to throw computing power at the problem until it goes away, the solution is actually really simple: you pick a model at random like normal, but if it's not consistent with the information we're conditioning on, you throw it out. Note that this requires even probabilistic-seeming information to be phrased as frequencies, like "P(x) is true for 90% of some domain." But since this whole method of assigning probabilities is built off of frequencies, you're probably fine with that.

Can I ask a non-sequitur question? If I were to write up some posts about logical probability, largely reprising the analogous trains of thought in the first few chapters of Jaynes, what would be a good comprehensive strategy to get the people attending workshops like this to read them?
Results from MIRI's December workshop

Hi Manfred, it might help to communicate if you read the report, though I'm not sure it's understandable since it was written quite rapidly :(. Thanks for sending your stream of consciousness and I look forward to hearing more of it. You seem quite convinced that a simplicity prior exhibits reasonable behavior, but I am less so. A lot of the workshop involved understanding potential pitfalls of simplicity priors. Basically, some pathologies appear to arise from the fact that a very large pseudorandom number with a short description is treated very diff... (read more)

I confess I'm ignorant about what report you mean - could you throw me a link?

Hm, x being simple or complicated... For example, if our predicate P(x) is true just for prime numbers, then "the trillionth prime number" is a simple description - no prime number can have a greater complexity than its description qua prime number, and so if we assign high probability to simple descriptions like primeness, we'll expect P to be true for simple numbers. I would argue that this is proper behavior. It is not that our robot cares about the complexity of the number. It just tries to find simple patterns in the data, and naturally some numbers will fit into more simple patterns than others, once we're given information about the density of P(x) being true.

An extreme example would be if P(x) were only true (or, equivalently, false) for a single number. Then our probability about P(x) really would be about complexity of the number. Why is this proper behavior? Because information that justifies pattern-matching behavior is information about the complexity of P(x)! If P(x) is a single number and we know that its description is less than 100 bits, that knowledge is what allows us to exhibit a preference for simpler patterns, and thus simpler numbers.

Fun fact: if we know that P(x) is true for 50% of numbers, the complexity of x has essentially no impact, even though we still favor simple patterns.
Results from MIRI's December workshop

Hi Manfred, your post indeed touches on many of the same issues we were considering at the workshop. If I understand it correctly, you are worried about the robot correctly learning universal generalizations. For example, if P(x) is true for all odd x and false for all even x, you want your robot to correctly guess that pattern after observing P(0),P(1),P(2),...,P(1 million). In fact, most current proposals, including Demski's, are capable of learning these universal generalizations. Our current programs are able to correctly learn rules like "P(x) is tru... (read more)

Well, I wrote down some stream of consciousness stuff :P

The statement "P(x) is true for 90% of x in some domain" is an interesting one. A message-length or complexity based prior over patterns (like I justified the use of in my post, and as is commonly used in things like minimum message length prediction) does not have statements like that as basic statements. Instead the basic statements are about the different patterns that could correspond to the values of P(x) over its domain.

If you've done random sampling on domain x and gotten P(x) true 90% of the time, then a robot using a complexity-based prior refuses to throw away the information about what you sampled, and just tries to find simple patterns in your data. This does lead it to predict that some additional statement has high probability of being true, because simple patterns tend to be consistent. But it does not assign equal probability to the truth of each P(x), because different x's can fit into different patterns. So a complexity prior over functions exhibits the correct behavior, at least when you give it the raw data.

But what if we give it the information "P(x) is true for 90% of x in some domain"? Well, there are lots of different patterns with P(x) true for 90% of x. But the math is fairly straightforward - our robot takes its prior, cuts out every pattern that doesn't have P(x) true for 90% of x, and then renormalizes to get its new distribution over patterns. Again, not every x has the same probability, because simple patterns contribute more, but the probability is around 90% for various x.

However, at no point does our complexity-prior robot ever think of the statement "P(x) is true for 90% of x in some domain" on its own. To do that, we'd have to make a robot that approximated by throwing out the information of what the sampled P(x) were, and only keeping the frequencies, and then didn't consider any information about the complexity of the function (equivalent to a uniform non-standard p
Progress on automated mathematical theorem proving?

A kind of counter-example to your claim is the following: It is an automated reasoning system for Euclidean geometry. Starting from literally nothing, it derived all of the geometric propositions in Euclid's Elements in a matter of seconds. Then it proceeded to produce a number of geometric theorems of human interest that were never noticed by any previous Euclidean geometers, classical or modern.

This is simply to point out that there are some fields of math that are classically very hard but computers find ... (read more)

Where do you get these detailed claims about the program? I don't see anything like them in the book or here [] or here []
This is interesting. It's hard for me to assess it from the outside. In particular, I don't have a good sense for the number of sequences of logical derivations one has to consider in order to arrive at the theorems that were proved if one were to proceed by brute force. I find it more interesting that it honed in on classical theorems on its own than that it was able to prove them (one can use coordinate geometry to reduce proofs to solving polynomial equations).

I think that it's significant that Euclidean geometry fell out of fashion a long time ago: the fraction of modern mathematicians who think about Euclidean geometry is negligible, and this may reflect an accurate assessment of the field as mathematically shallow. I didn't appreciate geometry until I learned about discrete groups of isometries acting on homogeneous Riemannian manifolds.

Thanks for the reference. How much future progress? :-)
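As a tiny illustration of reducing a Euclidean proof to polynomial algebra (my example, not one from the system under discussion): the claim that a parallelogram's diagonals bisect each other becomes a polynomial identity in the vertex coordinates, which we can spot-check at random points; a real prover such as Wu's method would verify it symbolically.

```python
import random

def midpoint(p, q):
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)

random.seed(1)
for _ in range(100):
    # Random parallelogram ABCD: the fourth vertex is C = B + D - A.
    A, B, D = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(3)]
    C = (B[0] + D[0] - A[0], B[1] + D[1] - A[1])
    # The diagonals AC and BD share a midpoint.
    m_ac, m_bd = midpoint(A, C), midpoint(B, D)
    assert abs(m_ac[0] - m_bd[0]) < 1e-9 and abs(m_ac[1] - m_bd[1]) < 1e-9
print("identity holds at all sampled points")
```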
Reflection in Probabilistic Logic

I agree that the derivation of (4) from (3) in the paper is unclear. The negation of a=b>=c.

Ah, so there are already revisions... (I didn't have a (4) in the version I read).
Second-Order Logic: The Controversy

I do not think the situation is as simple as you claim it to be. Consider that a category is more general than a monoid, but there are many interesting theorems about categories.

As far as foundations for mathematical logic go, anyone interested in such things should be made aware of the recent invention of univalent type theory. This can be seen as a logic which is inherently higher-order, but it also has many other desirable properties. See for instance this recent blog post: (read more)

Sure. It has of course been the case that careful increases in generality have also led to increases in power (hence the weasel word "usually"). There is something like a "production-possibilities frontier" relating generality and power, and some concepts are near the frontier (and so one can't generalize them without losing some power) while some are not.
The Value of Theoretical Research

There are many types of math, with differing sorts of value, but I can say a little about the sort of math I find moving.

I agree with you. For the most part, applied souls dream up their advances and make them without relying on the mathematical machine. They invent the math they need to describe their ideas. Or perhaps they use a little of the pure mathematician's machine, but quickly develop it in ways that are more important to their work than the previous mathematical meanderings.

I think you underestimate the role of mathematics as the grand exposit... (read more)