The value of learning mathematical proof

Personally I think real analysis is an awkward way to learn mathematical proofs, and I agree discrete mathematics or elementary number theory is much better. I recommend picking up an Olympiad book for younger kids, like "Mathematical Circles: A Russian Experience."

Is Scott Alexander bad at math?

I'm sure it is not only "elite" mathematicians who intuit the interest of problems like the unsolvability of the quintic. That one can prove a construction impossible, the very concept of an invariant, is startling to the uninitiated. So many classic problems of this nature are held up as paradigms of beauty: the Konigsberg bridge problem, ruler-and-compass constructions of cube roots, the irrationality of sqrt(2)...
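The irrationality of sqrt(2) is perhaps the shortest of these impossibility proofs, with parity playing the role of the invariant; a standard sketch:

```latex
\textbf{Claim: } \sqrt{2} \notin \mathbb{Q}.

\textbf{Proof: } \text{Suppose } \sqrt{2} = p/q \text{ with } p, q \text{ coprime integers.}
\text{Then } p^2 = 2q^2, \text{ so } p^2 \text{ is even, hence } p \text{ is even; write } p = 2k.
\text{Substituting, } 4k^2 = 2q^2, \text{ so } q^2 = 2k^2 \text{ and } q \text{ is even as well,}
\text{contradicting the coprimality of } p \text{ and } q. \qquad \qed
```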

The Truth About Mathematical Ability

I'm doing my math PhD at Harvard in the same area as Qiaochu. I was also heavily involved in artofproblemsolving and went to MathPath in 2003. Since 2003 I have hoped that I could stake out a manifest destiny in mathematics research.

Qiaochu and I performed similarly in Olympiad competitions, had similar performances in the same undergraduate program, and were both attracted to this website. However, I get the sense that he is driven quite a bit by geometry, or is at least not actively averse to it. Despite being a homotopy theorist, I find geometry awkward and ... (read more)

17y

You should note that even great mathematicians sometimes question their
abilities. Michael Atiyah:
http://press.princeton.edu/chapters/gowers/gowers_VIII_6.pdf

Lately I've been feeling particularly incompetent mathematically, to the point that I question how much of a future I have in the subject.

Impostor syndrome is really, really common in mathematics. Trust me; if there's a place in mathematics for me, then there's a place for you too.

Lately I've been feeling particularly incompetent mathematically, to the point that I question how much of a future I have in the subject. Therefore I quite often wonder what mathematical ability is all about, and I look forward to hearing if your perspective gels with my own.

More later, but just a brief remark – I think that one issue is that the top ~200 mathematicians are of such high intellectual caliber that they've plucked all of the low hanging fruit and that as a result mathematicians outside of that group have a really hard time doing research ... (read more)

Why is the A-Theory of Time Attractive?

I apologize for the snipy remark, which was a product of my general frustrations with life at the moment.

I was trying to strongly stress the difference between (1) an abstract R-torsor (B-theory), and (2) R viewed as an R-torsor (your patch on A-theory).

Any R-torsor is isomorphic to R viewed as an R-torsor, but that isomorphism is not unique. My understanding is that physicists view such distinctions as useless pedantry, but mathematicians are for better or worse trained to respect them. I do not view an abstract R-torsor as having a basis that can be changed.
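For reference, a sketch of the distinction being drawn:

```latex
\text{An } \mathbb{R}\text{-torsor is a set } T \text{ with a free, transitive action of } (\mathbb{R}, +).
\text{Any base point } t_0 \in T \text{ gives a bijection}

\varphi_{t_0} \colon \mathbb{R} \to T, \qquad r \mapsto t_0 + r,

\text{but a different base point } t_1 = t_0 + s \text{ gives}

\varphi_{t_1}(r) = \varphi_{t_0}(s + r),

\text{so the identification } T \cong \mathbb{R} \text{ is canonical only up to translation.}
```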

08y

Indeed it wouldn't. A function space defined on an R-torsor would have a basis
which you could change.

Why is the A-Theory of Time Attractive?

You might be able to do it with some abstract nonsense. I think general machinery will prove that in categories such as that defined in the top answer of

there are terminal objects. I don't have time to really think it through though.

Why is the A-Theory of Time Attractive?

I have always heard the affine line defined as an R-torsor, and never seen an alternative characterization. I don't know the alternative axiomatization you are referring to. I would be interested to hear it and see if it does not secretly rely on a very similar and simpler axiomatization of (R,+) itself.

I do know how to characterize the affine line as a topological space without reference to the real numbers.

Torsors seem interesting from the point of view of Occam's razor because they have less structure but take more words to define.

18y

This is what I was referring to. The axioms of ordered geometry
(http://en.wikipedia.org/wiki/Ordered_geometry#Axioms_of_ordered_geometry),
especially Dedekind's axiom, give you the topology of the affine line without a
distinguished 0, without distinguishing a direction as "positive", and without
the additive structure.
However, in all the ways I know of to construct a structure satisfying these
axioms, you first have to construct the rationals as an ordered field, and the
result of course is just the reals, so I don't know of a constructive way to get
at the affine line without constructing the reals with all of their additional
field structure.

Why is the A-Theory of Time Attractive?

I think that the distinction may be clarified by the mathematical notion of an affine line. I sense that you do not know much modern mathematics, but let me try to clarify the difference between affine and linear space.

The A-theorists are thinking in terms of a linear space, that is, an oriented vector space. To them time is splayed out on a real number line, which has an origin (the present) and an orientation (a preferred future direction).

The B-theorists are thinking in terms of an affine line. An affine line is somewhat like the A-theorists' real lin... (read more)

18y

... from what do you get this impression, and in what way is it relevant? Yes,
there are many parts of modern mathematics I am not familiar with. However,
nothing that had come up up to this point was defined in the last 100 years, let
alone the last 50.
I have a PhD in physics. I know what an affine space is. If you were thrown off
by my uses of basis changes to effect translations, which would signal ignorance
since vector addition is not equivalent to change of basis... I did clarify that
I was in a function space defined over time, and in the case of function spaces
defined over vector fields, translations of the argument of the function are
indeed changes of basis.
In physics, we set the origin to be whatever. All the time. This is because we
need to do actual arithmetic with actual numbers, and number systems with no 0
are needlessly cumbersome to use. Moving 0 around all the time in
context-dependent and arbitrary ways completely defuses the 'harm' of A-theory,
as far as I can tell.

38y

I think that this analogy is accurate and reveals that A-theorists are
attributing additional structure to time, and therefore that they take a hit
from Occam's razor.
However, to be fair, I think that an A-theorist would dispute your analogy. They
would deny that time "is" splayed out on a number line, because there is no
standpoint from which all of time is anything. Parts of time were one way, and
other parts of time will be other ways, but the only part of time that is
anything is the present moment.
(I'm again using A-theorist as code for presentist.)
By the way, off-topic, but:
This is true if affine space is defined as a torsor for the reals as an additive
group, but you can also axiomatize the affine line without reference to the
reals. It's not clear to me whether this means that you can construct the affine
line in some reasonable sense without reference to the reals. Do you know?

Mathematics as a lossy compression algorithm gone wild

I guess I always took the phrase "unreasonable effectiveness" to refer to the "coincidence" you mention in your reply. I'm not really sure you've gone far toward explaining this coincidence in your article. Just what is it that you think mathematicians have "pure curiosity" about? What does it mean to "perfect a tool for its own sake," and why do those perfections sometimes wind up having further practical use? As a pure mathematician, I never think about applying a tool to the real world, but I do think I'm working towards a very compressed understanding of tool making.

28y

Unreasonable effectiveness tends to refer to the observation that the same
mathematical tools, like, say, mathematical analysis, end up useful for modeling
very disparate phenomena. In a more basic form it is "why do mathematical ideas
help us understand the world so well?". The answer suggested in the OP is that
the question is a tautology: math is a meta-model built by human minds, not a
collection of some abstract objects which humans discover in their pursuit of
better models of the world. The JPEG analogy is that asking why math is
unreasonably effective in constructing disparate (lossy) models is like asking
why the JPEG algorithm is unreasonably effective in lossy compression of
disparate images.
The "coincidence" part referred to something else: that pursuing math research
for its own sake may occasionally work out to be useful for modeling the
physical world, number theory and encryption being the standard example.
When I talk to someone who works in pure math, they usually describe the
motivation for what they do in almost artistic terms, not caring whether what
they do can be useful for anything, so "tool making" does not seem like the
right term.

Mathematics as a lossy compression algorithm gone wild

So what does "gone wild" mean? Your paragraph about this is not very charitable to the pure mathematician.

Say that mathematics is about generating compressed models of the world. How do we generate these models? Surely we will want to study (compress) our most powerful compression heuristics. Is that not what pure math is?

28y

I agree I was too brief there. The original motivation for math was to help
figure out the physical world. At some point (multiple points, really, starting
with Euclid), perfecting the tools for their own sake became just as much of a
motivation. This is not a judgement but an observation. Yes, sometimes "pure
math" yields unexpected benefits, but this is more of a coincidence than the
reason people do it (despite what the grant applications might say).
The main reason is pure curiosity without any care for eventual applicability to
understanding the physical world. Pretending otherwise would be disingenuous.
Theoretical physics is not very different in that regard. To quote Feynman

Rationality Quotes April 2014

Computer scientists seem much more ready to adopt the language of homotopy type theory than homotopy theorists at the moment. It should be noted that there are many competing new languages for expressing the insights garnered by infinity groupoids. Though Voevodsky's language is the only one that has any connection to computers, the competing language of quasi-categories is more popular.

Rationality Quotes April 2014

It is misleading to attribute that book solely to Voevodsky.

0[anonymous]8y

Yes. But it's forgivably misleading to attribute it non-exclusively to him, in
a thread of comments which was started about him.

Results from MIRI's December workshop

Imagine that we encounter a truly iid random sequence of 90% likely propositions Q(0),Q(1),Q(2),... Perhaps they are merely pseudorandom but impossibly complicated to reason about, or perhaps they represent some random external output that an agent observes. After observing a very large number of these Q(i), one might expect to place high probability on something like "About 90% of the next 10^100 Q(j) I haven't observed yet will be true," but there is unlikely to be any simple rule that describes the already observed Q(i). Do you think that the next 10^100 Q(j) will all individually be believed 90% likely to be true, or will the simpler to describe Q(j) receive closer to 50% probability?

08y

We can show that the FOL prior is not too different from the algorithmic prior,
so it can't perform too badly for problems where algorithmic induction does
well. Partial theories which imply probabilities close to .9 but do not specify
exact predictions will eventually have high probability; for example, a theory
might specify that Q(x) is derived from an unspecified F(x) and G(x) (treated as
random sources) getting OR'd together, making probabilities roughly .75;
variations of this would bring things closer to .9.
This may still assign simpler Q(j) probabilities closer to 50%.
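The arithmetic behind that OR construction checks out exhaustively (a sketch; treating F and G as independent fair bits is my own reading of "unspecified random sources"):

```python
from itertools import product

# Q(x) = F(x) OR G(x), with F and G treated as independent fair coin flips.
two_sources = list(product([0, 1], repeat=2))
p_or2 = sum(1 for f, g in two_sources if f or g) / len(two_sources)

# OR-ing in a third independent source pushes the probability toward .9.
three_sources = list(product([0, 1], repeat=3))
p_or3 = sum(1 for f, g, h in three_sources if f or g or h) / len(three_sources)

print(p_or2, p_or3)  # 0.75 0.875
```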

Results from MIRI's December workshop

The key here is that you are using finite S. What do you do if S is infinite? More concretely, is your schema convergent if you grow your finite S by adding more and more statements? I believe we touch on such worries in the writeup.

Results from MIRI's December workshop

Sorry if there is something fishy in the writeup :(. I could believe it, given how rushed I was writing it.

Suppose we consider not just a,~a,b,~b, and c,~c, but also statements q="exactly one of a,b,c is true" and ~q. Suppose now that we uniformly pick a truth value for a, then for q, then a logically consistent but otherwise random value for b, and finally a logically consistent but otherwise random value for c. Such an asymmetric situation could occur if b and c have high mu but a and q have small mu. In the worlds where we believe q, b and c... (read more)

08y

This was surprisingly clarifying for me.
I'm still not sure whether I find it too concerning. We assign higher
probability to simpler theories so long as they follow the constraint that Q(n)
below 10^100 is 90%. The theory which randomly assigns Q(n) to true/false for
10^99 simple values of n (and then gets forced to assign Q as true for most of
the remaining n below 10^100) is just one form of theory which may be generated
by the process, and not a really simple one. The theory that Q(n) represents
"not divisible by 10" is going to be much more probable than all these theories.
In other words, the write-up estimates Q(n) based on a narrow subset of the
theories which assign probability mass. I don't really think it's
representative...
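For what it's worth, the predicted frequency of the "not divisible by 10" theory is exactly 90%, which is easy to sanity-check:

```python
# Fraction of n in 1..10^6 for which Q(n) = "n is not divisible by 10" holds.
N = 10**6
frac_true = sum(1 for n in range(1, N + 1) if n % 10 != 0) / N
print(frac_true)  # 0.9
```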

Results from MIRI's December workshop

Hi Manfred, the link is in the Theme 1 section of the original post. Let me know if it clarifies our worries about simplicity priors.

38y

It looks like what I was talking about and what you were talking about were
different things. I was talking about having a complexity-based prior over the
behavior of our predicate P(x) on the basis of actually knowing about the
complexity of P.
The bad behavior you wrote up, though, seems like a consequence of using a
shoddy method for updating on outside information.
If we just want to throw computing power at the problem until it goes away, the
solution is actually really simple: you pick a model at random like normal, but
if it's not consistent with the information we're conditioning on, you throw it
out. Note that this requires even probabilistic-seeming information to be
phrased as frequencies, like "P(x) is true for 90% of some domain." But since
this whole method of assigning probabilities is built off of frequencies, you're
probably fine with that.
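A minimal sketch of that rejection scheme (the toy domain size and the exact-count reading of "P(x) is true for 90% of some domain" are my own choices):

```python
import random

random.seed(0)
DOMAIN = 10  # toy domain; "true for 90% of it" means exactly 9 true values

def random_model():
    """Pick a truth assignment for P over the domain uniformly at random."""
    return [random.random() < 0.5 for _ in range(DOMAIN)]

# Rejection step: draw models "like normal", and throw out any model that is
# inconsistent with the frequency information we are conditioning on.
kept = [m for m in (random_model() for _ in range(200000))
        if sum(m) == int(0.9 * DOMAIN)]

# Marginal probability that P holds at a particular point, post-conditioning.
marginal = sum(m[0] for m in kept) / len(kept)
print(round(marginal, 2))  # close to 0.9
```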
Can I ask a non-sequitur question? If I were to write up some posts about
logical probability, largely reprising the analogous trains of thought in the
first few chapters of Jaynes, what would be a good comprehensive strategy to get
the people attending workshops like this to read them?

Results from MIRI's December workshop

Hi Manfred, It might help to communicate if you read the report, though I'm not sure it's understandable since it was written quite rapidly :(. Thanks for sending your stream of consciousness and I look forward to hearing more of it. You seem quite convinced that a simplicity prior exhibits reasonable behavior, but I am less so. A lot of the workshop involved understanding potential pitfalls of simplicity priors. Basically, some pathologies appear to arise from the fact that a very large pseudorandom number with a short description is treated very diff... (read more)

18y

I confess I'm ignorant about what report you mean - could you throw me a link?
Hm, x being simple or complicated...
For example, if our predicate P(x) is true just for prime numbers, then "the
trillionth prime number" is a simple description - no prime number can have a
greater complexity than its description qua prime number, and so if we assign
high probability to simple descriptions like primeness, we'll expect P to be
true for simple numbers.
I would argue that this is proper behavior. It is not that our robot cares about
the complexity of the number. It just tries to find simple patterns in the data,
and naturally some numbers will fit into more simple patterns than others, once
we're given information about the density of P(x) being true.
An extreme example would be if P(x) were only true (or, equivalently, false) for
a single number. Then our probability about P(x) really would be about
complexity of the number. Why is this proper behavior? Because information that
justifies pattern-matching behavior is information about the complexity of P(x)!
If P(x) is a single number and we know that its description is less than 100
bits, that knowledge is what allows us to exhibit a preference for simpler
patterns, and thus simpler numbers.
Fun fact: if we know that P(x) is true for 50% of numbers, the complexity of x
has essentially no impact, even though we still favor simple patterns.

Results from MIRI's December workshop

Hi Manfred, your post indeed touches on many of the same issues we were considering at the workshop. If I understand it correctly, you are worried about the robot correctly learning universal generalizations. For example, if P(x) is true for all odd x and false for all even x, you want your robot to correctly guess the pattern after observing P(0),P(1),P(2),...,P(1 million). In fact, most current proposals, including Demski's, are capable of learning these universal generalizations. Our current programs are able to correctly learn rules like "P(x) is tru... (read more)

38y

Well, I wrote down some stream of consciousness stuff :P
The statement "P(x) is true for 90% of x in some domain" is an interesting one.
A message-length or complexity based prior over patterns (like I justified the
use of in my post, and as is commonly used in things like minimum message length
prediction) does not have statements like that as basic statements. Instead the
basic statements are about the different patterns that could correspond to the
values of P(x) over its domain.
If you've done random sampling on domain x and gotten P(x) true 90% of the time,
then a robot using a complexity-based prior refuses to throw away the
information about what you sampled, and just tries to find simple patterns in
your data. This does lead it to predict that some additional statement has high
probability of being true, because simple patterns tend to be consistent. But it
does not assign equal probability to the truth of each P(x), because different
x's can fit into different patterns.
So a complexity prior over functions exhibits the correct behavior, at least
when you give it the raw data. But what if we give it the information "P(x) is
true for 90% of x in some domain?"
Well, there are lots of different patterns with P(x) true for 90% of x. But the
math is fairly straightforward - our robot takes its prior, cuts out every
pattern that doesn't have P(x) true for 90% of x, and then renormalizes to get
its new distribution over patterns. Again, not every x has the same probability,
because simple patterns contribute more, but the probability is around 90% for
various x.
However, at no point does our complexity-prior robot ever think of the statement
"P(x) is true for 90% of x in some domain" on its own. To do that, we'd have to
make a robot that approximated by throwing out the information of what the
sampled P(x) were, and only keeping the frequencies, and then didn't consider
any information about the complexity of the function (equivalent to a uniform
non-standard p
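The cut-and-renormalize step can be made concrete on a toy domain; here the run count is my own crude stand-in for description length, and the 75% frequency is a small-scale substitute for 90%:

```python
from itertools import product

def runs(bits):
    """Crude proxy for description length: number of maximal runs of equal values."""
    return 1 + sum(1 for a, b in zip(bits, bits[1:]) if a != b)

DOMAIN = 8
TRUE_COUNT = 6  # the frequency constraint: "P(x) is true for 75% of x"

# Keep only patterns consistent with the frequency statement, weighting each
# by 2^(-complexity): the "cut out and renormalize" step.
weights = {bits: 2.0 ** -runs(bits)
           for bits in product([0, 1], repeat=DOMAIN)
           if sum(bits) == TRUE_COUNT}
Z = sum(weights.values())

# Per-point marginals: all near 0.75, but not equal, because simple patterns
# contribute more mass at some points than at others.
marginals = [sum(w for bits, w in weights.items() if bits[x]) / Z
             for x in range(DOMAIN)]
print([round(m, 3) for m in marginals])
```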

Progress on automated mathematical theorem proving?

A kind of counter-example to your claim is the following: http://www.math.rutgers.edu/~zeilberg/GT.html It is an automated reasoning system for Euclidean geometry. Starting from literally nothing, it derived all of the geometric propositions in Euclid's Elements in a matter of seconds. Then it proceeded to produce a number of geometric theorems of human interest that were never noticed by any previous Euclidean geometers, classical or modern.

This is simply to point out that there are some fields of math that are classically very hard but computers find ... (read more)

09y

Where do you get these detailed claims about the program? I don't see anything
like them in the book, or here: http://www.math.rutgers.edu/~zeilberg/Opinion43.html
or here: http://www.math.rutgers.edu/~zeilberg/Opinion44.html

59y

This is interesting.
It's hard for me to assess it from the outside. In particular, I don't have a
good sense for the number of sequences of logical derivations one has to
consider in order to arrive at the theorems that were proved if one were to
proceed by brute force. I find it more interesting that it homed in on classical
theorems on its own than that it was able to prove them (one can use coordinate
geometry to reduce proofs to solving polynomial equations).
I think that it's significant that Euclidean geometry fell out of fashion a long
time ago: the fraction of modern mathematicians who think about Euclidean
geometry is negligible, and this may reflect an accurate assessment of the field
as mathematically shallow. I didn't appreciate geometry until I learned about
discrete groups of isometries acting on homogeneous Riemannian manifolds.
Thanks for the reference.
How much future progress? :-)

Reflection in Probabilistic Logic

I agree that the derivation of (4) from (3) in the paper is unclear. The negation of a=b>=c.

19y

Ah, so there are already revisions... (I didn't have a (4) in the version I
read).

Second-Order Logic: The Controversy

I do not think the situation is as simple as you claim it to be. Consider that a category is more general than a monoid, but there are many interesting theorems about categories.

As far as foundations for mathematical logic go, any one interested in such things should be made aware of the recent invention of univalent type theory. This can be seen as a logic which is inherently higher-order, but it also has many other desirable properties. See for instance this recent blog post: http://golem.ph.utexas.edu/category/2013/01/from_set_theory_to_type_theory.h... (read more)

09y

Sure. It has of course been the case that careful increases in generality have
also led to increases in power (hence the weasel word "usually"). There is
something like a "production-possibilities frontier" relating generality and
power, and some concepts are near the frontier (and so one can't generalize them
without losing some power) while some are not.

The Value of Theoretical Research

There are many types of math, with differing sorts of value, but I can say a little about the sort of math I find moving.

I agree with you. For the most part, applied souls dream up their advances and make them without relying on the mathematical machine. They invent the math they need to describe their ideas. Or perhaps they use a little of the pure mathematician's machine, but quickly develop it in ways that are more important to their work than the previous mathematical meanderings.

I think you underestimate the role of mathematics as the grand exposit... (read more)

As a current Harvard math grad student, I think you should read many different easy books to learn a subject whenever possible, especially if you can find them for free. When you say you are mathematically able, it is unclear what level you are at. All of my favorite books for learning involve a huge number of exercises, and I recommend you do all of them instead of reading ahead.

For basic real analysis, my favorite book is Rosenlicht's Introduction to Analysis, but baby Rudin is pretty good too, and I recommend you flip back and forth between them both.

For l... (read more)