It's 558 bits long in its current form.
bytes*. well, unicode characters, really. and similar quibbles with the rest of the paragraph.
I object to the title, but like the post. I am quite confident that chess neural nets have learned some mathematical structure of how to play chess well, along with structure specific to chess. Yet the arguments of the post mostly apply just as well to chess. Does optimal chess-playing under computational constraints have "non-mathematical features" when you could precisely formulate what "optimal chess player under computational constraints" means?
Unlike chess, there is no "optimal play" for math. And if there is, I think it would be considered slow/boring (if we do have computational constraints).
The objective criterion is fuzzy, sure, but I am not convinced that you couldn't eventually figure out a mathematical function expressing your target. At one point we didn't know how to formalize chess either, I think (I'm sure we didn't know how to formalize an unbounded-compute algorithm for it). Our brains have encoded the desirable goal; perhaps we just happen to be unable to extract the true concept that the black boxes implement, and that is what prevents us from formalizing the objective.
In other words: am I really different from a souped-up chess bot (with a different cognitive architecture, of course) with a different goal, one that isn't directly available to my cognition?
I'm not saying that such a formalised objective couldn't exist. My claim is that we (probably) haven't found one yet. And if one is found, it won't be "metaphysically objective"; it will just spit out very insightful theorems very fast.
I agree that we haven't found one (extremely confident!). I am not sure how metaphysically objective you are looking for: I expect that part of what humans look for is a "metaphysically singled-out concept" (besides practicality), something like good mathematical abstractions for handling the space of proofs, though that may only work if you add on a "terminal"/"prior" weighting for interestingness.
Are there really whole swaths of completely uninteresting ZFC string manipulation theories? I just don't know. Alternatively: would aliens also have group theory? I think: probably!
Oh, I plan to post on the topic of alien math. But in short: aliens are going to be guided by beauty/interestingness/utility for the same reason evolution pushed humans to value them, so a lot of our math could intersect (but you still need aliens or humans to pluck out those valuable bits of math; you can't force math to look at itself hard enough and present those parts to you).
And they would have group theory because our universe is just full of symmetries.
Even if our universe was not, I expect groups to be discovered by aliens regardless.
I'm not saying that something with no info pointing it at the sorts of things humans would care about would get close (the aliens, e.g., get data about the same universe we got data from); only that a core part of what math humans find interesting will end up being something simple and "singled out".
Tangentially, I don't think of group theory as fundamentally being about symmetry, though I do see symmetry as being about groups. That is, symmetry is well modeled by groups, but groups are not fundamentally about symmetry.
Instead I think of it simply as a generalization of grade-school algebra, asking when you can do things like xy = z → x = z/y. Thus rings and fields, which are also not about symmetry, and are instead about generalized grade-school algebra.
In a universe where symmetries weren't so fundamental to physics but you still had arithmetic being useful for world modelling, I would expect life there to invent polynomials, symbolic algebra, and abstract algebra.
No, there's no single "optimal play" for math, but it wouldn't surprise me if there were a set of fundamental mathematical structures which would be consistently rediscovered by a wide variety of different "math playing" algorithms (of sufficient math-playing skill).
I don't think you can easily separate mathematics into either invention or discovery. It's both. It's a form of pseudo-play.
You made great points about the human nature of math, designating it an invention. But math implies more unknown math: theorems exist within the mathematical structure we have built, so we are, in a sense, discovering parts of an invention.
Think of it like Lego: a simple build system that generates enormous complexity. We created something infinitely complex and have been playing with it, finding new tricks along the way, for the last 44,000 years. But let's actually investigate this claim:
Scholarpedia lays out 5 pillars of play:
I would say 2, 3, and 4 all fit trivially. Thus it's not play (unless you like math like me), but rather almost-play.
Let me know if you think I'm wrong or I'm not making any sense.
I mean, math is invented in the same sense fridges are invented. Is there a space of designs over which some search process could stumble upon a fridge design and thereby "discover" it? Sure. But at that point, we call it invention.
I think there's a significant difference of scale between a fridge and Mathematics.
Math is a set of rules and concepts we apply to itself in order to perpetually generate new rules and concepts.
A fridge is just an icebox.
Now if you created a series of rules and concepts around the fridge, then yes, I would grant you they are similar.
The difference is the social relationship humans have to the object. Math comes with a social relationship built on specific formalisms; a fridge has no such relationship. It's just where we humans like to keep the ice cream.
(Epistemic status: I consider the following quite obvious and self-evident, but decided to post anyways.[1])
— Paul Erdős, probably
There've been a few attempts to create mathematical models of math. The examples that come to my mind are Gödel numbering (GN) and Logical Induction (LI). Feel free to suggest more in the comments, but I'll use those as my primary reference points. In this post, I want to contrast them with the way human mathematicians do math by noticing a few features of their process, the ones that are hard to describe in the language of math itself. These features overlap a lot and reinforce each other, so the distinctions I draw are subjective. There are probably more of them; these are just the ones I was able to think of. What unites them is that they make mathematical progress more tractable.
Theorem Selection
The way Kurt Gödel proved his incompleteness theorems was by embedding mathematics into the language of a mathematical theory (number theory in that particular case, but the trick can be done with any theory that's expressive enough). But this way of describing mathematics is very eternalistic: it treats math as one monolith. It gives no advice on how to make progress in math. How could we approach that in a systematic way?
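To make the embedding concrete, here is a minimal sketch of the prime-exponent encoding idea in Python. The six-symbol alphabet is invented for illustration; Gödel's actual scheme covers a full formal language, but the mechanism is the same.

```python
# Toy Gödel numbering: encode the i-th symbol of a formula as the
# exponent of the i-th prime, so each formula maps to a unique natural number.
SYMBOL_CODE = {"0": 1, "S": 2, "=": 3, "+": 4, "(": 5, ")": 6}
CODE_SYMBOL = {v: k for k, v in SYMBOL_CODE.items()}

def primes():
    """Yield 2, 3, 5, 7, ... by trial division (fine at toy sizes)."""
    found = []
    n = 2
    while True:
        if all(n % p != 0 for p in found):
            found.append(n)
            yield n
        n += 1

def goedel_number(formula: str) -> int:
    """Encode a formula as a product of prime powers."""
    g = 1
    for sym, p in zip(formula, primes()):
        g *= p ** SYMBOL_CODE[sym]
    return g

def decode(g: int) -> str:
    """Invert the encoding by factoring out consecutive primes."""
    out = []
    for p in primes():
        if g == 1:
            break
        exp = 0
        while g % p == 0:
            g //= p
            exp += 1
        out.append(CODE_SYMBOL[exp])
    return "".join(out)
```

Because the factorization is unique, statements *about formulas* become statements *about numbers*, which is exactly the monolithic, eternalist view described above.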
Fighting the LEAN compiler
What if we just try to prove all statements we can find proofs for?
Let's do a back-of-the-envelope Fermi estimate. Here's a LEAN proof of the statement "if a_n → t and 0 < c, then c·a_n → c·t" (sorry for JavaScript highlighting):
example (a : ℕ → ℝ) (t : ℝ) (h : TendsTo a t) (c : ℝ) (hc : 0 < c) :
    TendsTo (fun n ↦ c * a n) (c * t) := by
  simp [TendsTo] at *
  intro ε' hε'
  specialize h (ε'/c) (by exact div_pos hε' hc)
  obtain ⟨B, hB⟩ := h
  use B
  intro N hN
  specialize hB N hN
  /- theorem (abs_c : |c| = c) := by exact? -/
  calc
    |c * a N - c * t| = |c*(a N - t)| := by ring
    _ = |c| * |a N - t| := by exact abs_mul c (a N - t)
    _ = c * |a N - t| := by rw [abs_of_pos hc]
    _ < ε' := by exact (lt_div_iff₀' hc).mp hB
It's 558 bytes long in its current form. I didn't optimize it for shortness, but let's say that if I did, we could get it down to 100 bytes, or 800 bits. Now suppose we run a search process that simply checks every possible bitstring, starting from the short ones, for whether it is a valid LEAN proof. There are 2^800 possible bitstrings shorter than this proof, so even checking billions of proofs a second, the search would take well over 10^200 years to reach this particular one. Not great.
That marks the first and most important unmathematical feature of math: the selection of theorems. We do not prove, nor do we strive to prove, every possible theorem; that would be slow and boring. GN enumerates every statement regardless of its importance. LI prioritizes short sentences, which is an improvement: it gives a natural ordering in which to try proving theorems, and therefore a way to make progress over time. But it's still very inefficient.
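The brute-force search above can be sketched as follows. Here `is_valid_proof` is a placeholder for invoking the actual LEAN checker, and the billion-checks-per-second rate is an assumed figure for the arithmetic.

```python
import itertools

def is_valid_proof(bits: str) -> bool:
    """Placeholder: imagine this decodes `bits` and runs the LEAN kernel on it."""
    return False  # a real implementation would invoke the proof checker

def shortest_proofs(max_bits: int):
    """Enumerate bitstrings from shortest to longest, yielding the valid ones."""
    for n in range(1, max_bits + 1):
        for combo in itertools.product("01", repeat=n):
            bits = "".join(combo)
            if is_valid_proof(bits):
                yield bits

# The arithmetic behind "not great", assuming 10^9 checks per second:
candidates = 2 ** 800                      # bitstrings up to 800 bits, roughly
seconds = candidates // 10 ** 9
years = seconds // (60 * 60 * 24 * 365)    # a number with well over 200 digits
```

The enumeration itself is trivial to write; it is the 2^800 search space that kills the approach, which is the point of the section.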
Naming
The way we name theorems and concepts is important. Most of the time we name them after a person (though often not even the person who discovered the result), but if you think about it, the name "Pythagorean theorem" really works as a handle for "that theorem about right triangles": each time we need to prove something about right triangles, we remember Pythagoras.
LI and GN both name sentences by their entire specification, and that shouldn't come as a surprise: there wouldn't be enough short handles because, as described above, they try to talk about all sentences.
Naming allows us to build associations between mathematical concepts, which lets mathematicians bring to mind a limited set of relevant tools when making progress in a specific area.
Step Importance
When we teach math, we do not go through literally every step of a proof. We skip over obvious algebraic transformations; we do not pay much attention when we treat an element of a smaller set as an element of a larger set with all properties conserved (when doing 3D geometry and using 2D theorems, for example); we skip parts of a proof that are symmetrical to the already proven ones ("without loss of generality, let X be the first...").
We do that because we want to emphasize the non-trivial parts. And the feeling of non-triviality is a human feeling, not identifiable from a step's description alone. This same feeling is also what guides mathematicians to prove more useful lemmas.
GN doesn't do that: it checks every part of the proof. I'm not as sure about LI; there might be traders that gloss over obvious steps but check the less trivial ones more carefully.
Lemma Selection
Some theorems are more useful and more important than others because they help prove more theorems. This score could hypothetically be recovered from some graph of mathematics, but in practice it is just estimated by the math professors creating the curriculum. That taste is then passed on to the next generation of mathematicians, helping them find more useful lemmas.
GN doesn't try to do that. LI might do that implicitly via selecting for rich traders.
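A toy version of recovering such a score from a dependency graph might look like this. The graph below is invented for illustration; an edge from a lemma means "this result's proof uses that lemma".

```python
# Hypothetical miniature dependency graph of results.
USED_BY = {
    "pythagorean_theorem": ["distance_formula", "law_of_cosines"],
    "distance_formula": ["circle_equation"],
    "law_of_cosines": [],
    "circle_equation": [],
}

def usefulness(lemma: str, used_by=USED_BY, seen=None) -> int:
    """Count results that depend on `lemma`, directly or transitively."""
    seen = set() if seen is None else seen
    total = 0
    for result in used_by.get(lemma, []):
        if result not in seen:
            seen.add(result)
            total += 1 + usefulness(result, used_by, seen)
    return total
```

On this toy graph, the Pythagorean theorem scores highest because everything else ultimately depends on it; the curriculum-designer's "taste" is an informal estimate of exactly this kind of quantity.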
Real-world Phenomena
The reason humans started doing math was that they noticed similar structures across the real world. The way you add up sheep is the same way you add up apples. Pattern-matching allowed mathematicians to form conjectures and target their mathematical efforts. ("Hmm, when I use 3 sticks to form a triangle, I end up with the same triangle. What if that's always true?")
GN and LI do not do that because they have no access to the outside world. Though there is a mathematical theory that attempts to do precisely this: Solomonoff Induction.
Categorising
This is very similar to Naming: we separate math into topics, and when we need to prove a statement we know where to look for tools. GN and LI do not attempt to do that.
An important caveat, applicable to most of the features above: there should be a balance. If you stick too much within a topic, you will never discover fruitful analogies (algebraic geometry being helpful for proving Fermat's Last Theorem is a great example). Too much reliance on any one feature and you lose creativity.
Curiosity/Beauty
There isn't much I can add about this one, but it's arguably the most important. It both guides the formation of conjectures and helps with intermediate steps.
GN and LI definitely lack it.
Conclusion
All of this is to support the point that math is invented rather than discovered. I agree that there is a surprising amount of connection between the different kinds of math humans find interesting, and there is probably more to learn about this phenomenon. But I wouldn't treat it as a signal that we are touching some universal metaphysical phenomenon: it is just the human senses of beauty and curiosity, along with real-world utility and patterns, echoing each other (partly because human intelligence and those senses were shaped to seek usefulness and real-world patterns).
Because of this and this.