Linch has proposed a theory to explain Wigner and Hamming's observation that mathematics seems unreasonably effective. The explanation he proposes here is anthropic: namely, that complex life can't evolve to survive in overly complex environments.
Anthropic arguments always leave me unsatisfied, and I'm not really convinced by this one. In particular, I think there is a better explanation: it's harder to make reasoning mistakes using math than using other languages.
I will first discuss why I'm unsatisfied with the anthropics argument, then describe and argue for my proposed alternative.
My argument against the anthropics explanation is short.
The anthropics explanation says that if our universe were not described by simple physical laws, it is unlikely we would have evolved, for it would be hard for complex life to evolve in such a chaotic environment.
I note this argument only constrains our expectations about physical laws.
But there are many aspects of our world that have very little to do with our actual, literal physical laws, and which mathematics still describes quite well. In particular: probability theory & statistics, economics, thermodynamics, computability theory, optimization & control theory, and finally (to a non-trivial extent) anthropics itself.
So surely the anthropics explanation does not explain away all of our confusion here.
Gwern has an excellent post, Evolution as Backstop for Reinforcement Learning, which you should read if you haven't already. Here is the summary:
One defense of free markets notes the inability of non-market mechanisms to solve planning & optimization problems. This has difficulty with Coase’s paradox of the firm, and I note that the difficulty is increased by the fact that with improvements in computers, algorithms, and data, ever larger planning problems are solved.
Expanding on some Cosma Shalizi comments, I suggest interpreting phenomena as multi-level nested optimization paradigm: many systems can be usefully described as having two (or more) levels where a slow sample-inefficient but ground-truth ‘outer’ loss such as death, bankruptcy, or reproductive fitness, trains & constrains a fast sample-efficient but possibly misguided ‘inner’ loss which is used by learned mechanisms such as neural networks or linear programming. (The higher levels are different ‘groups’ in group selection.)
So, one reason for free-market or evolutionary or Bayesian methods in general is that while poorer at planning/optimization in the short run, they have the advantage of simplicity and operating on ground-truth values, and serve as a constraint on the more sophisticated non-market mechanisms.
I illustrate by discussing corporations, multicellular life, reinforcement learning & meta-learning in AI, and pain in humans.
This view suggests that there are inherent balances between market/non-market mechanisms which reflect the relative advantages between a slow unbiased method and faster but potentially arbitrarily biased methods.
If you want a system which does well in a wide variety of circumstances, it's best to have several nested sources of ground truth.
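To make this concrete, here is a minimal Python sketch of the two-level setup Gwern describes. Everything in it (the names `ground_truth`, `proxy_loss`, `inner_optimize`, `outer_select`, and the particular losses) is my own illustrative choice, not anything from his post: a fast inner learner does gradient descent on a cheap but biased proxy, while a slow outer loop with only sparse, noisy access to ground truth selects among candidate proxies.

```python
# Toy sketch of the two-level picture: a fast "inner" learner optimizes a cheap
# but possibly-biased proxy loss, while a slow "outer" loop keeps only the
# candidates that score well on sparse, noisy ground-truth evaluations.
import random

def ground_truth(x):
    # Expensive, noisy "outer" signal (think: death, bankruptcy, reproductive fitness).
    return (x - 3.0) ** 2 + random.gauss(0, 0.5)

def proxy_loss(x, bias):
    # Cheap, sample-efficient "inner" signal -- systematically off by `bias`.
    return (x - 3.0 - bias) ** 2

def inner_optimize(bias, steps=200, lr=0.1, eps=1e-4):
    # Fast gradient descent on the proxy; never consults the ground truth.
    x = 0.0
    for _ in range(steps):
        grad = (proxy_loss(x + eps, bias) - proxy_loss(x - eps, bias)) / (2 * eps)
        x -= lr * grad
    return x

def outer_select(population_size=20, generations=5):
    # Slow selection loop: evaluate each candidate's inner optimum on the ground
    # truth a handful of times, keep the best half, and mutate the survivors.
    biases = [random.uniform(-2.0, 2.0) for _ in range(population_size)]
    for _ in range(generations):
        scored = sorted(biases, key=lambda b: sum(ground_truth(inner_optimize(b)) for _ in range(3)))
        survivors = scored[: population_size // 2]
        biases = survivors + [b + random.gauss(0, 0.2) for b in survivors]
    return biases[0]

if __name__ == "__main__":
    best_bias = outer_select()
    print("surviving inner-loss bias:", round(best_bias, 3))
    print("its proposed solution:", round(inner_optimize(best_bias), 3))
```

The outer loop is wasteful and sample-inefficient, but because it scores candidates on the ground truth itself, it bounds how far a misguided inner loss can drift.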
Science itself takes this shape. We propose hypotheses via learned heuristics and mathematical derivations, then test them with experiments. Of course, experiments in science are often expensive in time and money, and their results relatively noisy.
Math also takes this shape, and it is a particularly nice environment to learn in. I claim this for three reasons.
First, it has near perfect ground-truth reward signals. If you prove something, you've proven it.
Second, these ground-truth reward signals are cheap. You can check a proof in time polynomial in its length, and if you can't prove something yet, you can work through examples and still learn.
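As a tiny illustration of both points (my example, written in Lean, which neither the post nor this argument depends on): checking the proof term below is essentially free, and a failed attempt, say trying to close the goal with `rfl`, still teaches you something, namely that commutativity of addition isn't a definitional equality.

```lean
-- Finding a proof may take work; checking it is cheap and unambiguous:
-- the kernel either accepts this term or rejects it.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```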
Third, it has a rich ground-truth reward signal. As Hamming mentions:
These are often called "proof generated theorems" [6]. A classic example is the concept of uniform convergence. Cauchy had proved that a convergent series of terms, each of which is continuous, converges to a continuous function. At the same time there were known to be Fourier series of continuous functions that converged to a discontinuous limit. By a careful examination of Cauchy's proof, the error was found and fixed up by changing the hypothesis of the theorem to read, "a uniformly convergent series."
Even when a proof fails, you often learn something. Either a better appreciation for the bottlenecks to your proof, or knowledge about which objects are "nice" in this regime.
This means we can build up very detailed & rich intuitive models of fairly complicated phenomena, as well as make long and complicated arguments about those phenomena, and be extremely confident those intuitive models and complicated arguments are correct.
Given this, is it so mysterious that nearly all our intelligent thought happens to be in the language of math?
There's an interesting phenomenon in mathematics, which Wigner writes about, where seemingly disparate concepts and fields have very deep connections. Here's Wigner's opening paragraph:
There is a story about two friends, who were classmates in high school, talking about their jobs. One of them became a statistician and was working on population trends. He showed a reprint to his former classmate. The reprint started, as usual, with the Gaussian distribution and the statistician explained to his former classmate the meaning of the symbols for the actual population, for the average population, and so on. His classmate was a bit incredulous and was not quite sure whether the statistician was pulling his leg. "How can you know that?" was his query. "And what is this symbol here?" "Oh," said the statistician, "this is π." "What is that?" "The ratio of the circumference of the circle to its diameter." "Well, now you are pushing your joke too far," said the classmate, "surely the population has nothing to do with the circumference of the circle."
This phenomenon can't be explained by the anthropic hypothesis. In this particular case, the Gaussian distribution's appearance and use, due to the central limit theorem, is not a contingent fact about reality, nor is the appearance of π within it.
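To spell out where π enters (this is standard, not special to Wigner's story): the central limit theorem pushes sums of many independent errors toward the normal density, and π appears through nothing more than the Gaussian integral that normalizes that density:

$$\int_{-\infty}^{\infty} e^{-x^2/2}\,dx = \sqrt{2\pi} \qquad\Longrightarrow\qquad \varphi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}.$$

Nothing contingent about our universe is involved in either step.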
Here's an attempt at an argument someone devoted to the rich reward signals hypothesis would give:
Connections between naively disparate fields are common because in fact all mathematical truths are connected. This is seen trivially by the principle of explosion; if you have one false assumption you can prove anything. However, some fields of math are easier than other fields, possibly because they have easy-to-observe consequences in our physical universe (like the properties of circles), where we can ground out "ease" in terms of the richness of the reward signals involved in that field. Some fields will also be easier or harder to connect, depending on how complicated that connection is. So we should expect to find "deep" connections between fields A and B when the hardness of A plus the hardness of finding the connection between A and B is less than the hardness of B.
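In symbols (my notation, not anything from Wigner, Hamming, or Linch): writing H(X) for the hardness of developing field X directly and H(A → B) for the hardness of finding the connection, the hypothetical argument predicts that we notice a deep connection roughly when

$$H(A) + H(A \to B) < H(B).$$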
This argument puts most of the difficulty in answering why a particular "deep" connection between fields A and B exists into the complexity of their connection. That is to say, there is more to talk about here, and this post isn't the final word. We can still ask questions about how to quantify the complexity of connections between mathematical fields, characterize why some fields have richer reward signals than others, and ask why two fields have a more or less complicated connection than other pairs of fields.
These seem like interesting questions to ask, and more useful to pursue than the blanket "anthropics" answer Linch gives or Hamming's attempts to dissolve the question.