See Paul Krugman:
There are model-oriented economists, like Alan Blinder, who also write for a broader audience, and they don't put their equations in their books and articles; but the skeleton of the models that structure their thought is visible under the surface to those who know how to look. By contrast, in the writings of Reich or Galbraith what you read is what you get -- there is no hidden mathematical structure to the argument, no diagram one might draw on a blackboard or simulation one might run on a computer to clarify the point.

Maynard Smith not only has a name that should have made him an economist; he writes and thinks like an economist, representing evolutionary issues with stylized mathematical models that are sometimes confronted with data, sometimes simulated on the computer, but always serve as the true structure informing the verbal argument.

[W]hat makes Gould so popular with intellectuals is not merely the quality of his writing but the fact that, unlike Dawkins or Ridley, he is not trying to explain the essentially mathematical logic of modern evolutionary theory. It's not just that there are no equations or simulations in his books; he doesn't even think in terms of the mathematical models that inform the work of writers like Dawkins. That is what makes his work so appealing. The problem, of course, is that evolutionary theory -- the real thing -- is based on mathematical models;
As a physics enjoyer, I am reminded of a quote from Feynman (or someone else?) that agreed with my personal experience: the strength and weakness of intuition is how malleable it is. The strength is that you can make the mathematical structure become intuitive. It's not just that you assume A and see whether you actually can get B; rather, you assume A, get some weird result, and then stare at it until it makes intuitive sense. Maybe you then land upon a new concept. Alternatively, you might crystallize some intuition you have to see what the true nature of the thing really is - this is best seen in mathematics itself, where the definitions express many of the "True Names" of concepts. This is a lot more than just falsification; you do a lot of locating and understanding of ideas this way.
I don't know enough economics to give helpful examples: all the ones I have are cases where I already knew the math (so I can't compare clarity before and after learning it) and that could plausibly be understood qualitatively without it.
I can, however, think of the VNM theorem as an example where the theorem itself is the concept: you can get utilities from preferences over lotteries, by having "strength of preference" refer to what you'd choose under uncertainty.
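For reference, here is a sketch of the standard textbook statement (axiom names as usually given):

```latex
% If a preference relation \succsim over lotteries satisfies completeness,
% transitivity, continuity, and independence, then there exists a utility
% function u over outcomes, unique up to positive affine transformation, with
L \succsim M \iff \mathbb{E}_{L}[u(x)] \ge \mathbb{E}_{M}[u(x)].
```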
Regardless, quantitative statements are still pretty important. Being able to predict the size of an effect, or having your theories make claims about sizes, matters a whole lot (including qualitatively - it's important to know which considerations are negligible and which ones dominate).
Off the top of my head: the Lerner index is the markup (price minus marginal cost) as a fraction of the price, and the theory says a profit-maximizing monopolist sets it equal to -1/elasticity of demand. Without following the calculus you won't arrive at this direct relationship between the elasticity of demand and the markup.
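A quick numeric sanity check of that relationship (a sketch; the constant-elasticity demand curve and marginal cost below are illustrative choices, not from the original comment):

```python
import numpy as np

# Constant-elasticity demand Q = P^eps with eps = -2, constant marginal cost c = 1.
# A monopolist chooses the price P to maximize profit (P - c) * Q(P).
eps = -2.0
c = 1.0

prices = np.linspace(1.01, 5.0, 100_000)
profit = (prices - c) * prices**eps
p_star = prices[np.argmax(profit)]          # analytically, P* = c*eps/(1+eps) = 2

lerner = (p_star - c) / p_star              # markup as a fraction of price
print(lerner, -1 / eps)                     # both ~0.5: Lerner index = -1/elasticity
```

The grid search just confirms the first-order condition: differentiating (P - c)P^eps and setting it to zero gives (P - c)/P = -1/eps.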
Also, if you count voting theory as economics, there are some rather counterintuitive results, like Condorcet paradoxes (I once came up with an example independently, in the context of intransitive dice, but wouldn't have been able to without knowing about directed graphs - the point being the help mathematics provided), Arrow's impossibility theorem, and other no-go theorems.
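A minimal sketch of a Condorcet cycle, using three hypothetical voters with rotated rankings:

```python
# Each string is one voter's ranking, best candidate first.
voters = ["ABC", "BCA", "CAB"]

def majority_prefers(x, y):
    # x beats y if a strict majority of voters rank x above y
    return sum(v.index(x) < v.index(y) for v in voters) > len(voters) / 2

print(majority_prefers("A", "B"))  # True
print(majority_prefers("B", "C"))  # True
print(majority_prefers("C", "A"))  # True -> pairwise majorities form a cycle,
                                   #         so there is no Condorcet winner
```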
Edit: Recently, I used a linear approximation and published elasticity values to estimate how much reducing consumption of a good (in the context of vegetarianism) should be expected to reduce supply. Apparently, it's the same formula as tax incidence! The important part of the derivation was just that, to first order, the result is the ratio of elasticities: a unit reduction in consumption reduces the quantity supplied by roughly ε_s/(ε_s + |ε_d|) units.
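The first-order claim can be checked with a toy linear model (all numbers below are illustrative, not the published elasticities mentioned above):

```python
# Linear supply/demand sketch of the elasticity-ratio result.
a, b = 100.0, 2.0   # demand: D(p) = a - b*p
c, d = 10.0, 3.0    # supply: S(p) = c + d*p

p0 = (a - c) / (b + d)          # initial equilibrium price
q0 = c + d * p0                 # initial equilibrium quantity

delta = 1.0                     # consumers cut demand by delta units at every price
p1 = (a - delta - c) / (b + d)
q1 = c + d * p1

eps_s = d * p0 / q0             # supply elasticity at the old equilibrium
eps_d = -b * p0 / q0            # demand elasticity (negative)

print((q0 - q1) / delta)               # actual supply reduction per unit of abstention
print(eps_s / (eps_s + abs(eps_d)))    # elasticity-ratio formula gives the same number
```

Both prints give d/(b+d), which is exactly the consumer share in the standard tax-incidence formula.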
Nice post. One thing I'd add is Sahil's description here of mathematics (and other "clean" concepts) as "achiev[ing] the scaling and transport of insight (which is the business of generalization) by isolation and exclusion". But, he argues, there are other ways to scale and transport insight. I think of emotional work and meditative practices as showcasing these other ways: they don't rely on theories or theorems, and often there aren't even canonical statements of their core insights. Instead, insights too complex to formalize (yet) are transmitted person-to-person.
Where I disagree with Sahil is that I suspect other ways of scaling and transporting insight are much more vulnerable to adversarial attacks (because e.g. there's no central statement which can be criticized). So in some sense the "point of the math" is that it means you need to rely less on the honesty and integrity of the people you're learning from.
For me, a key benefit of maths is to answer the question "how much?", turning qualitative intuitions into quantitative models.
For example if someone tells you "drug X binds to receptor Y which triggers therapeutic effect Z", the first question that comes to mind is "how much X do I need to take to get that much Z?".
If you can't answer that, the info is not actionable. That's where the mathematical models (pharmacokinetics and pharmacodynamics) come in: they tell you how much, which lets you turn info into action.
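As a toy illustration of the "how much" step, here is a one-compartment pharmacokinetic sketch; every number (volume of distribution, half-life, target concentration) is hypothetical, not real dosing guidance:

```python
import math

# One-compartment model: after a bolus dose, plasma concentration decays as
# C(t) = (dose / V) * exp(-k * t), with elimination rate k = ln(2) / half_life.
V = 40.0          # litres, hypothetical volume of distribution
half_life = 6.0   # hours, hypothetical
k = math.log(2) / half_life

target = 2.0      # mg/L concentration needed for effect Z, hypothetical
t = 8.0           # hours after dosing at which we still want the effect

# Invert C(t) >= target to get the minimal dose:
dose = target * V * math.exp(k * t)
print(round(dose, 1))  # mg needed so the concentration at t=8h is still at target
```

The qualitative claim "X triggers Z" becomes actionable only once the model outputs a number like this.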
This post was written while at MATS 9.0 under the mentorship of Richard Ngo. It's only meta-related to my research.
I would like to start by quoting a point Jan Kulveit made about economics culture in a recent post.
Jan notes that the math is "less relevant than it seems" in the process. I resonate with this. The mathematical results are predetermined by the assumptions in the model, which in turn follow the insight born from intuitive reasoning. This raises the question: if the important part of the insight is the intuition, and the math is its deterministic (if laborious) consequence, then what exactly is the point of the math? What insight does it bring? These questions of course apply not only to economics, but to any mathematisation of worldly[1] phenomena.
In this post, I describe some roles math plays in delivering and communicating insight to researchers and the social structures they are embedded in. I'm interested in characterising the legitimate uses of mathematical formalism, but also in commenting on the incentives in science and society that instantiate its misuse.
Calibrating intuitions
The most innocent and consistently solid use of math is in verifying qualitative intuitions. Suppose a researcher has an interesting insight about economics concept A; they suspect that their insight has a non-trivial implication for some other concept B. The process of mathematising A, B, and the effect of A on B is in fact one of calibration. If the intuitive model is well-calibrated, then the researcher's instincts about these ideas and their consequences will check out in the mathematical formalisation. Note that this is a necessary but not a sufficient condition; your math could check out even if your intuitions are bad. However, the math not checking out probably means it could be improved.
In the possible absence of empirical feedback loops, math serves as a surrogate tool for scientific falsification. I have found math useful for this in my own research, though it's worth noting that concrete experiments or simulations are alternative/complementary calibration tools.
Communicating your work
Even if a researcher has conducted clear, intuitive reasoning that they believe is well-calibrated, they might need to communicate their useful insight to other researchers and society at large. For this purpose, math has some clear benefits due to its relative unambiguity. Firstly, it allows for transfer of verifiable claims from a researcher to their field. Secondly, sufficiently concrete mathematical formalisations may be amenable to communication beyond the research bubble. The most memetically successful pieces of math are those that can be implemented by arbitrary workers or even calculators.
Even though math has these advantages, the notion that it is less lossy than the alternatives as a communication tool is rather non-trivial. What it gains in verifiability of its claims, it may lose in the amount of insight communicated. Mathematical formalism can be painstaking to generate from intuitions, and the finished product generally does not lend itself to the recovery of those intuitions. How far mathematisation dominates scientific communication thus reflects a trade-off between different types of loss.
Communication channels can degrade due to perverse social dynamics and incentives, losing their legitimate purpose along the way. This results in what I'll call communication traps for researchers. I'll describe a couple in this post.
The lossiness of memetically fit mathematics
Null Hypothesis Significance Testing (NHST) is (still) the dominant paradigm for statistical inference, especially in the social and medical sciences. This was not a foregone conclusion, nor is it an adequate state of affairs. Ronald Fisher, one of the early advocates for null hypothesis testing[2], was originally a Bayesian[3]. When he settled on frequentist inference, he disagreed on matters of philosophy with his colleagues, Neyman and Pearson, who instead proposed a decision rule to choose between two competing hypotheses.
Fisher, Neyman and Pearson all attempted to develop paradigms for statistical inference with the aim of adoption by scientific researchers. Unfortunately, their wishes were granted in the worst of ways. NHST conflated[4] their approaches into a version of null hypothesis testing which uses a decision-rule based on the p-value, something that none of them seems to have ever advocated for. The sneerily titled "The Null Ritual" documents the emergence of this testing paradigm in psychology over the 1940s and 1950s. The authors additionally present some of their anthropological work assessing the (blood-curdling) illiteracy among psychology students and educators about the philosophy and insight that statisticians originally meant to communicate.
"The Null Ritual" also discusses various possible reasons for the emergence and maintenance of this paradigm. Without having delved into the extensive sociological literature on the topic, my instinctive guess is that NHST "won" because it is simple, easy to execute (including computationally), and easy to misinterpret as indicating a decisive conclusion. I find it particularly suggestive that Fisher recommended the p-value as a useful metric that could inform further experiments, but NHST turned it into a goal for researchers to p-hack. The main appeal of NHST was quite possibly that it provides the illusion of a computable binary decision rule authorised by a sense of mathematical legitimacy. By now, NHST probably wins by default due to its widespread use in education.
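To make the p-hacking point concrete, here is a minimal simulation (a sketch; it uses a large-sample z-approximation to the two-sample test, and all parameters are arbitrary). Under a true null effect, measuring many unrelated outcomes per study and reporting only the best p-value inflates the nominal 5% false-positive rate severalfold:

```python
import math
import random

random.seed(0)

def p_value(xs, ys):
    # Large-sample z-approximation to a two-sample test of equal means.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    vx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    vy = sum((y - my) ** 2 for y in ys) / (n - 1)
    z = (mx - my) / math.sqrt(vx / n + vy / n)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def hacked_study(n=30, outcomes=10):
    # Measure 10 unrelated null outcomes, report only the smallest p-value.
    ps = [p_value([random.gauss(0, 1) for _ in range(n)],
                  [random.gauss(0, 1) for _ in range(n)])
          for _ in range(outcomes)]
    return min(ps)

studies = [hacked_study() for _ in range(2000)]
fp_rate = sum(p < 0.05 for p in studies) / len(studies)
print(fp_rate)  # roughly 0.4, far above the nominal 0.05
```

The decision rule looks rigorous in any single report; the inflation is invisible unless the selection process is disclosed.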
This example illustrates that our mathematisations are not nearly as unambiguous as we would hope. The communication of a mathematical formalism and any decision procedures it suggests should aim to be robust to perturbations caused by miscommunication and the incentive structures of practitioners. Science is probably better off with NHST than if we had just "vibed things out" or used whatever statistical inference we did before, but a more successful dissemination of statisticians' insight might have softened a replication crisis or two along the way.
"Proof" of work
I already discussed that mathematical formalism trades off between different types of information loss. However, the verifiability of mathematical (or statistical) claims makes math a justifiably popular means of communication between researchers. Unfortunately, research communities tend to converge on turning metrics of published research output into goals that determine status among their members. Consequently, researchers face incentives to prove the value of their work by exporting it in mathematised terms.
I think this systematically results in premature mathematisation. Consider a researcher, Alice, seeking to formalise an insight she has, possibly for entirely noble reasons to start with. She judges that a formalisation that respects the spirit of the original insight may take years. Her research will additionally involve making appropriately simplified or related formalisms that may take a year or two, but will help slowly build up the full theory. She has also thought of some simplified potential formalisms that are likely tractable in the scope of a few months, but aren't ultimately promising for faithfully communicating the insight.
However, Alice's project's funding is up for renewal this year. This presents her with two options. On the one hand, she could focus on the formalisms she thinks will ultimately be fruitful for the mature theory, and risk having no legible output to show her progress in a project proposal. On the other hand, she could work on the tractable approaches that her research taste steered her away from, but that will give her something to show to reviewers.
Sadly, I suspect researchers are regularly faced with such dilemmas. They'll often choose against their research tastes, and can justify their choice (with varying degrees of accuracy) as enabling them to continue their research at all.
In my day-to-day research, I have become increasingly aware of how I intuitively gravitate towards lines of inquiry I expect to be mathematically tractable within the scope of the program I'm currently in. I don't think this is directly due to a specific incentive imposed on me by anyone or anything in particular; I'm lucky to have ample intellectual freedom. Rather, I perceive this affect as an artefact of the culture I grew up in, which taught me to value mathematical formalism for its own sake. I aspire to one day be a person free from the compulsion to show their work.
I can't use "real" since the mathematicians reserved the term
This is not exactly the same as NHST, both in terms of the philosophical motivation and the practical implementation
See this commentary on Fisher's early work for details. By 1922, Fisher had disavowed some of his early work's reliance on "inverse probability", the old name for Bayesian probability.
Wikipedia politely says the approaches were "combined"