[CONTEXT: For a while I have been meaning to engage with a literature review on heavy tailed distributions. Instead of just indefinitely postponing the project I resolved to write some preliminary thoughts on the topic, so I can get started on understanding the concept better with a less daunting task]
TL;DR: There are many formalizations of heavy-tailedness out there. I define five intuitive principles that I expect a good definition to satisfy: be action-relevant, distinguish negative from positive risk, allow finite support, apply to empirically observed phenomena and provide a characterization in terms of a universal class of distributions. I discuss each in turn and provide examples.
Heavy-tailed distributions occur when extreme, low-probability yet plausible outcomes dominate decision-making.
For example, when considering how to contain a pandemic, an official will not want to focus on low-impact scenarios where eg the pandemic dies out on its own, nor on implausible scenarios where eg a solar flare messes up our electronics during the crisis. Instead she will focus on scenarios where eg the pandemic grows out of control because its contagion rate is higher than expected - a scenario that, albeit unlikely, is plausible and disastrous enough to warrant precaution.
Heavy-tailed distributions are an important object of study in cause prioritization - we should focus on studying such distributions to the extent that extreme outcomes dominate long term decision-making.
My informal impression is that the notion of heavy tail distributions has been heavily discussed among mathematicians, especially in the context of extreme value theory. However, there is no single agreed-upon formalization of the concept, making discussion and the application of the concept notoriously difficult.
Through this post, I will explore some important concepts around heavy-tailedness that we want an ideal definition to make precise.
My hope is that having this discussion will help us later productively discuss the strengths and weaknesses of different proposed definitions of heavy-tailedness.
In short, an ideal definition of heavy-tailedness would be action-relevant, distinguish risks from hits, be well-defined for distributions with finite support, describe natural phenomena and ascribe heavy-tailedness to a universal class of distributions.
Action relevance
We hope that a definition of heavy-tailedness will suggest a qualitatively different approach to statistical inference and decision-making.
For example, we would like heavy-tailed distributions to simplify decision-making (eg, via a dominance result that recommends never exposing ourselves to heavy-tailed risks) or to show the inadequacy of standard methods (eg, a result showing in a precise sense that historical data on a heavy-tailed distribution is not a good predictor of future performance).
To the extent that heavy-tailed distributions are already well-studied by standard methods we will be better off not introducing a new concept.
For example, a formalization of heavy-tailedness based on non-finite second-order central moments (ie variance), or one that implies them, would satisfy the criterion of action relevance: it would imply that the mean of iid heavy-tailed variables does not necessarily converge to a normal distribution - a common and load-bearing assumption in statistics.
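To make the infinite-variance example concrete, here is a minimal simulation sketch. It assumes a Pareto distribution with tail index 1.5 (finite mean, infinite variance) as the heavy-tailed case and an exponential distribution as the light-tailed baseline. Both choices, and the helper names, are mine for illustration. It reports what fraction of the sum of squares comes from the single largest observation, a rough proxy for how unstable the sample variance is.

```python
import random


def largest_square_share(samples):
    """Fraction of the sum of squares contributed by the largest |x|."""
    squares = [x * x for x in samples]
    total = sum(squares)
    return max(squares) / total if total > 0 else 0.0


def simulate_shares(n=10_000, alpha=1.5, seed=0):
    """Compare a Pareto(alpha) sample with an Exponential(1) sample.

    alpha=1.5 is an illustrative assumption: the mean is finite but the
    variance is not, so one draw can dominate the second moment.
    """
    rng = random.Random(seed)
    heavy = [rng.paretovariate(alpha) for _ in range(n)]
    light = [rng.expovariate(1.0) for _ in range(n)]
    return largest_square_share(heavy), largest_square_share(light)


if __name__ == "__main__":
    heavy_share, light_share = simulate_shares()
    print(f"Pareto(1.5): largest draw holds {heavy_share:.1%} of the sum of squares")
    print(f"Exponential: largest draw holds {light_share:.1%} of the sum of squares")
```

If a single draw typically holds a sizeable share of the second moment, then adding more historical data does not stabilize the variance estimate, which is the kind of qualitative difference the action-relevance criterion asks for.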
Distinguishing left and right tails
Extreme outcomes take two forms: extreme negative outcomes (risks) and extreme positive outcomes (hits).
For example, a calamity such as drastic, unexpected, sudden climate change melting the poles and causing massive floods would be a risk. Meanwhile, an unexpected discovery of a cure against cancer would count as a hit.
In cause prioritization, we hope to expose ourselves to hits while minimizing risks. Thus we want our discussion of heavy-tailedness to distinguish between the two.
For example, a definition of heavy-tailedness formalized as leptokurtic distributions is unsatisfying in this sense - there is no meaningful way to talk about right and left leptokurtosis.
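A small sketch of why kurtosis cannot tell the two tails apart: it only depends on even powers of the deviations from the mean, so mirroring a sample leaves it unchanged. The function name and the toy data are assumptions for illustration.

```python
def excess_kurtosis(samples):
    """Population excess kurtosis: m4 / m2**2 - 3."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    return m4 / (m2 ** 2) - 3.0


if __name__ == "__main__":
    hit = [0, 0, 0, 0, 10]        # one extreme positive outcome
    risk = [-x for x in hit]      # the same sample mirrored: one extreme negative outcome
    print(excess_kurtosis(hit), excess_kurtosis(risk))
```

Both calls return the same value, even though one sample describes a hit and the other a risk.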
However, the formalism of subexponential distributions easily allows us to distinguish left and right fat-tailedness.
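For reference, this is the standard definition as I understand it, stated for a nonnegative random variable with distribution function $F$ and tail $\overline{F} = 1 - F$, where $F^{*2}$ denotes the distribution of the sum of two independent copies. Reading it as a statement about the right tail, and applying it to the negative part of a variable to get a left-tail version, is my own gloss rather than part of the definition.

```latex
% F is subexponential if the tail of a sum of two iid copies is
% asymptotically twice the tail of a single copy:
\lim_{x \to \infty} \frac{\overline{F^{*2}}(x)}{\overline{F}(x)} = 2
% Equivalently, for iid X_1, X_2 with distribution F:
% P(X_1 + X_2 > x) \sim P(\max(X_1, X_2) > x) \quad \text{as } x \to \infty
```

Informally, a large sum is most likely explained by a single large summand, which matches the intuition of extreme outcomes dominating.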
Allowing finite support
Reality is inherently bounded - I can confidently assert that there is no possible risk today that would endanger a trillion lives, because I am confident the number of people on the planet is well below that.
In statistics, we usually resort to distributions over unbounded possible outcomes to simplify matters. This is usually admissible, since most of the probability mass is contained in a sensible-enough finite region, and thus the probability mass assigned to absurd outcomes can be treated as a rounding error.
However, when discussing heavy-tailed distributions, we are precisely studying the region of extreme outcomes. If our definition of heavy tailedness requires the distribution to have infinite support, we risk our analysis focusing on absurd outcomes.
Definitions of heavy-tailedness that rely on asymptotic behaviour, such as the definition of power laws, cannot apply to distributions with finite support. In contrast, notions of heavy-tailedness based on measures of inequality, such as the Gini coefficient, allow finite support.
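As a sketch of the inequality-based approach, the Gini coefficient can be computed directly from a finite sample, so it is well-defined no matter how the outcomes are bounded. The code below assumes nonnegative values and uses the usual sorted-rank formula; the function name is illustrative.

```python
def gini(values):
    """Gini coefficient of a sample of nonnegative values.

    Uses G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    with x sorted in ascending order and ranks i starting at 1.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


if __name__ == "__main__":
    print(gini([1, 1, 1, 1]))    # perfectly even outcomes
    print(gini([0, 0, 0, 10]))   # one outcome holds everything
```

A value near zero means outcomes are roughly equal, while a value near one means a few outcomes account for most of the total, which is the bounded analogue of "outliers dominate".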
Describing natural phenomena
Many everyday phenomena, such as height, are documented to be approximately normally distributed.
Similarly, if the notion of heavy-tailedness is to be useful, we would expect it to show up in many decision-relevant scenarios. Thus we would hope to identify many empirical distributions that conform to our definition of heavy-tailedness.
This also suggests a different approach to formalizing the concept - instead of starting from the a priori requirements, we could work first on identifying heavy tailed distributions, and developing a useful language to study them by looking at particular cases.
Empirical distributions usually considered to be heavy-tailed include Zipf's law and Benford's law.
Characterizing a universal class of distributions
Normal distributions are heavily studied in statistics, because they occur as the limiting distribution that arises when you take the mean of iid variables of finite variance.
This provides a theoretical reassurance that treating the mean of some unknown distributions that empirically exhibit finite variance as if it were normal will be good enough for inference and decision-making.
Analogously, we would like our definition of heavy-tailedness to apply to, and ascribe heavy-tailedness to, a general limiting class of distributions, so we can use it to study general distributions.
The universal class of distributions that comes up again and again when discussing heavy-tailed distributions is the Levy alpha-stable family. Thus we would expect our definition to apply to this class, and to provide a characterization of heavy-tailedness as a function of the parameters of the class.
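To get a feel for how the stability parameter alpha controls the tails, here is a minimal sampling sketch for the symmetric, standardized case only, using the Chambers-Mallows-Stuck construction. Restricting to the symmetric case and the function names are my simplifications. With alpha = 2 the draw is Gaussian (with variance 2 under this parameterization), with alpha = 1 it is Cauchy, and for alpha < 2 the variance is infinite.

```python
import math
import random


def symmetric_stable_from(u, w, alpha):
    """Map u in (-pi/2, pi/2) and w > 0 to a symmetric alpha-stable draw.

    Chambers-Mallows-Stuck construction for skewness 0, scale 1, location 0.
    """
    if not 0 < alpha <= 2:
        raise ValueError("alpha must lie in (0, 2]")
    if alpha == 1.0:
        return math.tan(u)  # Cauchy special case
    scale = math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
    tail = (math.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha)
    return scale * tail


def sample_symmetric_stable(alpha, rng):
    """Draw one value using a uniform angle and a unit exponential."""
    u = rng.uniform(-math.pi / 2, math.pi / 2)
    w = rng.expovariate(1.0)
    return symmetric_stable_from(u, w, alpha)


if __name__ == "__main__":
    rng = random.Random(0)
    for alpha in (2.0, 1.5, 1.0):
        draws = [abs(sample_symmetric_stable(alpha, rng)) for _ in range(10_000)]
        print(f"alpha={alpha}: largest |draw| = {max(draws):.1f}")
```

Lowering alpha below 2 should make the largest draws grow dramatically, which suggests that a characterization of heavy-tailedness in this family would be stated in terms of alpha.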
We have discussed some properties that we would like a good formalization of the concept of heavy-tailedness to have.
There are several paths we could take from here, including:
- Refining the properties where possible, expanding them with more examples, contesting their desirability
- Conducting a review of existing formalisms related to the concept of heavy-tailedness
- Studying how the properties interact with each other, and hoping to shed light on a tentative definition - or an impossibility result
- Collecting a sample of empirical and theoretical distributions commonly considered to be heavy-tailed, to reflect on what makes them heavy-tailed
The topic of heavy-tailedness is one that I have seen used and abused in many situations, and I think that developing a shared understanding of what it means in a precise sense will help us communicate better and make better decisions.
We cannot discard the possibility that this could be a dead research path - for example, our intuitive understanding of the topic might be good enough for decision making, the formalization may be beyond our current mathematics or the notion of heavy-tailedness might be misleading in the sense of not requiring a separate treatment from non-heavy-tailed distributions.
Nevertheless, I think that this is a research path worth exploring, and I would be keen on reading more on the topic. Let me know in the comments if you have further research ideas, clarifying concepts or questions of your own.
This blogpost was written by Jaime Sevilla, visiting researcher at the Center for the Study of Existential Risks, under a grant from the Effective Altruism Foundation. I’d like to thank Max Daniel and Ronja Lutz for conducting some preliminary research on the topic with me a while ago.
Can you give some more intuitions as to why allowing finite support is among your criteria?
I can imagine a definition which, lacking this criterion, is still useful, and requiring infinite support might be a useful reminder that 0 and 1 are not probabilit(y densities). Further, whereas requiring infinite support might risk analyzing absurd outcomes, it may also allow us to consider, and thus reach, maximally great futures.
Consider that the number of animal lives is probably greater than one trillion, and you didn't specify *human* lives. You could also consider future lives, or abstruse moral realism theories. Your definition of personhood (moral personhood?) could change. Having finite support considered harmful (?).
The point you are making - that distributions with infinite support may be used to represent model error - is a valid one.
And in fact I am less confident about that point relative to the others.
I still think that it is a nice property to have, though I find it hard to pinpoint exactly what my intuition is here.
One plausible hypothesis is that I think it makes a lot of sense to talk about the frequency of outliers in bounded contexts. For example, I expect that my beliefs about the world are heavy-tailed - I am mostly ignorant about everything (eg, "is my flatmate brushing their teeth right now?"), but have some outlier strong beliefs about reality which drive my decision making (eg, "after I click submit this comment will be read by you").
Thus if we sample the confidence of my beliefs the emerging distribution seems to be heavy tailed in some sense, even though the distribution has finite support.
One could argue that this is because I am plotting my beliefs in a weird space, and if I plot them on a proper scale like an odds scale, which is unbounded, the problem dissolves. But since expected value is linear in probabilities, not odds, this seems a hard pill to swallow.
Another intuition is that if you focus on studying asymptotic tails you expose yourself to Pascal's mugging scenarios - but this may be a consideration which requires separate treatment (eg Pascal's mugging may require a patch from the decision-theoretic side of things anyway).
As a different point, I would not be surprised if allowing finite support requires significantly more complicated assumptions / mathematics, and ends up making the concept of heavy tails less useful. Infinities are useful to abstract away unimportant details, as in complexity theory for example.
TL;DR: I agree that infinite support can be used to conceptualize model error. I however think there are examples of bounded contexts where we want to talk about dominating outliers - ie heavy tails.
Do you maybe have another example for action relevance? Nonfinite variance and finite support do not go well together.
Sadly I have not come across many definitions of heavy-tailedness that are compatible with finite support, so I don't have any ready examples of action relevance AND finite support.
Another example involving a moment-centric definition:
Distributions which are heavy tailed in the sense of not having a finite moment generating function in a neighbourhood of zero heavily reward exploration over exploitation in multi armed bandit scenarios.
See for example an invocation of light tailedness to simplify an analysis at the beginning of this paper, implying that the analysis does not carry over directly to heavy tail scenarios (disclaimer, I have not read the whole thing).
It seems like if I'm trying to talk about a real-world case with finite support, I'll say something like "it's not actually a power law - but it's well described by one over the relevant range of values." Meaning that I have some notion of "relevant" which is probably derived from action-relevance, or relevance to my observations, or maybe computational complexity.
If I can't say that, then the other main option is that I care more and more as the power law gets more extreme, and then as the possibilities reach their physical limit I care most of all. But cases like this are so idiosyncratic that maybe there's no point in trying to develop a unified language for them.
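The first option, a power law that only holds over a relevant range, can be sketched as a Pareto distribution truncated to a bounded interval and sampled by inverting its CDF. The parameter names and the specific bounds are hypothetical, chosen only for illustration.

```python
import random


def truncated_pareto_quantile(u, alpha, x_min, x_max):
    """Inverse CDF of a Pareto(alpha) law restricted to [x_min, x_max].

    CDF on the interval: F(x) = (1 - (x_min/x)**alpha) / (1 - (x_min/x_max)**alpha)
    """
    if not (alpha > 0 and 0 < x_min < x_max):
        raise ValueError("need alpha > 0 and 0 < x_min < x_max")
    if not 0.0 <= u <= 1.0:
        raise ValueError("u must lie in [0, 1]")
    ratio = (x_min / x_max) ** alpha
    return x_min * (1.0 - u * (1.0 - ratio)) ** (-1.0 / alpha)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical bounds: outcomes between 1 and one million.
    draws = [truncated_pareto_quantile(rng.random(), 1.0, 1.0, 1e6) for _ in range(10_000)]
    print(f"largest draw holds {max(draws) / sum(draws):.1%} of the total")
```

Every draw lands inside the chosen bounds, yet the shape within those bounds is still a power law, so a few draws can account for much of the total.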
That's what skewness is for.