[Epistemic status: pilot study. I'm hoping that others will help to verify or falsify my conclusion here. I've never done an analysis of this sort before, and would appreciate correction of any errors.
A previous version of this post had some minor errors in the analysis, which have since been corrected. Most notably, the deviation from the expected rate of first-borns was originally given as 14.98 percentage points. It is actually 16.65 percentage points.]
A big thank you to Dan Keys for working through the statistics with me.
Since the late 1800s, pop psychology has postulated that a person's birth order (whether one is the first, last, middle, etc. of one's siblings) has an impact on his/her lifetime personality traits. However, rigorous large-scale analyses have reliably found no significant effect on stable personality, with some evidence for a small effect on intelligence. (The Wikipedia page lists some relevant papers on birth order effects on personality (1, 2, 3) and on intelligence (1, 2, 3).)
So, we were all pretty surprised when, around 2012, survey data suggested a very strong birth order effect amongst those in the broader rationality community.
The Less Wrong community is demographically dominated by first-borns: a startlingly large percentage of us have only younger siblings. On average, it looks like there's about a twenty-two percentage point difference between the actual rate of first-borns and the expected rate, from the 2018 Slate Star Codex Survey data Scott cites in the linked post above. (More specifically, the expected rate of first-borns is 39% and the actual occurrence in the survey data is 61%.) The 2012 Less Wrong survey also found a 22 percentage point difference. This effect is highly significant, including after taking into account other demographic factors.
A few weeks ago, Scott Garrabrant (one of the researchers at MIRI) off-handedly wondered aloud if great mathematicians (who plausibly share some important features with LessWrongers) also exhibit this same trend towards being first-born.
The short answer: Yes, they do, as near as I can tell, but not as strongly as LessWrongers.
My data and analysis are documented here.
Following Sarah Constantin's fact post methodology, I started by taking a list of the 150 greatest mathematicians from here. This is perhaps not the most accurate or scientific ranking of historical math talent, but in practice there's enough broad agreement about who the big names are that quibbles over who should be included are mostly irrelevant to our purpose. If a person could plausibly be included on a list of the greatest 150 mathematicians in history, he/she was probably a pretty good mathematician.
I then went through the list and tried to find out how many older and younger siblings each mathematician had. For the most part this amounted to googling "[mathematician's name] siblings" and then trawling through the results to find one that gave me the information I wanted. Where possible, I noted not just the birth order and number of siblings, but also the sex of the siblings and whether they died during infancy. (For the ones for whom I couldn't get data, I marked the row as "Couldn't find" or "Unknown".)
Most biographical sources don't list the number of siblings of the family of origin. The sources that I ended up relying on the most were:
- Geni.com: a platform for people to build out their family trees, and store historical or biographical information (family photos, dates of life events, etc.)
- The individual biographies on the MacTutor History of Mathematics Archive
This was a very quick cursory search, so my data is probably not super reliable. At least twice, I found two sources that disagreed, and I don't know how much I would have encountered conflicting information if I had dug deeper into each person's biography, instead of moving on to the next mathematician as soon as I found a sentence that answered my query.
If you happen to personally know biographical details of elite mathematicians and you can correct any errors in these data, I'd be pleased to make those corrections.
The simplest analysis is to categorize the data by family size (all the mathematicians that had no siblings, one sibling, 2 siblings, etc.), count how many first borns there were in each bucket, and compare that to the number we would expect by chance.
For nearly every bucket, the frequency of first-born children exceeded random chance. Across all categories, the actual frequency exceeded the expected frequency by about 16.65 percentage points.
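The bucketing described above can be sketched in a few lines of Python. This is only an illustration, not the code behind the linked spreadsheet: it assumes that "expected by chance" means a child in a family of n children has a 1/n chance of being the first-born, and the records in `toy_records` are made up purely for demonstration. The function names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical toy records (NOT the post's data):
# each tuple is (number of older siblings, number of younger siblings).
toy_records = [(0, 0), (0, 1), (1, 0), (0, 2), (2, 0), (0, 3), (1, 2)]


def first_born_summary(records):
    """Group records by family size and tally actual vs. expected first-borns."""
    buckets = defaultdict(
        lambda: {"count": 0, "first_born": 0, "expected": Fraction(0)}
    )
    for older, younger in records:
        family_size = older + younger + 1  # siblings plus the subject
        bucket = buckets[family_size]
        bucket["count"] += 1
        bucket["first_born"] += 1 if older == 0 else 0
        # Assumption: by chance, each of the n children is equally likely
        # to be the one who ends up in the sample.
        bucket["expected"] += Fraction(1, family_size)
    return dict(buckets)


def overall_gap(records):
    """Actual minus expected first-born rate, as a proportion of the sample."""
    actual = sum(1 for older, _ in records if older == 0)
    expected = sum(Fraction(1, older + younger + 1) for older, younger in records)
    return float((actual - expected) / len(records))


summary = first_born_summary(toy_records)
gap = overall_gap(toy_records)
print(f"overall gap: {gap * 100:.1f} percentage points")
```

Note that only children count as first-borns with an expected rate of 1, so under this assumption they add to the sample size without adding to the gap.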
After removing the individuals that I couldn't find data for, we had a sample size of 82. A paired t-test, comparing the number of first-borns with the expected number of first-borns (one data point for each of the 82 mathematicians), was statistically significant, t(81)=3.14, p = 0.00239.
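For readers who want to reproduce this kind of test without a statistics package, here is a minimal sketch using only the Python standard library. It assumes each data point pairs a 0/1 first-born indicator with that mathematician's chance probability of being first-born; that pairing is my reading of the description above, not something confirmed from the spreadsheet. The p-value is obtained by numerically integrating the Student's t density, which is a swapped-in technique chosen to avoid external dependencies. The function names are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(observed, expected):
    """Paired t statistic and degrees of freedom for observed minus expected."""
    diffs = [o - e for o, e in zip(observed, expected)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1


def t_pdf(x, df):
    """Density of Student's t distribution (lgamma avoids overflow for large df)."""
    log_norm = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value via Simpson's rule on the t density over [0, |t|]."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = total * h / 3  # P(0 < T < |t|)
    return 1.0 - 2.0 * central_mass


# Check the reported statistic: t(81) = 3.14.
p_reported = two_sided_p(3.14, 81)
print(f"two-sided p for t(81)=3.14: {p_reported:.5f}")
```

Because the reported t is rounded to two decimals, the p-value from this check should land close to, but not exactly on, the 0.00239 quoted above.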
I can show you some bar graphs, like Scott uses in his post, but because this data is of a much smaller sample and the effect isn't as large, they don't look as neat. (Also, I don't know how to include those nice dotted lines marking the expected frequency.)
Nevertheless, you can see a systematic trend: being the first of n siblings is overrepresented among the mathematicians in the sample I used.
The effect in these data (17 percentage points) is smaller than the effect in either the Less Wrong or Slate Star Codex surveys (22 percentage points). The 95% confidence interval for the mathematician data is a range of 6 percentage points to 27 percentage points. Since 22 falls inside this range, we can't rule out that the difference in effect sizes is due to noise, so these data alone don't establish a real difference in the size of the underlying effect between the populations.
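As a rough consistency check, the interval can be approximately recovered from the figures already reported in this post. The sketch below backs out the standard error from the mean gap (16.65 percentage points) and the t statistic (3.14). The critical value of about 1.99 for 81 degrees of freedom is taken from standard t tables and is an assumption of this sketch, not a number from the original analysis.

```python
# Figures reported in the post, expressed as proportions.
mean_gap = 0.1665   # actual minus expected first-born rate
t_stat = 3.14       # reported t(81)
survey_gap = 0.22   # gap in the Less Wrong / Slate Star Codex surveys

# Assumption: approximate two-sided 95% critical value for 81 degrees of freedom.
t_crit = 1.99

# t = mean / SE, so SE = mean / t.
standard_error = mean_gap / t_stat
ci_low = mean_gap - t_crit * standard_error
ci_high = mean_gap + t_crit * standard_error
survey_gap_inside = ci_low < survey_gap < ci_high

print(f"95% CI: {ci_low * 100:.1f} to {ci_high * 100:.1f} percentage points")
print(f"survey gap inside interval: {survey_gap_inside}")
```

The recovered bounds round to the 6 and 27 percentage points quoted above, and the 22 point survey gap sits inside them.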
A discussion of bias in this data
As I say, my data is not very reliable. It seems plausible that some of my sources were faulty, and I was going quickly, so I may have made some errors in data collection. Furthermore, I was only able to find data for 82 of the 150 mathematicians.
But in expectation, those errors will cancel out, unless there's some systematic bias in the sources I was using. I can think of at least two causes of bias, but neither one seems like it could be the cause of the observed trend.
Higher reporting rate for first born children
First, maybe first borns are recorded more readily? If the first born child was the heir to a family's property, then they may have been more likely to be mentioned in legal and other documents, so there may be much better historical records of first-born children.
But our subjects are all famous mathematicians, regardless of their inheritance status. So, if there were a historical reporting bias that favored the first born, it would actually push against our observed effect. First-born members of our sample would either be listed as only children, or noted as having an unknown number of siblings, because their younger siblings went unrecorded. Younger-sibling mathematicians, on the other hand, would be noted as younger siblings, because their older brother was added to the historical record on the basis of his heirship.
Underreporting of females
Another possible bias in the available sibling records, one that does not directly affect the validity of this analysis, is that women might have gone unrecorded more often than men. The size of this effect tells us something about how biased the available sibling records are in general.
It was relatively easy to do a quick check for a reporting bias in favor of male siblings: I just summed all the brothers that I found, and all the sisters.
All together, I recorded 110.5 brothers and half brothers and 100.5 sisters and half sisters. (The point five comes from Jean-Baptiste Joseph Fourier's entry. I found that he had 3 half siblings by his father's first marriage, but I didn't know of what sex. So I split the difference by saying he had 1.5 half brothers and 1.5 half sisters, in expectation. I was comfortable doing this because I mostly care about whether siblings are younger or older, and only secondarily about if they are male or female.)
So there are slightly more males listed, at least in the sources I could find. But a difference of 10 out of 211 siblings with recorded sex isn't very large. I'm sure there are some statistics I could do to show it, but I don't think that slight bias is sufficient to account for our observed birth order effect.
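One simple statistic of this kind is a normal-approximation test of whether the recorded split differs from an even one. The sketch below assumes a 50/50 baseline for recorded brothers and sisters (real sex ratios at birth are slightly male-skewed, so this is a simplification) and uses the fractional counts as they stand. It only asks whether the imbalance is distinguishable from chance; it does not by itself show how much a recording bias could move the birth order result.

```python
import math

# Counts from the post (the halves come from splitting Fourier's half siblings).
brothers = 110.5
sisters = 100.5

total = brothers + sisters
expected_brothers = total / 2  # assumption: an even split under no bias

# Normal approximation to the binomial with p = 0.5.
z = (brothers - expected_brothers) / math.sqrt(total * 0.25)
p_two_sided = math.erfc(abs(z) / math.sqrt(2))

print(f"z = {z:.2f}, two-sided p = {p_two_sided:.2f}")
```

Under these assumptions the imbalance is well within what chance alone would produce, which fits the informal judgment that the difference isn't very large.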
I'm hoping that others can think of reasons why we might see a trend in these data even if the birth order effect wasn't real.
This is a pretty intriguing result, and I'm surprised no one (that I know of) has noticed it before now.
I think this post should be thought of as a pilot study. I put in about 20 hours to investigate the hypothesis, but only in a quick and cursory way. I would be excited for others, who are better informed and better-equipped than I am, to do a more in-depth analysis into these topics.
Do mathematicians of lesser renown display this birth order effect? What about prominent (or average) individuals from other STEM fields? Non-STEM fields? I'd be interested to see an analysis of the most successful business executives, for instance.
Furthermore, more investigation could uncover detail about how having older siblings gives rise to this effect.
Some explanations for this phenomenon rest on social interaction with older siblings in one's first few years. Others depend on biological consequences of spending one's fetal period in a womb that was previously occupied by older siblings. In principle we should be able to tease out which of these mechanisms generates the effect by looking at much more data that tracks older siblings that died in infancy, and older half siblings. (Siblings that died in infancy can't mediate the social effect, while half siblings can mediate a biological effect depending on which parent is shared, and can mediate a social effect depending on whether they were living in the household at the time of birth.) If someone found a larger dataset that tracked these factors, we might be able to falsify one or the other of these stories.
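To make that distinction concrete, here is a minimal sketch of how a larger dataset might code each older sibling so that the two stories give different "effective" birth orders. The field names and the counting rules are hypothetical: the sketch assumes the biological story counts older siblings who share the subject's mother (whether or not they survived infancy), and the social story counts older siblings who survived infancy and lived in the household.

```python
from dataclasses import dataclass


@dataclass
class OlderSibling:
    shares_mother: bool       # full sibling or maternal half sibling
    died_in_infancy: bool
    lived_in_household: bool  # present in the home during the subject's early years


def effective_older_siblings(older_siblings):
    """Count older siblings under each hypothesized mechanism (assumed rules)."""
    biological = sum(1 for s in older_siblings if s.shares_mother)
    social = sum(
        1 for s in older_siblings
        if s.lived_in_household and not s.died_in_infancy
    )
    return {"biological": biological, "social": social}


# Hypothetical example: one older full sibling who died in infancy, and one
# older paternal half sibling who lived in the household.
example = [
    OlderSibling(shares_mother=True, died_in_infancy=True, lived_in_household=False),
    OlderSibling(shares_mother=False, died_in_infancy=False, lived_in_household=True),
]
counts = effective_older_siblings(example)
print(counts)
```

Subjects for whom the two counts disagree, such as those who are first-born under one rule but not the other, are the cases that could distinguish the two explanations.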
And again, please inform me of any errors.