This post follows on from my previous post detailing some areas where I was unable to reproduce Scott’s analysis of how the age gap between siblings modifies the SSC Birth order effect. I suggest you read that post first but here’s the summary:
I attempted to reproduce Scott’s analysis of Birth order effect vs Age gap. I found that:
There appeared to be an error in graphs 2 & 3 where people with one sibling were counted when they shouldn’t have been (graph 2) or were counted twice (graph 3)
Comparing oldest children to youngest children causes a bias in the results which can be prevented by comparing oldest children to 2nd oldest children
I was unable to reproduce Scott’s result on people reporting 0 year age gap – I get a non-significant 58% older siblings compared to Scott’s 70%. I was unable to discover the cause of the difference.
I reanalysed how sibling age gap modifies the SSC birth order effect. I found that:
The birth order effect is relatively steady for the first 4-8 years of age gap, with about 70% of respondents being the firstborn rather than the secondborn. For larger age gaps the effect reduces. There is insufficient evidence to conclude how long this reduction takes or whether the effect is completely removed at very large age gaps.
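2 other trends were noted in the data but evidence for them was not strong:
The drop in birth order effect at large age gaps appears in sibships of 2 or 3 but not in sibships of 4+
The birth order effect appears lower for a 1 year age gap than for gaps of 2-7 years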
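Considering competing theories on the cause of the Birth order effect, two theories fit the data well:
Intra-family dynamics
Decreased parental investment
And three theories fit the data poorly:
Changed parenting strategies
Maternal antibodies
Maternal vitamin deficiencies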
The original reason for me looking at this data was to analyse whether the data support a sudden drop between years 7 and 8 or whether there is an alternative explanation which fits the data.
I will note here that I’m not a trained statistician and am using this as practice in Bayesian model comparison, inspired by johnswentworth’s recent model comparison sequence. I’d say I’m 80% confident in my broad conclusions, less so in the specifics - I'd be fairly confident there are a couple of errors lurking in here somewhere.
Getting back to the data, here’s the result that I’m going to focus on, comparing 1st to 2nd children in all family sizes:
Eyeballing the graph makes the sudden drop after 7 years look like the most natural explanation. However, we had no reason, a priori, to think that a 7 year age gap would have any special significance – a drop could have happened after 1 or 10 years for all we knew.
If we model a sudden drop after 6 or 8 years the model matches the data significantly less well, and any further away from 7 than that it performs really poorly. Although a general “sudden drop” model has a high maximum likelihood at 7 years, the overall model likelihood is lower due to the lower likelihoods for other drop years.
Imagine a model which is similar to a sudden drop model but the drop is ramped down over a number of years. The model is defined by 4 parameters – percentage oldest sibling before the ramp (p0), percentage oldest sibling after the ramp (p1), at what age gap the ramp starts (ts) and over how many years the ramp occurs (tr).
The sudden drop model is nested within this model - where tr=0.
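As a minimal sketch of the ramp model described above (the function name and the handling of a gap exactly equal to ts are my assumptions, not taken from the original analysis), the four parameters map onto a predicted proportion of older siblings like this:

```python
def ramp_probability(gap, p0, p1, ts, tr):
    """Predicted proportion of older siblings at a given age gap (years).

    p0: proportion before the ramp
    p1: proportion after the ramp
    ts: age gap at which the ramp starts
    tr: number of years over which the ramp occurs
    """
    if tr == 0:
        # Sudden drop, the nested special case.
        # Assumption: a gap exactly equal to ts still counts as "before".
        return p0 if gap <= ts else p1
    # Fraction of the ramp completed, clipped to the range [0, 1].
    frac = min(max((gap - ts) / tr, 0.0), 1.0)
    return p0 + (p1 - p0) * frac
```

For example, `ramp_probability(6, 0.7, 0.5, 4, 4)` sits halfway down a 4 year ramp that starts at a 4 year gap.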
A gentler slope doesn’t match the data as closely as a sudden drop but is less harshly penalised over a range of ramp start locations. The graph below shows what some tr=4 years ramps might look like.
To find out which ramp lengths fit the data best I integrate (numerically) across the first 3 parameters in this model (p0, p1, ts) to find which value of the 4th parameter (tr) predicts the data the best – how sudden is the drop?
For this analysis I haven’t grouped the 10+ year age gaps together but used the actual values for the age gaps.
For all calculations in this post I assume a uniform prior across a reasonable range for each parameter.
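The integration step can be sketched roughly as follows. This is an illustration under stated assumptions, not the code used for this post: the counts are invented (they are NOT the survey data), the grids are arbitrary choices for the "reasonable range", and the likelihood is assumed to be binomial per age gap. Averaging the likelihood over an evenly spaced grid approximates integrating against a uniform prior.

```python
from itertools import product
from math import exp, log

def ramp(gap, p0, p1, ts, tr):
    """Ramp model: p0 up to ts, p1 after ts + tr, linear in between."""
    if tr == 0:
        return p0 if gap <= ts else p1
    frac = min(max((gap - ts) / tr, 0.0), 1.0)
    return p0 + (p1 - p0) * frac

def log_likelihood(params, tr, data):
    """Binomial log-likelihood; constant binomial coefficients are dropped."""
    p0, p1, ts = params
    total = 0.0
    for gap, older, n in data:
        p = ramp(gap, p0, p1, ts, tr)
        total += older * log(p) + (n - older) * log(1 - p)
    return total

def log_marginal_likelihood(tr, data, p_grid, ts_grid):
    """Average the likelihood over a uniform grid of (p0, p1, ts)."""
    lls = [log_likelihood(params, tr, data)
           for params in product(p_grid, p_grid, ts_grid)]
    peak = max(lls)
    # Subtract the peak before exponentiating to avoid underflow.
    return peak + log(sum(exp(ll - peak) for ll in lls) / len(lls))

# Invented (gap, older siblings, respondents) counts, for illustration only.
data = [(1, 132, 200), (2, 284, 400), (3, 248, 350), (4, 178, 250),
        (5, 106, 150), (6, 71, 100), (7, 57, 80), (8, 33, 60),
        (9, 22, 40), (10, 16, 30), (11, 10, 20), (12, 10, 20)]

p_grid = [0.05 * i for i in range(1, 20)]   # 0.05 to 0.95, assumed range
ts_grid = list(range(0, 13))                # ramp start, assumed range

log_ml = {tr: log_marginal_likelihood(tr, data, p_grid, ts_grid)
          for tr in (0, 5, 10)}
# Bayes factor of a sudden drop (tr=0) against a 5 year ramp (tr=5).
bayes_factor_0_vs_5 = exp(log_ml[0] - log_ml[5])
```

Working in log space matters here: with thousands of respondents the raw likelihoods are far too small to represent directly.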
Surprisingly, the likelihood is fairly flat over a large range of slope lengths – everything between 0 and 10 years is within a Bayes factor of 1.15 of each other.
To see what’s happening, let’s integrate over the first two parameters (p0 and p1) and plot likelihood against ramp length (tr) and start (ts).
This shows a maximum value at tr=0, ts=7 – the sudden drop after 7 years which is so visually noticeable in the data.
However, if you follow the line along tr=0 (back of the graph), there is only a small range of ts values which have a high likelihood. Looking instead along tr=5, the maximum likelihood is lower (~33% lower), but there is a larger range of ts values which provide a fairly high likelihood. The decrease in maximum likelihood is almost exactly cancelled out by the increase in the width of the distribution.
So a sudden drop predicts the data approximately as well as a more gradual drop.
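The cancellation described above can be written as a rough approximation. The notation is mine rather than the original analysis: L(ts, tr) is the likelihood after integrating over p0 and p1, T is the width of the uniform prior on ts, and w(tr) is the width of the region of ts values with high likelihood.

```latex
P(D \mid t_r) = \frac{1}{T}\int L(t_s, t_r)\, dt_s
\;\approx\; L_{\max}(t_r)\,\frac{w(t_r)}{T}
```

A peak that is ~33% lower is then offset if the high-likelihood region is about 1.5 times as wide, since 0.67 × 1.5 ≈ 1. The factor of 1.5 is only what the near-exact cancellation implies, not a measured value.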
We can also integrate across tr to find the posterior probability of the various ts values.
I'm going to describe this as the ramp starting between 4 and 8 years.
I also integrated over tr and ts in order to see how likelihood varied with p0 and p1.
p0 is very precisely defined between 0.70 and 0.71.
p1 can take a large variety of values, between ~0.49 & 0.62 (90% CI).
In reality, the Birth order effect might decrease relatively fast to start with and then more slowly as oldest and second oldest children approach parity. This is probably the kind of thing which we would expect in real life but which can't be recreated with the ramp model.
I created an exponential decay model (with a delay in the decay starting) to test whether this might be the case and it got a slightly higher overall likelihood than the general ramp model (Bayes factor 1.5). The start of the decline was in the region 3-8 years, similar to the ramp model. The maximum likelihood half-life was 5 years although this could be anywhere between 1.2-11 years (90% CI).
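One plausible form for a delayed exponential decay is sketched below. The exact parameterisation used for this post is not given, so this is an assumption: the effect stays at p0 until the age gap reaches ts, then the excess over a floor p1 halves every `half_life` years. Setting p1 = 0.5 would correspond to decaying all the way to parity.

```python
def delayed_decay_probability(gap, p0, p1, ts, half_life):
    """Predicted proportion of older siblings under a delayed exponential decay.

    p0: proportion before the decay starts
    p1: floor that the proportion decays towards (assumed, 0.5 = parity)
    ts: age gap at which the decay starts
    half_life: years for the excess over p1 to halve
    """
    # Years since the decay started; zero for gaps before ts.
    elapsed = max(gap - ts, 0.0)
    return p1 + (p0 - p1) * 0.5 ** (elapsed / half_life)
```

Unlike the ramp, this curve falls fastest straight after ts and flattens out as it approaches the floor.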
Using these models I calculated expected values for Birth order effect vs age gap.
This looks fairly sensible to me. There is a gradual start to the slope, becoming steeper into about year 8 and then shallowing out as we get closer to parity between older and younger siblings.
At larger age gaps the two models diverge which is due to a combination of the differing priors implied by the models and the sparsity of data points in this region - the likelihood isn't sufficient to overcome the prior.
I also compared the general ramp model to a constant Birth order effect model. The ramp model was preferred over the constant model by a Bayes factor of ~1,000.
A constant model is actually nested within the ramp model where p0=p1 (and tr, ts become meaningless). This is illustrated by the red line on the likelihood vs p0 & p1 graph where the low likelihood can be seen.
I mentioned in my previous post that it appeared that the drop was present in sibships of 2 but not in sibships of 3+.
Breaking this down further, we can compare this effect for sibships of 2, sibships of 3 and sibships of 4+ (any further breakdown causes the sample sizes to get too small).
(The very low value at 7 year age gap for 4+ children is only a sample size of 11 so don’t take it too seriously!)
Here it appears that the drop-off in birth order effect for large age gaps between first and second children happens in sibships of 2 or 3 but doesn’t happen in sibships of 4+.
Although the number of samples in the 4+ group with >7 year age gap is only 64, the difference between 2-3 and 4+ sibships is significant at p<0.05 (two-tailed t-test).
This seems an odd phenomenon. Would having extra siblings cause the birth order effect between the oldest 2 siblings to remain high for large age gaps?
Seeing something weird like this in my data causes me to ask “how many things might I have spotted during my work on this project, if they had coincidentally shown a weird looking result?” – when adjusting for post-hoc multiple hypothesis testing I should adjust not just for the tests that I did but also for the tests I didn’t do just because nothing looked odd.
In this case the answer is quite a lot so p<0.05 is probably not strict enough and my best bet would be that this data occurred by coincidence.
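To put a rough number on why p<0.05 is not strict enough, here is a small sketch. It assumes the implicit tests are independent, which is a simplification, and the count of 20 in the usage note is purely hypothetical since I have no way to count the comparisons I might have noticed.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive among independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests

def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that keeps the overall rate at or below alpha."""
    return alpha / n_tests
```

With a hypothetical 20 implicit comparisons, `familywise_error_rate(0.05, 20)` is well over a half, and `bonferroni_threshold(0.05, 20)` gives the stricter per-test threshold that would be needed.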
That's all a bit hand-wavey so I tried to calculate the Bayes factor comparing:
A general ramp model for all family sizes
A general ramp model for families of 2 & 3 children combined with a shallower (or no) ramp for families of 4+ children (Only p1 was changed between the family sizes)
The latter was preferred by a factor of 5. If I were to include other family sizes at which the change might have happened, or the possibility that the change happens gradually as family size gets bigger, then this factor would change, but that would start getting way too complicated for me!
I still don't really believe this is an actual effect but if someone has an explanation of what might cause this then I'm all ears.
One other thing which I noticed is the lower Birth order effect for age gaps of 1 year as compared to gaps of 2-7 years (0.66 vs 0.71 oldest siblings). A quick calculation suggests the Bayes factor comes out at 2 in favour of the Birth order effect being lower at 1 year age gap compared to it being constant across 1-7 year age gaps.
Note in this case that although the Bayes factor isn't huge, it seems like this is the kind of thing which might actually happen (some of the potential causes would give this a decent prior - see section below for more discussion) so I'm much less inclined to just write this one off.
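A comparison of this kind can be sketched as below. This is not necessarily the quick calculation used above: it assumes uniform priors on each proportion and tests "two separate rates" against "one shared rate", without building in the direction of the difference. The function names are my own.

```python
from math import exp, lgamma

def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_evidence_uniform(k, n):
    """Log of the integral of p^k (1-p)^(n-k) over a uniform prior on p."""
    return log_beta(k + 1, n - k + 1)

def bayes_factor_two_rates(k1, n1, k2, n2):
    """Bayes factor for separate rates in two groups vs one shared rate.

    k1, k2: older siblings in each group; n1, n2: respondents in each group.
    Binomial coefficients are identical under both models, so they cancel.
    """
    separate = log_evidence_uniform(k1, n1) + log_evidence_uniform(k2, n2)
    shared = log_evidence_uniform(k1 + k2, n1 + n2)
    return exp(separate - shared)
```

Values above 1 favour separate rates. When the two groups look alike the factor falls below 1, because the extra parameter is penalised automatically.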
Scott lists 5 potential causes of the Birth order effect:
1. Intra-family competition
2. Decreased parental investment
3. Changed parenting strategies
4. Maternal antibodies
5. Maternal vitamin deficiencies
I’ve renamed 1 to "Intra-family dynamics" to include non-competitive interactions between siblings. A few people have mentioned other sibling dynamics which might cause a Birth order effect (e.g. here). The predictions of age gap effect from competitive vs non-competitive causes seem similar to me so I'll lump them together.
My thoughts for what each of the 5 potential causes would predict regarding age gap are given below. The conclusions for each potential cause end up being very similar to Scott’s (after all that work!) except that there is no need to postulate anything especially significant about 7 years and that there may be a slight increase in birth order effect between 1 and 2 years age gap.
1. Intra-family dynamics
Prediction: Birth order effect remains roughly constant with small age gaps, with less effect as the gap gets larger.
Assessment: Findings match prediction well. 4-8 years seems reasonable for levels of interactions between siblings to start decreasing.
Potentially, for a small age gap, a very advanced younger sibling might act more like an older sibling, meaning that the birth order effect at a 1 year age gap would be lower. This feels slightly forced to me (I would think any such effect would be fairly small) but I am curious what others think.
2. Decreased parental investment
Prediction: Birth order effect increases as age gap increases - the longer a firstborn is the only child the longer they benefit from 100% of their parents’ attention. If the earliest years are the most important then the birth order effect might not change after that critical period. Once older children are able to look after themselves, birth order effect might come down with larger age gaps.
Assessment: The increase in birth order effect between 1 and 2 years would match the theory, if parental investment is mostly important in the first two years. If older children start being able to look after themselves after 4-8 years then this would explain the drop in birth order effect after this time.
The match between the theory and result is good, although there are a couple of degrees of freedom to help match the prediction to the data. 4-8 years seems reasonable for children starting to look after themselves better but 2 years seems on the low side for a prediction of how long having extra attention is beneficial. Maybe between 2-5 years the two effects roughly cancel out?
3. Changed parenting strategies
Prediction: Age gap has minimal effect on Birth order effect.
Assessment: Prediction matches data poorly. It is possible that parental strategies start to reset towards firstborn strategies after longer age gaps but I wouldn’t have put much of my probability mass on that option. There is a 5 year gap between my youngest children and I definitely didn’t reset towards firstborn strategies; I suspect this would still have been true even for a much larger gap.
4. Maternal antibodies
Prediction: Age gap has minimal effect on Birth order effect. Generally you don’t need top-ups of vaccines so presumably antibodies stick around indefinitely? Or is it your body’s ability to make more? Anyway, Scott thinks this is unlikely and he’s a doctor so I’ll take his word for it.
Assessment: Prediction matches data poorly. My biology knowledge is too poor to know how likely a decrease in effectiveness after 4-8 years would be in this case.
5. Maternal vitamin deficiencies
Prediction: Very small age gaps have large effect. Birth order effect decreases rapidly for age gaps <3 years – my estimate for how long it might take to rebuild vitamin stockpiles.
Assessment: Prediction matches data poorly. 4-8 years seems way too long for vitamin stockpiles to start to build back up.
The SSC 2019 survey data support a constant, high, birth order effect (~2.4 oldest siblings for every 1 second oldest sibling) for age gaps <4-8 years. This is followed by a decline to a lower birth order effect at an undetermined rate. The decline does not necessarily completely remove any birth order effect although this may be the case for very large age gaps.
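The ratio quoted above follows from converting the proportion of older siblings into odds, using the range found earlier for p0 (0.70 to 0.71):

```latex
\frac{p_0}{1 - p_0}:\qquad \frac{0.70}{0.30} \approx 2.33, \qquad \frac{0.71}{0.29} \approx 2.45
```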
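The data provide some evidence that:
The decline at large age gaps happens in sibships of 2 or 3 but not in sibships of 4+
The birth order effect is lower for a 1 year age gap than for gaps of 2-7 years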
However the evidence for both of these points is relatively slim.
Intra-family dynamics and decreased parental investment predict the results well.
Changed parental strategies, maternal antibodies and maternal vitamin deficiencies do not predict the results well.