The Collider Bias Theory of (Not Quite) Everything

Quick Summary

Collider bias and Berkson's paradox are pretty common and often neglected
I think it's not just a niche statistical concept: it explains a bunch of interesting stuff, and has some use in applied rationality
Scott Alexander has written about something similar, and there's a LW post that explains some of the findings in an accurate but slightly dry way. This post tries to make it clear and more readable

Preamble

It’s your first day in prison.

Despite your insistence that the man who approached you and asked for your wallet in exchange for preventing an infinitely large amount of future suffering had in fact been mugging you, your claim of self-defence fell through.

The judge tells you that, judging by the severity of your crime and your assessed mental stability, you will be sent to a medium-security prison for three years. Each prisoner has one cell-mate. You know that whether this guy is nice or nasty might determine whether your whole time in prison is moderately unpleasant or a living hell.

You walk in and see him leaning against the bed. He looks relatively normal for a prisoner—muscular, no visible gang tattoos, serious expression. You ask with apprehension:

“So…what are you in for?”

“Serial murder.”

Relief floods over you and you instantly hug the surprised murderer, who smiles a little apprehensively, and gives you an awkward pat on the back:

“What was that for?”

Back to school

Do you ever find yourself wondering why the smartest people around you seem to be less hard-working? Or why the more hard-working people don’t seem to be super smart?

If not, I can guess why! We’ll come to you later.

But if so, fantastic, let’s start by verifying your intuitions.

Imagine the situation: you’re 11 years old in a comprehensive school in a fairly mixed-income area of the UK. After the Christmas holidays, your school year is divided into 6 ability-based sets for each academic subject, determined by your classwork and exam results from your first term. You’ve just been assigned to Set 2.

Bored in your first physics class, you look around trying to work out who the smartest people in your class are. There’s that kid, Josh, who seems to spend a lot of the class drawing manga beneath his desk. His desk looks pretty chaotic and he doesn’t seem to have any pen or paper visible. Surely he can’t be too smart… but the teacher asks him a difficult question, his eyes dart around for a second, he recognises a pattern based on the equations written on the board and gives the correct answer.

Then there’s that girl, Gemma, who sits at the front attentively listening and making meticulous notes, her brow furrowed in attempts at deep thought. When she’s asked a similar question, she carefully looks through her notes for a clue, and, after a slightly awkwardly long pause, she too eventually lands upon the answer.

You assume that Gemma is really not very smart, probably at the 50th percentile of everyone in your school, but she’s very hard-working - let’s say 90th percentile. Josh is pretty darned smart. 90th percentile? But he’s obviously not very hard-working. Probably 50th percentile. You notice that a few other kids seem to show this same tendency: smarter kids seem less hard-working and vice versa.

At this moment, it seems logical to see this phenomenon as a more general fact of the world. Perhaps it’s just nature’s justice: some people are naturally smart, others are naturally hard-working; it’s only fair that talents are distributed in a balanced way. Or perhaps this is actually caused by some people being intelligent: if you’re smart, you don’t need to work as hard, so you don’t bother learning productive habits (see the intelligence compensation hypothesis)! If you’re less smart, you realise that you need to really knuckle down to get good grades, so you become more hard-working.^[1]

But then it clicks. Something else explains what’s happening here…

People who are really smart and hard-working are all in Set 1! People who are a little dimmer and lazier are all in Set 3 or below. This means that you’ve basically got three kinds of people in Set 2:

People who are kind of in the right place in terms of both intelligence and hard-work
People who are Set 1-Clever but a bit too lazy to get into Set 1
People who are Set 1-Hard-working but not quite bright enough to get into Set 1

Let’s formalise this a little:

A category A is mainly determined by two factors, B and C (e.g. your academic results are determined based on your intelligence and hard work)
Category A is stratified into different bands (A1, A2 etc.), based mostly or exclusively on the factors B and C (e.g. your set is chosen based on your academic skill)
Even if B and C are positively correlated in the whole population (smarter people are harder working), within each band B and C will be less correlated, or even anti-correlated (smarter people in your context might be less hard-working)

I’ll quickly illustrate this with an artificial dataset. I randomly generated 180 students, like the high school year in question, and gave them an intelligence score (like IQ, normally distributed around 100) using random variable generation. I then generated a hard_work variable correlated at around 0.35 with the intelligence variable. I gave them an “academic_result”, as a function of their intelligence and hard_work. Then I divided them into these six sets based on that score. What did we see?

This graph comes out even better than I’d hoped!

Across the whole dataset, the correlation line trends upwards, meaning that smarter people tend to be more hard-working, but we can also see that people in each set form diagonal-type bands across this distribution—within each band, the smarter you are, the less hard-working you are. Exactly the opposite effect!
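For anyone who wants to poke at this, here is a minimal sketch of the kind of simulation described above. It is not the exact code behind the graph: the result function (a plain average of the two traits, with no noise), the seed, and the names `pearson` and `simulate_year` are all assumptions made for illustration.

```python
import random
from statistics import mean

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def simulate_year(n_students=180, n_sets=6, rho=0.35, seed=0):
    """Generate one school year and split it into ability sets.

    Assumes n_students divides evenly by n_sets.
    """
    rng = random.Random(seed)
    students = []
    for _ in range(n_students):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        intelligence = 100 + 15 * z1
        # Mixing z1 into hard_work gives a population correlation of rho.
        hard_work = 100 + 15 * (rho * z1 + (1 - rho ** 2) ** 0.5 * z2)
        # Assumption: the result is just the average of the two traits.
        academic_result = (intelligence + hard_work) / 2
        students.append((intelligence, hard_work, academic_result))
    # Rank by result and cut into equal-sized sets; Set 1 is the top set.
    students.sort(key=lambda s: s[2], reverse=True)
    size = n_students // n_sets
    sets = [students[i * size:(i + 1) * size] for i in range(n_sets)]
    return students, sets

students, sets = simulate_year()
overall = pearson([s[0] for s in students], [s[1] for s in students])
print(f"whole year: r = {overall:+.2f}")
for number, members in enumerate(sets, start=1):
    r = pearson([s[0] for s in members], [s[1] for s in members])
    print(f"Set {number}: r = {r:+.2f}")
```

With only 30 students per set the individual numbers bounce around from seed to seed, but the sign pattern is the point: positive across the whole year, negative inside the middle sets.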

Although this is artificial data, it really matches my experience. If you were in a middle set at school, you probably can put a name to that really smart kid with pathological laziness, or that kid whose world-beating conscientiousness allowed him to finally spell his name accurately in Year 9. You can probably also find them on the above graphs!

You might be noticing an anomaly in Sets 1 and 6 here. Why don’t we see the effect as strongly in the top and bottom set?

For Set 1, there’s no upper limit. So if you’re in Set 1 and clearly not very hard-working, we can tell that you’re very smart, even without knowing your test results; otherwise you’d be in a lower set. But if you’re in Set 1 and clearly very hard-working, we don’t know whether you’re also very smart or just average. The main reason to guess that you’re not exceptional is just regression to the mean (there just aren't that many people who excel across multiple domains). This means that there will be a few students who just soar above everyone else in both intelligence and hard work.

In these open-ended sets, the effect tends to weaken, and the actual underlying correlation might take over. Ditto with the missing lower limit in the lowest set.
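A rough way to check the top-and-bottom-set point is to rerun the same kind of simulation with far more students, so that sampling noise stops hiding the pattern. This is only a sketch under the same assumptions as before (standardised traits, result equal to their sum, no noise); the function name `correlation_by_set`, the sample size and the seed are made up for illustration.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def correlation_by_set(n_students=60_000, n_sets=6, rho=0.35, seed=1):
    """Return the within-set correlation for each set, top set first."""
    rng = random.Random(seed)
    people = []
    for _ in range(n_students):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        smart = z1
        # Correlated with `smart` at rho across the whole population.
        hard_working = rho * z1 + (1 - rho ** 2) ** 0.5 * z2
        people.append((smart, hard_working))
    # Assumption: sets are assigned purely on the sum of the two traits.
    people.sort(key=lambda p: p[0] + p[1], reverse=True)
    size = n_students // n_sets
    result = []
    for i in range(n_sets):
        members = people[i * size:(i + 1) * size]
        result.append(pearson([p[0] for p in members],
                              [p[1] for p in members]))
    return result

by_set = correlation_by_set()
for number, r in enumerate(by_set, start=1):
    print(f"Set {number}: r = {r:+.2f}")
```

In this noise-free version the two open-ended sets come out clearly less negative than the four middle sets, though still not positive. Adding noise to the result, or making the sets wider, should pull every set further towards the underlying positive correlation.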

There’s a bit more noise in the real world, but it wouldn’t surprise me if you got relatively similar results in an actual study.

This is relatively well-known, right?

Sadly for me, I’m not the first person to notice this. The underlying phenomenon is called “collider bias”, and when conditioning on the collider produces a counterintuitive result like this one, it’s called “Berkson’s Paradox”.

But I think this phenomenon is strangely neglected among people who think about this kind of thing. In my circles, mentions of confirmation bias and the availability heuristic are fairly ubiquitous, but I've never heard this effect mentioned “in the wild”.

One potential reason for this might be that lots of the most commonly used examples condition on extreme success or failure. Conditional on dying, or becoming a superstar, some traits that were correlated become slightly less correlated.

As with the top and bottom sets in the above example, conditioning on either extreme actually shrinks the effect significantly!

For example, in a sport like basketball, height and speed correlate a little bit positively among general players (faster players tend to be a bit taller), but start correlating negatively if a player gets into the NBA. This is because speed and height are both super important. So conditional on being in the NBA, if you’re under 1.90m or so, you need to be lightning fast to compete at that level. Likewise, only super-tall people can compensate for being slow.

But, if you refer back to my earlier graph, you can see why choosing the NBA downplays the impact of this phenomenon. There are no limiting factors keeping the super-fast, crazy tall players out of this sample. I can’t find data on this, but the effect would surely be stronger if we only included players in the second tier of US basketball.

Towards a theory of everything?

Okay, so this is where I get a little wild-eyed and conspiratorial. Since learning the language for this (e.g. The Book Of Why), I’ve been seeing this fairly marginal statistical phenomenon everywhere.

I’m going to suggest that this phenomenon explains way more than we generally think and can be applied incredibly broadly.

Unfortunately, most of the world is less clearly delineated and noisier than a high school physics set. But I still think that a lot of the important variation of almost anything we do in the world can be distilled in such a way that this effect becomes interesting or useful.

This is because almost everyone in the world is in some kind of intermediate state! There are a few factors that allow a person or organisation to enter spaces at a given level, but also stop them from entering “higher-level” spaces, leading to different correlations than the broader population.

I’ll give a few examples where I’ve noticed it:

Job market: Similar to the school example, getting into certain work positions is a function of capability and effort. A middle-manager is where they are for a reason. If they were more capable and put more effort in, they would be in a higher-paid or better role. If they were less capable and put in less effort, they would be in a worse role. So in fact, while more capable people in society in general also put more effort in, in any given role or position, the effect might be reversed. (Caveat: This might be complicated by personality, age, social skills and connections etc. But I suspect it holds in a lot of jobs.)

Dating preference: The classic idea here is looks vs. niceness. Conditional on someone being “in your dating pool”, they have to be within a certain band of attractiveness, and a certain band of “niceness”. If someone is 10/10 nice and 10/10 attractive, she would be “out of your league”; if she’s below 3/10 on both, you would be out of hers. (Caveat: This might be more about “wealth/status” in some people’s dating games, and personal taste plays a strong role in both looks and personality. Also, personality might be harder to screen for on first dates.)

Effective Altruists: Since this is my “in-group”, I think a lot about the taxonomy of the EA movement, and generally think of the key determinants as being ethically motivated and being rationality-minded. It’s not perfect, but I find that, conditional on attending an EA event (say, an EAGx event), people are far above the general population on both of these traits. But it often seems that, conditional on being at EA events, the more ethically minded someone seems, the less rationally minded they are. (Caveat: I’m a bit less confident in this one - these traits might be anticorrelated for other reasons.)

Your entire social universe: This is a less strong effect, but it’s probably the most influential, and there’s actually already a Less Wrong post on this! Think about the people living in your neighbourhood or town, your company, your social group, your language class, your rock band, your online forum. All of these are selected on a few criteria, which might be correlated in broader society but anticorrelated in your circles.

“Your smartest friend probably has low executive function, ADHD, etc., because if he was highly motivated then he'd become a zillionaire and ascend to a higher social plane.” (LocalDeity)

I'll be honest, my social circles seem a result of very random and arbitrary life choices, so I don't notice this effect so much. I have “stickier” friendships with people from different education and wealth tiers, cross-cultural friends, and more random, hobby-based friendships. But I know that some people seem to filter really strongly on status in their social circles. My theory would be that the more narrow, competitive or selective your social circles, the stronger this effect would be. I'm curious whether this might lead to higher susceptibility to this bias.

Does this explain anything?

This phenomenon often explains a bunch of bad folk wisdom.

There’s an “every smart person I know is failing” meme, even when intelligence is correlated with career success. If you notice too many smart failures, that might be because smart, successful people have left your social orbit. This is not to say that they’re wrong about their own circles. It’s interesting that certain traits correlate differently in different contexts!

But as well as conventionally “higher-status” people leaving your network, you might also get this effect with “lower-status” people entering your network. I heard the other day from a relatively elite person that people from a working-class background are better able to communicate complex ideas clearly. This may be true in some interesting ways (they might be less likely to use language that confuses their audience), but a working-class person would need to be exceptional in some ways to enter this person’s network! The majority of working-class people who struggle to communicate complex ideas may well just be invisible to this person.

There also seems to be some folk wisdom around dating here. Why are handsome men such jerks?

The elite exceptions

I asked at the top: Do you ever find yourself wondering why the smartest people around you seem to be less hard-working? Or why the more hard-working people don’t seem to be super smart?

Maybe you said: No, in my social circles, smarter and hard-working people are just more impressive across all aspects!

One reason you might notice this is “range expansion”. People can find loopholes to escape from their allotted fate and find ways around their narrow band that don’t depend on two fixed criteria like intelligence and effort. As Scott Alexander talks about here, in the UK and US, there are a bunch of financial incentives, sports scholarships and positive discrimination policies that can get you into a higher (or lower, if you want funding) band of university. There are also different policies for international students or mature students, which mean that people “out of distribution” end up going to the same schools or universities.

This is very difficult in some systems, though. The Chinese university system picks people from incredibly narrow bands across the country, so everyone arrives at their university conditional on having basically the same gaokao grades (with some provincial exceptions). I asked my wife, who came from this system, and she really noticed a trade-off between competencies, which seemed far stronger than in my own experience.

The other reason goes back to the “upper limit” point. If you’re at Harvard, MIT or Oxford, there’s only a lower limit, so if you’re “just smart and hard-working enough for Oxford”, you’re going to meet people who are both significantly more intelligent and hard-working than you.

This is probably the same at the “peak” of the dating market, the very top jobs, the best sportspeople, and the most EA EAs.

The bottom-end exceptions

And now we return to the mystery at the start of the essay.

Really bad things often correlate, and traits of people at the bottom of society will sadly be very correlated across domains. Poor mental health, unfortunate upbringing, addiction issues, low cognitive function, financial instability. Mechanisms are mixed here: some are causal, some caused by a third factor. So do we see the collider effect here? Let's finish on prisons.

If you’re in the prison population, firstly, welcome! I’m thrilled that my post is so far reaching!

But second, I suspect you might have experienced something similar. If you're in a maximum security prison, it’s very plausible that the prisoners are incredibly bad across multiple aspects. If you find out that your roommate is a serial killer, you should probably still be very worried, because there’s no mechanism that means that he can't be both a psychopath and someone who’s committed a serious crime!

But a medium-security prison is likely to hold at least some people who either a) are very mentally/behaviourally unstable, or b) have been convicted of very severe crimes, but rarely both. Very unstable serial killers will be in a higher security prison, and stable tax evaders will be in a lower security facility. But a very stable and behaviourally normal serial killer, or an uncontrollably aggressive tax evader, might be sent to a medium-security facility.

So conditional on them being in a medium-security prison, if your cellmate tells you they’re a murderer, you should actually be relieved!

But if they tell you they’re there for an unpaid parking fine…

Then you should really start to panic.

^[1] Quick caveat here: this might be partly true; to quote a recent Spencer Greenberg study:
“In a big study we ran, we found that mostly there wasn't much link [between IQ and conscientiousness], but there were some (small) negative correlations between I.Q. and some aspects of conscientiousness. In particular, higher I.Q. people were a bit *less* likely to agree that they "begin tasks right away" (r=-0.14), "are a workaholic" (r=-0.12), "don't stop until everything is perfect" (r=-0.16), and think that "laws should be strictly enforced" (r=-0.11).”
As with every good “theory of everything”, I think it’s possible that this result could be a result of collider bias! Are the high-IQ workaholic perfectionists going to be responding to Clearer Thinking surveys?

Karl Krueger (5mo)

This puts me in mind of the old line, "What's a {guy, girl, etc.} like you doing in a place like this?" which has the unfortunate implication "you are superficially attractive; so there must be something non-obvious wrong with you, or you would be somewhere better than here."

A1987dM (5mo)

Another example of the bottom-end exception: if you book a hotel or B&B at the last possible week, it's possible that it will both be expensive and suck, because both the mediocre but reasonably-priced ones and the pricey but decent ones will have been fully booked for weeks.

Lucie Philippon (5mo)

Thank you for the post! I've regularly pointed out the spurious negative correlations from stratification in conversations, but never had a link to point to for an explanation.

Case in point, my smartest close friend is also the least hard-working. Sometimes I worry he'll find his way towards reliable executive function and leave me behind :')

LESSWRONG