Comments on Power Law Distribution of Individual Impact

Ben Pace

Further thoughts, after discussion with Oli Habryka, on a model of an individual's expected future impact:

1st order factor

Past experience of building substantial and valuable products

If someone has already done lots of the thing you're measuring, then this is the best evidence for future success at it too

2nd order factor

IQ

This is super powerful due to the positive manifold in psychometrics, where all variables of competence correlate positively.
However, my current community is selected fairly strongly on this (everyone is something like 2 SDs above average, STEM students, etc.), and because the tails come apart [EDIT: also known as regressional Goodhart], within this community it only captures something like 25% of the variance rather than the global ~70%. So it's not vastly more important than some of the 3rd order factors.
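The "tails come apart" point can be made concrete with the standard range-restriction relation (the one behind Thorndike's Case II correction for direct selection on the predictor). A minimal sketch, assuming IQ and impact are bivariate normal, that the ~70% figure is the squared correlation in the unselected population, and that the community can be modelled as everyone above +2 SD (a hypothetical simplification of "all like 2 SDs above average"):

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def truncated_variance(cutoff_z: float) -> float:
    """Variance of a standard normal variable conditioned on exceeding cutoff_z."""
    tail = 1.0 - STD_NORMAL.cdf(cutoff_z)
    # Inverse Mills ratio; also the mean of the truncated distribution.
    mills = STD_NORMAL.pdf(cutoff_z) / tail
    return 1.0 + cutoff_z * mills - mills ** 2


def restricted_r_squared(r2_population: float, cutoff_z: float) -> float:
    """Share of outcome variance explained after direct selection on the predictor.

    Uses r' = r*u / sqrt(1 - r^2 + r^2 u^2), with u the ratio of the selected
    group's predictor SD to the population SD (assumes linearity and normality).
    """
    u2 = truncated_variance(cutoff_z)
    return r2_population * u2 / (1.0 - r2_population + r2_population * u2)


if __name__ == "__main__":
    r2 = restricted_r_squared(0.70, 2.0)
    print(f"variance explained inside the selected group: {r2:.0%}")
```

Under these assumptions the within-community figure comes out around 21%, in the same ballpark as the 25% guess; the exact number depends heavily on how the selection is modelled.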

3rd order factors

Conscientiousness (on Big Five)
Contrarian-ness

i.e. ability to not follow local incentives toward social conformity

4th order factor

Openness (on Big Five)

The two 3rd-order factors are interesting because they seem to anti-correlate. Conscientiousness often looks like 'do you follow orders' and contrarian-ness... looks like the opposite. But getting both is awesome - it's the standard Thiel-recommendation of finding someone who is great at seemingly contradictory things.
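A rough Monte Carlo illustration of why the combination is rare: draw two standard-normal traits with a negative correlation and count how often someone is at least +1 SD on both. The correlation of -0.3 and the +1 SD cutoff are hypothetical values chosen for illustration, not estimates for conscientiousness and contrarian-ness.

```python
import math
import random
from statistics import NormalDist


def joint_tail_probability(rho: float, cutoff: float = 1.0,
                           n: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(X > cutoff and Y > cutoff) for standard normals with correlation rho."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        # Mix in independent noise so that corr(x, y) = rho.
        y = rho * x + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        if x > cutoff and y > cutoff:
            hits += 1
    return hits / n


if __name__ == "__main__":
    independent = (1.0 - NormalDist().cdf(1.0)) ** 2  # exact value when rho = 0
    print(f"independent traits:         {independent:.2%} are +1 SD on both")
    print(f"anti-correlated (rho=-0.3): {joint_tail_probability(-0.3):.2%}")
```

With these made-up parameters, the anti-correlated pair is roughly half as common at the top as an independent pair would be.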

Here are the four heuristics I mentioned in the post, and which factors they measure:

Does the person have long (>1 minute) silent pauses for thinking in their conversations?

3rd order factors: Contrarian-ness, and to a lesser extent, conscientiousness

Have they executed long-term plans not incentivised by their local environment?

1st order: Past experience of building substantial and valuable products
3rd order: contrarian-ness and conscientiousness

Contrarian beliefs form simple, communicable, predictive models with a few moving parts

2nd order: IQ
3rd order: contrarian-ness

Finding insights in those they disagree with

2nd order: IQ
4th order: Openness

Bird Concept

Have you considered/do you know more about RQ?

"Professor Stanovich and colleagues had large samples of subjects (usually several hundred) complete judgment tests like the Linda problem, as well as an I.Q. test. The major finding was that irrationality — or what Professor Stanovich called “dysrationalia” — correlates relatively weakly with I.Q.

[...]

Based on this evidence, Professor Stanovich and colleagues have introduced the concept of the rationality quotient, or R.Q. If an I.Q. test measures something like raw intellectual horsepower (abstract reasoning and verbal ability), a test of R.Q. would measure the propensity for reflective thought — stepping back from your own thinking and correcting its faulty tendencies.

There is also now evidence that rationality, unlike intelligence, can be improved through training. [...]"

https://www.nytimes.com/2016/09/18/opinion/sunday/the-difference-between-rationality-and-intelligence.html

Ben Pace

Actually, I think this claim is wrong:

The major finding was that irrationality — or what Professor Stanovich called “dysrationalia” — correlates relatively weakly with I.Q.

RQ is predicted pretty well by IQ, correlating at 0.695 (according to Stuart Ritchie's book review that I read), and it seems plausible that the rest of the variance is noise. IQ correlates positively with all important factors, and often heavily (google the 'positive manifold' for more info), which is why I put it so high on my list.

I conjecture that the reason Stanovich's research isn't very useful is that he tried to find some factor that was as broadly applicable to the population as IQ is. However, his assumption that IQ is missing something massive was just wrong, and so he ended up with another measure of IQ. It would have been more useful to look for some factor that predicts success after conditioning on IQ. For example, Tetlock's work is about figuring out how the very best people think differently from everyone else, and so it comes out with great insights about forecasting, Bayesianism and model-building.

Added: I used to be a big fan of Stanovich's work, but when I discovered that RQ correlated with IQ at 0.7... well, that's what caused me to realise that in fact IQ is a super great predictor of important cognitive properties. And then I read the history of IQ research, which is essentially people trying to prove as hard as they can that there are important metrics of success that don't correlate with IQ, and then failing to do so.
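Whether "the rest of the variance is noise" is plausible can be checked with Spearman's correction for attenuation, which estimates the correlation between the underlying traits from the observed correlation and the two tests' reliabilities. A minimal sketch; the reliabilities 0.9 and 0.6 below are hypothetical placeholders, not published figures for any IQ or RQ test:

```python
import math


def disattenuated_r(r_observed: float, reliability_x: float, reliability_y: float) -> float:
    """Spearman's correction for attenuation: estimated correlation between the underlying traits."""
    return r_observed / math.sqrt(reliability_x * reliability_y)


def reliability_product_for_unity(r_observed: float) -> float:
    """Product of the two reliabilities at which the observed r is consistent with identical traits."""
    return r_observed ** 2


if __name__ == "__main__":
    r = 0.695  # RQ-IQ correlation quoted in the comment
    print(f"shared variance: {r ** 2:.0%}")
    # Hypothetical reliabilities, for illustration only.
    print(f"corrected r with reliabilities 0.9 and 0.6: {disattenuated_r(r, 0.9, 0.6):.2f}")
    print(f"reliability product that would make r = 1: {reliability_product_for_unity(r):.2f}")
```

The last line is the useful one: if the product of the two tests' reliabilities were around 0.48, an observed 0.695 would be consistent with RQ and IQ measuring the same underlying trait.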

Bird Concept

Hm this is an update... I'll have to think more about it. (The "added" section actually provided most of the force (~75%) behind my update. It's great that you provided causal reasons for your beliefs.)

Ben Pace

I appreciate the feedback! Very useful to know that sort of thing.

habryka

(Aside: Although tricky to put human ability on a cardinal scale, normal-distribution properties for things like working memory suggest cognitive ability (however cashed out) isn't power law distributed.

Almost all scales in psychometrics are normalized, and the ones that are not normalized usually show very lopsided distributions. An interesting illustration here is the original Stanford-Binet IQ test scale, which just gave children a set of questions, and then divided the resulting score by the average for children of that age (and then multiplied it by 100), and which had very wide distributions with the 90th percentile of scores or so being a factor of 15 apart.

I don't know which working memory scale Greg is referring to here, but I would be quite surprised if that scale isn't manually normalized, and would expect various forms of working memory measures to vary drastically between different people. As an example, the digit span distribution in this paper is clearly log-normally distributed (or some similar distribution), but definitely not normally distributed:

https://www.researchgate.net/figure/Figure-Distribution-of-digit-numbers-in-the-backward-digit-span-test_7664779

Thrasymachus

I'm aware of normalisation, hence I chose things which have some sort of 'natural cardinal scale' (i.e. 'how many Raven's do you get right' doesn't really work, but 'how many things can you keep in mind at once' is better, albeit imperfect).

Not all skew entails a log-normal (or some similar, presumably heavy-tailed) distribution. This applies to the digit span graph you cite here. The mean of the data is around 5, and the SD is around 2. Having ~11% at +1SD (7) and about 3% at +2SD (9) is a lot closer to normal distribution land (or, given this is count data, a pretty well-behaved Poisson/slightly overdispersed binomial) than a hypothetical log-normal. Given log-normality, one should expect a dramatically higher maximum score when the sample size increases from 78 in the cited study to 2400 or so. Yet in the WAIS-III standardization sample, which is about that size, no individual had greater than 9 in forward digit span (and no one higher than 8 in reverse). (This is, I assume, the foundation for the famous '7 plus or minus 2' claim.)

http://www.sciencedirect.com/science/article/pii/S0887617701001767#TBL2
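A rough way to see the "dramatically higher maximum" argument: compare the typical maximum of 78 versus 2,400 draws under a normal distribution and under a log-normal with the same mean (5) and SD (2) as the cited digit span data. This sketch treats the 1 - 1/n quantile as the typical maximum (a common back-of-the-envelope approximation) and ignores that digit span is an integer score:

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def typical_max_z(n: int) -> float:
    """Rough typical maximum of n standard-normal draws: the 1 - 1/n quantile."""
    return STD_NORMAL.inv_cdf(1.0 - 1.0 / n)


def normal_max(mean: float, sd: float, n: int) -> float:
    return mean + sd * typical_max_z(n)


def lognormal_max(mean: float, sd: float, n: int) -> float:
    """Same rough maximum for a log-normal whose mean and SD match the given values."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)  # moment matching for the log-scale variance
    mu = math.log(mean) - sigma2 / 2.0
    return math.exp(mu + math.sqrt(sigma2) * typical_max_z(n))


if __name__ == "__main__":
    for n in (78, 2400):
        print(f"n={n:5d}  normal: {normal_max(5, 2, n):5.1f}  "
              f"log-normal: {lognormal_max(5, 2, n):5.1f}")
```

Under these assumptions the normal model's maximum grows by roughly 2 points between the two sample sizes, while the matched log-normal's grows by roughly 6.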

A lot turns on 'vary dramatically', but I don't think this would qualify on most commonsense readings of the phrase. I'd take reaction time data to be similar: although there is a 'long tail', it is a long tail of worse performance, and the tail isn't that long. So I don't buy claims I occasionally see made along the lines of 'Einstein was just miles smarter than a merely average physicist'.

habryka

Huh, I notice that I am confused about no one in the sample having a digit span larger than 9. Do we know that they didn't just stop measuring at 9?

habryka

This random blogpost suggests that they stop at 9: https://pumpkinperson.com/2015/11/19/the-iq-of-daniel-seligman-part-5-digit-span-subtest/

Thrasymachus

I was unaware of the range restriction, which could well compress the SD. That said, if you take the '9' scorers as '9 or more', then you get something like this (using 20-25):

Mean value is around 7 (6.8), and 7% get 9 or more, suggesting 9 is at or around +1.5SD assuming normality. So when you get a sample size in the thousands, you should start seeing scores of 11 or so (+3SD); I wouldn't be startled to find Ben has this level of ability. But scores of (say) 15 or higher (+6SD) should be seen only extraordinarily rarely.

If you use log-normal assumptions instead, you should expect something like this: if +1.5SD is 2 above the mean, +3SD is around 6 above it (i.e. ~13), and +4.5SD would give scores of 21 or so.
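The normal-model numbers above can be reproduced with a few lines of arithmetic. A sketch, assuming Greg's figures (mean 6.8, a score of 9 at about +1.5 SD) and treating digit span as continuous, which it is not; the WAIS-III data are also capped at 9, so this is only indicative:

```python
from statistics import NormalDist

MEAN = 6.8            # mean forward digit span quoted above
SCORE_AT_1_5_SD = 9   # score placed at roughly +1.5 SD
SD = (SCORE_AT_1_5_SD - MEAN) / 1.5


def score_at(z: float) -> float:
    """Score sitting z SDs above the mean under the normal model."""
    return MEAN + z * SD


def expected_count_at_least(score: float, n: int) -> float:
    """Expected number of people in a sample of n scoring at least `score`, assuming normality."""
    return n * (1.0 - NormalDist(MEAN, SD).cdf(score))


if __name__ == "__main__":
    print(f"implied SD: {SD:.2f}")
    print(f"z for a 7% upper tail: {NormalDist().inv_cdf(0.93):.2f}")
    for z in (3, 6):
        print(f"+{z} SD -> score {score_at(z):.1f}")
    for score in (11, 15):
        print(f"expected count >= {score} in 2,400: "
              f"{expected_count_at_least(score, 2400):.2g}")
```

The second line is a consistency check: a 7% upper tail does sit at roughly +1.5 SD.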

An unfortunate complication in picking at the tails here is that digit span can be trained: memory athletes drill it, and I understand the record is in the three figures.

Perhaps a natural test would be to get very smart but training-naive people (IMOers?) to try this. If they're consistently scoring 15+, this is hard to reconcile with normal-ish assumptions (digit span wouldn't correlate perfectly with mathematical ability, so lots of 6-sigma+ results would look weird), and vice versa.

Ben Pace

Quick sanity check:

4.5SD = roughly 1 in 300,000 (according to Wikipedia)

UK population = roughly 50 million

So there'd be 50 * 3 = 150 people in the UK who should be able to get scores at ~21 or more. Which seems quite plausible to me.
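The same sanity check in code, using the exact one-tailed normal probability instead of the rounded 1 in 300,000 (and keeping the comment's rough 50 million population figure):

```python
from statistics import NormalDist


def one_tailed_rate(z: float) -> float:
    """Fraction of a normal population more than z SDs above the mean."""
    # Use the lower tail by symmetry to avoid cancellation in 1 - cdf(z).
    return NormalDist().cdf(-z)


if __name__ == "__main__":
    rate = one_tailed_rate(4.5)
    uk_population = 50_000_000  # the comment's rough figure
    print(f"+4.5 SD: about 1 in {1 / rate:,.0f}")
    print(f"expected count in {uk_population:,}: {uk_population * rate:.0f}")
```

Without rounding the rate to 3 per million, the count comes out nearer 170 than 150, which does not change the conclusion.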

Also I know a few IMO people, I bet we could test this.

habryka

I would be happy to take a bet that took a random sample of people we know (let's say 10) and checked whether their responses fit a log-normal or a normal distribution better, though I expect this would not discriminate much between them, since we are looking for divergence in the tails.

habryka

I would take a bet that, in a hypothetical dataset that extended further, the maximum among 2400 participants would be at least 12.

Ben Pace

Oli just gave me the test as described on Wikipedia, and I got all the way up to 11. According to Greg's world model, I'm in at least the top 0.05% (better than 2,400 random students), but given a normal distribution that expects 0 at 10 with a sample of 2,400, I must be way higher than that. (If anyone can do the maths, it would be appreciated; I'd guess I'm more than 1 in a million, though. According to Greg's world model.)

Added: extra info: I visualised the first 6 digits (in 2 groups of 3) and remembered the rest in my audio memory.
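For the "if anyone can do the maths" question, here is one way to do it under stated assumptions: take Greg's normal fit for forward digit span (mean 6.8, a score of 9 at about +1.5 SD) at face value, ignore the cap at 9 in the published data, and assume Ben's score is comparable to a WAIS-III forward span score. None of these assumptions is verified here:

```python
from statistics import NormalDist


def rarity_under_normal(score: float, mean: float, sd: float) -> float:
    """Return N such that about 1 in N people score at least `score` under a normal model."""
    z = (score - mean) / sd
    return 1.0 / NormalDist().cdf(-z)  # upper tail via symmetry


if __name__ == "__main__":
    # Greg's figures: mean ~6.8, with 9 at ~+1.5 SD.
    mean, sd = 6.8, (9 - 6.8) / 1.5
    print(f"a span of 11 is about 1 in {rarity_under_normal(11, mean, sd):,.0f}")
```

Under those assumptions 11 comes out at roughly 1 in 500; the answer is very sensitive to the choice of model, which is exactly what the thread is arguing about.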

Thrasymachus

This new paper may be of relevance (H/T Steve Hsu). The abstract:

The largely dominant meritocratic paradigm of highly competitive Western cultures is rooted on the belief that success is due mainly, if not exclusively, to personal qualities such as talent, intelligence, skills, efforts or risk taking. Sometimes, we are willing to admit that a certain degree of luck could also play a role in achieving significant material success. But, as a matter of fact, it is rather common to underestimate the importance of external forces in individual successful stories. It is very well known that intelligence or talent exhibit a Gaussian distribution among the population, whereas the distribution of wealth - considered a proxy of success - follows typically a power law (Pareto law). Such a discrepancy between a Normal distribution of inputs, with a typical scale, and the scale invariant distribution of outputs, suggests that some hidden ingredient is at work behind the scenes. In this paper, with the help of a very simple agent-based model, we suggest that such an ingredient is just randomness. In particular, we show that, if it is true that some degree of talent is necessary to be successful in life, almost never the most talented people reach the highest peaks of success, being overtaken by mediocre but sensibly luckier individuals. As to our knowledge, this counterintuitive result - although implicitly suggested between the lines in a vast literature - is quantified here for the first time. It sheds new light on the effectiveness of assessing merit on the basis of the reached level of success and underlines the risks of distributing excessive honors or resources to people who, at the end of the day, could have been simply luckier than others. With the help of this model, several policy hypotheses are also addressed and compared to show the most efficient strategies for public funding of research in order to improve meritocracy, diversity and innovation.

habryka

Huh, I am surprised that this got published. The model proposed seems almost completely equivalent to the O-ring paper, which has a ton of literature on it and had roughly the same results. And it doesn't have any empirical backing, so that's even more confusing. I mean, it's a decent illustration, but it really does not seem to be saying anything new in this space.

They also weirdly overstate their point. The correlation between talent and success heavily depends on the number of iterations and the initial distribution parameters their model assumes, and they seem to have just fixed them arbitrarily for their abstract; later in the paper they basically say “if you change these parameters, the correlation of talent with success goes up drastically, and the resulting distribution still fits the data”. I.e. the only interesting thing they've shown is that if you have repeated trials with probabilities drawn from a normal distribution, you get a heavy-tailed distribution, which is a trivial statistical fact addressed in hundreds of papers.
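The last point (repeated trials with normally distributed success probabilities give heavy tails, and the talent-success correlation depends on the number of iterations) can be seen in a deliberately simplified toy. This is not a reimplementation of the paper's agent-based model: each step, an agent's capital simply doubles with probability equal to its talent and halves otherwise, with talent ~ Normal(0.6, 0.1). All parameters are arbitrary:

```python
import math
import random


def simulate(n_agents: int, n_steps: int, seed: int = 0):
    """Toy model: each step capital doubles with probability = talent, else halves.

    Returns (talents, log2 final capitals), starting everyone at capital 1.
    Talent is drawn from Normal(0.6, 0.1) and clipped to [0, 1].
    """
    rng = random.Random(seed)
    talents, log_capital = [], []
    for _ in range(n_agents):
        talent = min(1.0, max(0.0, rng.gauss(0.6, 0.1)))
        wins = sum(rng.random() < talent for _ in range(n_steps))
        talents.append(talent)
        log_capital.append(2 * wins - n_steps)  # wins double, losses halve
    return talents, log_capital


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def top_share(log_capital, fraction):
    """Share of total capital held by the richest `fraction` of agents."""
    capital = sorted((2.0 ** c for c in log_capital), reverse=True)
    k = max(1, int(len(capital) * fraction))
    return sum(capital[:k]) / sum(capital)


if __name__ == "__main__":
    for steps in (10, 80):
        talents, logs = simulate(5000, steps)
        print(f"steps={steps:3d}  corr(talent, log capital)={pearson(talents, logs):.2f}  "
              f"top 1% share of capital={top_share(logs, 0.01):.0%}")
```

In this toy the correlation is computed against log capital; with more steps it rises (from roughly 0.55 at 10 steps to roughly 0.88 at 80, in expectation) while the capital distribution becomes ever more concentrated at the top.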

Zvi

I am surprised that you are surprised that this got published. It reinforces and claims to provide proof towards the worldviews currently ascendant in academia, strengthening politically convenient claims and weakening inconvenient ones. Overstatement of the result also seems par for the course. That doesn't make it useful, or anything, but it all seems very unsurprising.

habryka

Yeah, I was just thinking about me saying that while I was standing in the shower. I actually planned to remove the "I am surprised that this got published" line, because I wasn't actually surprised. I think implicitly I probably just wanted to reduce the status of the associated paper, and question its legitimacy, and it seems that the cached phrase I currently have for that is "I am surprised this got published", which really doesn't seem like the ideal phrase for that, but does seem pretty commonly used for precisely that purpose.

Ben Pace

Link to the O-ring paper?

habryka

Wikipedia: https://www.wikiwand.com/en/O-ring_theory_of_economic_development

Original: https://www.jstor.org/stable/2118400

Bird Concept

Really great discussion here, on an important and action-guiding question.

I'm confused about some of the discussion of predicting impact.

If we're dealing with a power law, then most of the variance in impact comes from a handful of samples. So if you're using a metric like "contrarianness + conscientiousness" that corresponds to an exceedingly rare trait, it might look like your predictions are awful, because thousands of CEOs and career executives who are successful by common standards lack that trait. However, as long as you get Musk and a handful of others right, you will have correctly predicted most of the impact, despite missing most of the successful people. What matters is not how many data points you get right, but which ones.

Similarly, were it the case that one or two tail-end individuals (like Warren Buffett) scored within 2 standard deviations of the mean on IQ, that would make IQ a substantially worse metric for predicting who will have the most impact. I haven't found any such individual, but I think doing so suffices to discredit some of the psychometric study conclusions as long as they didn't include that particular individual (which they likely didn't).
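The "handful of samples" point can be quantified for a Pareto distribution, where the share of the total held by the top fraction q of the population is q^(1 - 1/α). A sketch; α = 1.16 is the textbook tail index that yields the 80/20 rule and is used here purely for illustration, not as an estimate of how impact is actually distributed:

```python
def top_share(alpha: float, fraction: float) -> float:
    """Share of the total held by the top `fraction` of a Pareto(alpha) population."""
    if alpha <= 1:
        raise ValueError("the total is infinite for alpha <= 1")
    return fraction ** (1.0 - 1.0 / alpha)


if __name__ == "__main__":
    alpha = 1.16  # tail index behind the 80/20 rule; illustrative only
    for fraction in (0.2, 0.01, 0.001):
        print(f"top {fraction:.1%} of people account for "
              f"{top_share(alpha, fraction):.0%} of total impact")
```

With this α, the top 1% account for roughly half of the total, so a metric that correctly flags only that 1% captures about half the impact while "missing" 99% of people.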

Ben Pace

I haven't found any such individual, but I think doing so suffices to discredit some of the psychometric study conclusions as long as they didn't include that particular individual

Well, only if the individual would falsify the study. My claim is that the folks at the end of the power law will have these properties. I think of it as a filtering mechanism: first you filter by the first order factors, then the second order, and so on, each one doing less work than the last (for example, filtering by >2 SD IQ will cut you to <5% of the population, but once you're just down to the best 0.01% then the third order factors will help you pick out the peak, even though those factors wouldn't have cut down the world very much to start with).
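The filtering picture can be sketched numerically. This assumes each factor is normally distributed and, unlike the positive manifold discussed above, independent of the others; that independence is a simplifying assumption only:

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def fraction_above(z: float) -> float:
    """Fraction of a normal population above +z SD."""
    return STD_NORMAL.cdf(-z)


def extra_cutoff_needed(current_fraction: float, target_fraction: float) -> float:
    """Cutoff (in SDs) on one more independent normal factor that shrinks the
    current pool down to `target_fraction` of the whole population."""
    keep = target_fraction / current_fraction
    return STD_NORMAL.inv_cdf(1.0 - keep)


if __name__ == "__main__":
    after_iq = fraction_above(2.0)
    print(f"IQ > +2 SD keeps {after_iq:.2%} of the population")
    z = extra_cutoff_needed(after_iq, 0.0001)
    print(f"reaching the top 0.01% then needs a further cutoff at about "
          f"+{z:.2f} SD on an independent factor")
```

The first cut does most of the work (roughly a 44-fold reduction); the remaining roughly 228-fold reduction needed to reach the top 0.01% corresponds to a cutoff of about +2.6 SD on a single additional independent factor.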
