Hi Yvain,
please state a definite end date next year. Filling out the survey wasn't a high priority for me, and knowing that I had "about a month" made me put it off. Had I known that the last possible day was the 26th of November, I probably would have fit it in sometime between other stuff.
The calibration question is an n=1 sample on one of the two important axes (those axes being who's answering, and what question they're answering). Give a question that's harder than it looks, and people will come out overconfident on average; give a question that's easier than it looks, and they'll come out underconfident on average. Getting rid of this effect requires a pool of questions, so that question-level difficulty averages out.
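A quick simulation makes the point concrete. The numbers below (a 17% true hit rate for the hard question, a small pool of difficulties centered on 50%) are invented for illustration, not taken from the survey:

```python
import random

random.seed(0)

N = 1000  # respondents, all stating 50% confidence

# One deceptively hard question: stated confidence 50%, true hit rate 17%.
hard_hits = sum(random.random() < 0.17 for _ in range(N))

# A pool of questions whose true difficulties scatter around 50%,
# so per-question difficulty averages out to the stated confidence.
difficulties = [0.17, 0.35, 0.50, 0.65, 0.83]  # mean = 0.50
pool_hits = sum(random.random() < random.choice(difficulties) for _ in range(N))

print(hard_hits / N)  # ~0.17: looks like gross overconfidence
print(pool_hits / N)  # ~0.50: apparent calibration restored by averaging
```

On a single hard question even a perfectly calibrated population looks badly overconfident; over the pool, the difficulty errors cancel.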
Yep. (Or as Yvain suggests, give a question which is likely to be answered with a bias in a particular direction.)
It's not clear what you can conclude from the fact that 17% of all people who answered a single question at 50% confidence got it right, but you certainly can't conclude that if you asked one of those people a hundred binary questions, each answered at 50% confidence, that person would get only 17% right. The latter is what would deserve to be called "atrocious"; I don't believe the adjective applies to the results observed in the survey.
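The distinction can be sketched in a few lines; the 17% and 50% rates below are illustrative assumptions, not survey data:

```python
import random

random.seed(1)

# 1000 respondents each answer ONE question at 50% stated confidence.
# The question happens to be hard: the population hit rate is 17%.
one_question = sum(random.random() < 0.17 for _ in range(1000)) / 1000

# One such respondent answering 100 varied binary questions at 50%
# confidence, where 50% really is their per-question hit rate.
hundred_questions = sum(random.random() < 0.50 for _ in range(100)) / 100

print(one_question)       # ~0.17: a fact about the question
print(hundred_questions)  # ~0.50: a fact about the person
```

The first number measures one question's difficulty across people; only the second measures a person's calibration across questions.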
I'm not even sure that you can draw the conclusion "not everyone in the sample is perfectly calibrated" from these results. Well, the people who were 100% sure they were wrong, and happened to be correct, are definitely not perfectly calibrated; but I'm not sure what we can say of the rest.
I previously mentioned that item non-response might be a good measure of Conscientiousness. Before doing anything fancy with non-response, I first checked that there was a correlation with the questionnaire reports. The correlation is zero:
# restrict to respondents with a usable numeric Conscientiousness score
R> lwc <- subset(lw, !is.na(as.integer(as.character(BigFiveC))))
# count each respondent's missing answers (NA or blank)
R> missing_answers <- apply(lwc, 1, function(x) sum(sapply(x, function(y) is.na(y) || as.character(y)==" ")))
R> cor.test(as.integer(as.character(lwc$BigFiveC)), missing_answers)
Pearson's product-moment correlation
data: as.integer(as.character(lwc$BigFiveC)) and missing_answers
t = -0.0061, df = 421, p-value = 0.9952
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.09564 0.09505
sample estimates:
cor
-0.0002954
# visualize to see if we made some mistake somewhere
R> plot(as.integer(as.character(lwc$BigFiveC)), missing_answers)

I am completely surprised. The results in the economics paper looked great and the rationale is very plausible. Yet... The two sets of data here have the right ranges, there's plenty of variation in both dimensions, and I'm sure I'm catching most of the item non-responses or N...
There is a correlation of 0.13 between non-responses and N.
Of course, there's also a correlation of -0.13 between C and the random number generator.
People who had seen the RNG give a large number were primed to feel unusually reckless when taking the Big 5 test. Duh. (Just kidding.)
I really have no idea what went so wrong [with the question about Bayes' birth year]
Note also that in the last two surveys the mean and median answers were approximately correct, whereas this time even the first quartile answer was too late by almost a decade. So it's not just a matter of overconfidence -- there was also a systematic error. Note that "An Essay Towards Solving a Problem in the Doctrine of Chances" was published posthumously when Bayes would have been 62; if people estimated the year it was published and assumed that he had been approximately in his thirties (as I did), that would explain half of the systematic bias.
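The arithmetic behind that heuristic, with 35 as an arbitrary stand-in for "approximately in his thirties":

```python
# Standard dates: Bayes born 1701; the Essay appeared posthumously in 1763.
birth_year = 1701
publication_year = 1763
assert publication_year - birth_year == 62  # his age had he lived to see it

# The heuristic described above: get the publication year roughly right,
# then assume the author was in his thirties when he wrote it.
assumed_age = 35  # stand-in for "approximately in his thirties"
implied_birth = publication_year - assumed_age
print(implied_birth)               # 1728
print(implied_birth - birth_year)  # 27: a late bias from the heuristic alone
```

Anchoring on the publication year thus builds in a late bias of roughly a quarter century before any overconfidence enters the picture.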
As Yvain says, "people have been pretty quick to ridicule this survey's intelligence numbers as completely useless and impossible and so on" because, if they're true, the average LessWronger is gifted. Yvain added a few questions to the 2012 survey, including the ACT and SAT questions and the Myers-Briggs personality type question that I requested (I'll explain why this is interesting), which give us a few other things to check against and have made the figures more believable. The ridicule may be an example of the "virtuous doubt" that Luke warns about in Overconfident Pessimism, so it makes sense to "consider the opposite":
The distribution of Myers-Briggs personality types on LessWrong replicates the Mensa pattern. This is remarkable since the patterns of personality types here are, in many significant ways, the exact opposite of what you'd find in the regular population. For instance, the introverted rationalists and idealists are each about 1% of the population. Here, they are the majority and it's the artisans and guardians who are relegated to 1% or less of our population.
Mensa's personality test results we...
Alternate possibility: The distribution of personality types in Mensa/LW relative to everyone else is an artifact produced by self-identified smart people trying to signal their intelligence by answering 'yes' to traits that sound like the traits they ought to have.
e.g. I know that a number of the T/F questions are along the lines of "I use logic to make decisions (Y/N)", which is a no-brainer if you're trying to signal intelligence.
A hypothetical way to get around this would be to have your partner/family member/best friend next to you as you take the test, ready to call you out when your self-assessment diverges from your actual behaviour ("hold on, what about that time you decided not to go to the concert of [band you love] because you were angry about an unrelated thing?")
From the public dataset:
165 out of 549 responses without reported positive karma (30%) self-reported an IQ score; the average response was 138.44.
181 out of 518 responses with reported positive karma (34%) self-reported an IQ score; the average response was 138.25.
One of the curious features of the self-reports is how many of the IQs are divisible by 5. Among lurkers, we had two 151s, one 149, and ten 150s.
I think the average self-response is basically worthless, since it's only a third of responders and they're likely to be wildly optimistic.
So, what about the Raven's test? In total, 188 responders with positive karma (36%) and 164 responders without positive karma (30%) took the Raven's test, with averages of 126.9 and 124.4. Noteworthy are the new max and min: the highest scorer on the Raven's test claimed to get 150, and the three sub-100 scores were 3, 18, and 66 (of which I suspect only the last isn't a typo or error of some sort).
Only 121 users both self-reported IQ and took the Raven's test. The correlation between their mean-adjusted self-reported IQ and mean-adjusted Raven's test was an abysmal 0.2. Among posters with positive karma, the correlation was 0.45; among posters without positive karma, the correlation was -0.11.
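For anyone wanting to redo this from the public dataset, here is a minimal sketch of the split-by-karma correlation. The data below are synthetic stand-ins (the tuple layout and the noise parameters are my assumptions), so the numbers will not match the survey's 0.2/0.45/-0.11:

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation, no external dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

random.seed(2)

# Hypothetical rows: (has_positive_karma, self_reported_iq, ravens_iq).
# Synthetic stand-in: self-reports loosely track Raven's, with optimism.
rows = []
for _ in range(121):
    ravens = random.gauss(126, 10)
    self_rep = ravens + random.gauss(10, 12)  # optimistic, noisy self-report
    rows.append((random.random() < 0.5, self_rep, ravens))

for karma in (True, False):
    xs = [r[1] for r in rows if r[0] == karma]
    ys = [r[2] for r in rows if r[0] == karma]
    print(karma, round(pearson(xs, ys), 2))
```

One small note on "mean-adjusted": Pearson's r is invariant to shifting each variable by a constant, so subtracting group means before correlating leaves r unchanged; the adjustment matters for comparing levels, not correlations.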
Ok I managed to dig it up!
E/I   | S/N   | T/F   | J/P   | (Category)
------|-------|-------|-------|--------------------
75/25 | 75/25 | 55/45 | 50/50 | Overall population
27/73 | 10/90 | 75/25 | 65/35 | Mensans
15/85 | 03/97 | 88/12 | 54/46 | LessWrongers *
From the December 1993 Mensa Bulletin.
* The LessWrongers row was added by me, using the same calculation method as in the comment where I test my personality type predictions; it is based on the 2012 survey results.
This also explains a lot of things. People treat IQ as if it were meaningless, just a number, and they often get defensive when intellectual differences are acknowledged. I spent a lot of time doing research on adult giftedness (though I'm most interested in highly gifted+ adults), and, assuming the studies were done in a way that is useful (I've heard there are problems with this) and that my personal experiences talking to gifted adults are halfway-decent representations of the gifted adult population, there are a plethora of differences that gifted adults have. For instance, in "You're Calling Who A Cult Leader?" Eliezer is annoyed that people assume high praise is automatic evidence that a person has joined a cult. What he doesn't touch on is that there are very significant neurological differences between people in just about every way you could think of, including emotional excitability. People assume that others are like themselves, and this causes all manner of confusion.
Eliezer is clearly gifted and intense and he probably experiences admiration with a higher level of emotional intensity than most. If the readers of LessWrong and Hacker News are gifted, same goes for many of them. To those who feel so strongly, excited praise may seem fairly normal. To all those who do not, it probably looks crazy.
Would you predict then that people who're not gifted are in general markedly less inclined to praise things with a high level of intensity?
This seems to me to be falsified by everyday experience. See fan reactions to Twilight, for a ready-to-hand example.
My hypothesis would simply be that different people experience emotional intensity as a reaction to different things. Thus, some think we are crazy and cultish, while also totally weird for getting excited about boring and dry things like math and rationality... while some of us think that certain people who are really interested in the lives of celebrities are crazy and shallow, while also totally weird for getting excited about boring and bad things like Twilight.
This also leads each group to think that the other doesn't get similar levels of emotional intensity, because only the group's own type of "emotional intensity" is classified as valid intensity and the other group's intensity is classified as madness, if it's recognized at all. I've certainly made the mistake of assuming that other people must live boring and uninteresting lives, simply because I didn't realize that they genuinely felt very strongly about the things that I considered boring. (Obligatory link.)
(Of course, I'm not denying there being variation in the "emotional intensity" trait in general, but I haven't seen anything to suggest that the median of this trait would be considerably different in gifted and non-gifted populations.)
But I am skeptical of these numbers. I hang out with some people who are very closely associated with the greater Less Wrong community, and a lot of them didn't know about the survey until I mentioned it to them in person. I know some people who could plausibly be described as focusing their lives around the community who just never took the survey for one reason or another. One lesson of this survey may be that the community is no longer limited to people who check Less Wrong very often, if at all. One friend didn't see the survey because she hangs out on the #lesswrong channel more than the main site. Another mostly just goes to meetups. So I think this represents only a small sample of people who could justly be considered Less Wrongers.
Yeah, this also fits my observations--I suspect that reading LW and hanging out with LW types in real life are substitute goods.
Some of the 'descriptions of LessWrong' can make for a great quote on the back of Yudkowsky's book.
Obnoxious self-serving, foolish trolling dehumanizing pseudointellectualism, aesthetically bankrupt.
;-)
Pratchett always includes a quote that calls him a "complete amateur," so there is some precedent for ostentatiously including negative reviews.