She Blinded Me With Science


Scrutinize claims of scientific fact in support of opinion journalism.

Even with honest intent, it's difficult to apply science correctly, and dishonest uses are rarely punished. Citing a scientific result lends an easy patina of authority that a casual reader rarely scratches. Without actually lying, the arguer may select from dozens of studies only the few with the strongest effect in their favor, when the overall body of evidence may point to no effect or even in the opposite direction. The reader sees only "statistically significant evidence for X". In some fields, the majority of published studies claim unjustified significance in order to gain publication, which invites these abuses.

Here are two recent examples:

Women are often better communicators because their brains are more networked for language. The majority of women are better at "mind-reading," than most men; they can read the emotions written on people's faces more quickly and easily, a talent jump-started by the vast swaths of neural real estate dedicated to processing emotions in the female brain.

- Susan Pinker, a psychologist, in NYT's "Do Women Make Better Bosses"

Twin studies and adoptive studies show that the overwhelming determinant of your weight is not your willpower; it's your genes. The heritability of weight is between .75 and .85. The heritability of height is between .9 and .95. And the older you are, the more heritable weight is.

- Megan McArdle, linked from the LW article The Obesity Myth

Mike, a biologist, gives an exasperated explanation of what heritability actually means:

Quantitative geneticists use [heritability] to calculate the changes to be expected from artificial or natural selection in a statistically steady environment. It says nothing about how much the over-all level of the trait is under genetic control, and it says nothing about how much the trait can change under environmental interventions.
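To make Mike's point concrete, here is a toy simulation. It is only a sketch under assumptions of my own: heritability is taken as the simple ratio of genetic variance to total trait variance, genes and environment are independent and additive, and every number in it (the baseline of 70, the variances, the shift of 5) is made up for illustration. A uniform environmental change moves the whole population's average while leaving the heritability figure untouched.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 10_000

# Hypothetical population: trait = baseline + genetic part + environmental part.
genetic = [rng.gauss(0.0, 3.0) for _ in range(n)]      # variance about 9
environment = [rng.gauss(0.0, 1.0) for _ in range(n)]  # variance about 1


def heritability(genetic_values, trait_values):
    """Share of trait variance attributable to genetic variance (toy definition)."""
    return statistics.pvariance(genetic_values) / statistics.pvariance(trait_values)


trait_before = [70.0 + g + e for g, e in zip(genetic, environment)]

# An environmental intervention that affects everyone equally (made-up size).
trait_after = [t + 5.0 for t in trait_before]

h_before = heritability(genetic, trait_before)
h_after = heritability(genetic, trait_after)
mean_shift = statistics.fmean(trait_after) - statistics.fmean(trait_before)

print(f"heritability before: {h_before:.3f}, after: {h_after:.3f}")
print(f"change in population mean: {mean_shift:.2f}")
```

In this toy population the heritability comes out around 0.9 both before and after, yet the average trait value has moved by the full size of the intervention. A high heritability number, by itself, does not tell you the trait is immune to environmental change.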

Susan Pinker's female-boss-brain cheerleading is refuted by Gabriel Arana. A specific scientific claim Pinker makes ("the thicker corpus callosum connecting women's two hemispheres provides a swifter superhighway for processing social messages") is contradicted by a meta-analysis (Sex Differences in the Human Corpus Callosum: Myth or Reality?), and without that, you have only a just-so evolutionary psychology argument.

The Bishop and Wahlsten meta-analysis claims that the only consistent finding is for slightly larger average whole brain size and a very slightly larger corpus callosum in adult males. Here are some highlights:

Given that the CC interconnects so many functionally different regions of cerebral cortex, there is no reason to believe that a small difference in overall CC size will pertain to any specific psychological construct. Total absence of the corpus callosum tends to be associated with a ten-point or greater reduction in full-scale IQ, but more specific functional differences from IQ-matched controls are difficult to identify.
In one recent study, a modest correlation between cerebrum size and IQ within a sex was detected. At the same time, males and females differ substantially in brain size but not IQ. There could easily be some third factor or array of processes that acts to increase both brain size and IQ score for people of the same sex, even though brain size per se does not mediate the effect of the other factor on IQ.
The journal Science has refused to publish failures to replicate the 1982 claims of de Lacoste-Utamsing and Holloway (Byne, personal communication).

Obviously, if journals won't publish negative results, the positive results we do read carry less evidential weight than their stated significance suggests. The authors don't consider this a major problem for the topic, though (the above complaint isn't typical).

When many small-scale studies of small effects are published, the chances are good that a few will report a statistically significant sex difference. ... One of our local newspapers has indeed printed claims promulgated over wire services about new studies finding a sex difference in the corpus callosum but has yet to print a word about contrary findings which, as we have shown, far outnumber the statistically significant differences.

This effect is especially notable in media coverage of health and diet research.
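The selective-reporting effect in the quotation is easy to reproduce. The following is a rough sketch, not anything taken from the paper: the number of studies, the group size, and the use of a z-test with known unit variance are all assumptions chosen to keep the example short. Every simulated study compares two groups drawn from the same distribution, so the true difference is exactly zero.

```python
import math
import random


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def simulate_null_studies(n_studies, n_per_group, alpha=0.05, seed=0):
    """Run studies with no true group difference; return the observed
    differences from those that nonetheless reach p < alpha."""
    rng = random.Random(seed)
    # Standard error of a difference of two means, unit variance assumed known.
    standard_error = math.sqrt(2.0 / n_per_group)
    significant = []
    for _ in range(n_studies):
        mean_a = sum(rng.gauss(0.0, 1.0) for _ in range(n_per_group)) / n_per_group
        mean_b = sum(rng.gauss(0.0, 1.0) for _ in range(n_per_group)) / n_per_group
        diff = mean_a - mean_b
        if two_sided_p(diff / standard_error) < alpha:
            significant.append(diff)
    return significant


hits = simulate_null_studies(n_studies=1000, n_per_group=20)
print(f"{len(hits)} of 1000 null studies reached p < 0.05")
print("smallest 'significant' difference (in SD units):",
      min((abs(d) for d in hits), default=float("nan")))
```

Roughly one study in twenty clears the bar by chance, and with groups this small each of those reports a difference of more than half a standard deviation. A newspaper that prints only the hits will show its readers a string of sizeable "effects" where none exists.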

The gold-standard in the medical literature is a cumulative meta-analysis conducted using the raw data. We urge investigators to make their raw data or, better yet, the actual tracings available for cumulative meta-analysis. We attempted to collect the raw data from studies of sex differences in the CC cited in an earlier version of this paper by writing to the authors. The level of response was astoundingly poor. In several studies that used MRI, the authors even stated that the original observations were no longer available.

This is disturbing. I suspect that many authors are hesitant to subject themselves to the sort of scrutiny they ought to welcome.

By convention, we are taught that the null hypothesis of no sex difference should be rejected if the probability of erroneously rejecting the null on the basis of a set of data is 5% or less. If 10 independent measures are analysed in one study, each with the α = 0.05 criterion, the probability of finding at least one ‘significant’ sex difference by chance alone is 1 − (1 − 0.05)¹⁰ = 0.40 or 40%. Consequently, when J tests involving the same object, e.g. the corpus callosum, are done in one study, the criterion for significance of each test might better be adjusted to α/J, the Dunn or Bonferroni criterion that is described in many textbooks. All but two of 49 studies of the CC adopted α = 0.05 or even 0.10, and for 45 of these studies, an average of 10.2 measures were assessed with independent tests.

This is either rank incompetence or, even worse, yielding to the temptation to extract some positive result from costly data collection.
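The arithmetic in the quoted passage can be checked in a few lines. This is a minimal sketch; the function names are mine, and it assumes, as the authors do, that the tests are independent.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across n independent tests,
    each run at significance level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test threshold alpha / J under the Dunn (Bonferroni) criterion."""
    return alpha / n_tests


# Ten measures per study at the conventional 0.05 level, as in the quote.
fwer_uncorrected = familywise_error_rate(0.05, 10)

# The same ten measures, each held to the stricter alpha / J threshold.
per_test_alpha = bonferroni_alpha(0.05, 10)
fwer_corrected = familywise_error_rate(per_test_alpha, 10)

print(f"uncorrected: {fwer_uncorrected:.3f}")
print(f"per-test alpha after correction: {per_test_alpha:.4f}")
print(f"corrected: {fwer_corrected:.3f}")
```

The uncorrected figure matches the authors' 40%. With each test held to α/J = 0.005, the chance of at least one spurious "finding" per study drops back to just under 5%.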