Recently a controversy broke out over the replicability of a study John Bargh et al. published in 1996. The study reported that unconsciously priming a stereotype of elderly people caused subjects to walk more slowly. A recent replication attempt by Stephane Doyen et al., published in PLoS ONE, was unable to reproduce the results. (Less publicized, but surely relevant, is another non-replication by Hal Pashler et al.) (source)


This is interesting, if only because the study in question is one of the more famous examples of priming effects - it's the one I tend to use when I introduce people to the idea of priming. (Ironically, the failed replication study also mentions a further experimental manipulation that does show priming effects - affecting the experimenters rather than the subjects.) Bargh's reply is also unusual in that it focuses significantly on extra-scientific arguments, such as attacks on the open access business model of PLoS ONE.

I was instantly reminded of The Golem, which "debunks the view that scientific knowledge is a straightforward outcome of competent theorization, observation, and experimentation". The examples on relativity and solar neutrinos are particularly engaging - it's not just psychology where experimentation is problematic, but all of science.

The linked blog also contributes useful observations of its own, such as the "rhetorical function" of the additional experiment in Doyen's study, how online publication makes a difference in how easily experimental setups can be replicated, or a subtle point about our favorite villain, p-values.


To add some more context: a lot of priming research uses boring measures, often response time on a computer button-pressing task. Most of the theory about priming effects is based on studies using these kinds of computer tasks, since they're easier to run, easier for the experimenter to control, and less statistically noisy (once you average over a hundred trials for each participant). Several studies have found that elderly priming leads to slower response times on computer tasks (e.g., Dijksterhuis, Spears, & Lépinasse, 2001; Kawakami, Young, & Dovidio, 2002).
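
To make the "less noisy once you average over a hundred trials" point concrete, here is a minimal simulation sketch. All the numbers in it (a 600 ms typical response time, 150 ms of trial-to-trial noise, 2000 simulated participants) are made up for illustration and are not taken from any of the studies mentioned. It also models only trial-level noise and ignores real differences between participants.

```python
import random
import statistics

def participant_mean_rt(rng, n_trials, true_mean_ms=600.0, trial_sd_ms=150.0):
    """Average response time (ms) of one simulated participant over n_trials.

    true_mean_ms and trial_sd_ms are hypothetical values chosen for illustration.
    """
    trials = [rng.gauss(true_mean_ms, trial_sd_ms) for _ in range(n_trials)]
    return statistics.fmean(trials)

def spread_of_means(n_trials, n_participants=2000, seed=0):
    """Standard deviation of per-participant averages across simulated participants."""
    rng = random.Random(seed)
    means = [participant_mean_rt(rng, n_trials) for _ in range(n_participants)]
    return statistics.stdev(means)

if __name__ == "__main__":
    single = spread_of_means(n_trials=1)
    averaged = spread_of_means(n_trials=100)
    print(f"noise with 1 trial per participant:    {single:.1f} ms")
    print(f"noise with 100 trials per participant: {averaged:.1f} ms")
    print(f"ratio: {single / averaged:.1f}")
```

Under these assumptions the noise in each participant's score should shrink by roughly a factor of ten (the square root of 100), which is why a difference of a few tens of milliseconds can be detectable on a button-pressing task while a single walk down a hallway gives you only one noisy measurement per person.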

Then there are some studies which measure more interesting macro behavior like walking. Sometimes these studies are the first to demonstrate a novel theoretical point, but their main advantages are that they show that priming effects translate into meaningful behaviors and that they're an interesting hook for understanding & explaining the ideas (a lot fewer people would talk about or remember that elderly priming slows you down if the only studies on the topic showed that it slowed response latencies on a lexical decision task by a tenth of a second). But if one study showing a macro behavior effect turns out to have been statistical noise & publication bias (or some other experimental artifact), that doesn't change much about our best theories of priming.
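
For anyone unsure how "statistical noise & publication bias" could produce a published effect on its own, here is a rough sketch. It simulates many small two-group studies in which priming truly does nothing, then looks only at the ones that happen to reach p < .05. The group size (15 per group), the 8-second average walking time, and the 1-second standard deviation are hypothetical values picked for the example, not figures from Bargh's or anyone else's paper.

```python
import math
import random
import statistics

# Two-tailed 5% critical value of Student's t with 28 degrees of freedom
# (two groups of 15, pooled-variance t-test).
T_CRIT_DF28 = 2.048

def null_study(rng, n_per_group=15, mean_s=8.0, sd_s=1.0):
    """One simulated study where the prime has no true effect on walking time."""
    primed = [rng.gauss(mean_s, sd_s) for _ in range(n_per_group)]
    control = [rng.gauss(mean_s, sd_s) for _ in range(n_per_group)]
    diff = statistics.fmean(primed) - statistics.fmean(control)
    # With equal group sizes the pooled variance is the average of the two variances.
    pooled_var = (statistics.variance(primed) + statistics.variance(control)) / 2
    t = diff / math.sqrt(pooled_var * 2 / n_per_group)
    return diff, abs(t) > T_CRIT_DF28

def simulate(n_studies=4000, seed=1):
    """Return (false positive rate, mean |diff| over all studies, mean |diff| over 'published' ones)."""
    rng = random.Random(seed)
    results = [null_study(rng) for _ in range(n_studies)]
    all_abs = [abs(d) for d, _ in results]
    published_abs = [abs(d) for d, significant in results if significant]
    return (
        len(published_abs) / n_studies,
        statistics.fmean(all_abs),
        statistics.fmean(published_abs),
    )

if __name__ == "__main__":
    rate, all_diff, published_diff = simulate()
    print(f"share of null studies reaching p < .05: {rate:.3f}")
    print(f"mean |difference|, all studies:         {all_diff:.2f} s")
    print(f"mean |difference|, 'published' only:    {published_diff:.2f} s")
```

About one in twenty of these null studies should come out significant, and the significant ones necessarily show a sizeable difference between groups. If only those get written up, the literature contains a clean-looking effect that nobody will be able to replicate. This says nothing about whether that is what actually happened with the walking study; it only shows the mechanism.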

Looking at the walking speed measure for elderly priming, on the one side we have the non-replications by Doyen and Pashler. On the other side we have Bargh & colleagues' (1996) original study (study 2a), and an exact replication which Bargh & colleagues published in the same article (study 2b). That is the only paper I know of which includes a direct replication of one of its studies, and it weakens the standard concerns about publication bias, post hoc methodological decisions, and so forth.

There is also a study by Cesario, Plaks, and Higgins (2006) which found that people subliminally primed with photos of elderly men walked slower than people primed with photos of teenage males. The control group in that study was in between but not significantly different from either group (n=61 for the study), which makes it mixed support for the exact Bargh study. (The Cesario study also found that the effect of priming was moderated by people's attitudes towards the elderly - people who dislike the elderly did not slow down - which led to an argument that priming works differently than Bargh et al claimed. That's getting pretty far afield, but it could be relevant to the Bargh-Doyen debate if Belgians have different attitudes towards the elderly than Americans.)

I think it is going to be pretty wonderful if this study is not reproducible, to serve as a cautionary example.

While Bargh's reply does mention conflict of interest, it also mentions significant methodological differences.

It does look like neither replication reproduced the relatively small number of primes in the original priming material. Bargh has a consistent explanation, with links to other studies, of why this is important.