My primary objection: perhaps some of the students in both groups got smarter (these are 8-9 year olds, still developing) for reasons independent of the interventions, which caused them to improve on the n-back training task AND on the other intelligence tests (fluid intelligence, Gf). If you separated the "active control" group into high and low improvers post hoc, just as was done for the n-back group, you might see that the active control "high improvers" are even smarter than the n-back "high improvers". We should expect some 8-9 year olds to improve in intelligence or motivation over the course of a month or two, without any intervention.

Basically, this result sucks, because of the artificial post-hoc division into high and low responders to n-back training, which was needed to show a strong "effect". I'm not certain that the effect is artificial; I'd have to spend a lot of time doing some kind of sampling to show how well the data is explained by my alternative hypothesis.
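A minimal sketch of the kind of sampling that could probe this, not a reanalysis of the paper's data. It assumes a made-up model: each child has a latent "growth" over the study period that is unrelated to any training and feeds equally into n-back improvement and Gf improvement, plus independent noise. The function name, the unit variances, and the sample size are all illustrative assumptions.

```python
import random
import statistics


def simulate_posthoc_split(n=20000, seed=0):
    """Split simulated children at the median n-back gain and compare Gf gains.

    Assumed toy model (not taken from the paper): training has NO causal
    effect on Gf; a shared latent 'growth' term drives both gains.
    """
    rng = random.Random(seed)
    children = []
    for _ in range(n):
        growth = rng.gauss(0, 1)                # development unrelated to training
        nback_gain = growth + rng.gauss(0, 1)   # improvement on the training task
        gf_gain = growth + rng.gauss(0, 1)      # improvement on the Gf tests
        children.append((nback_gain, gf_gain))

    # Post-hoc median split on training-task improvement.
    cut = statistics.median([nback for nback, _ in children])
    high = [gf for nback, gf in children if nback > cut]
    low = [gf for nback, gf in children if nback <= cut]
    return statistics.mean(high), statistics.mean(low)


if __name__ == "__main__":
    high_mean, low_mean = simulate_posthoc_split()
    print("mean Gf gain, high improvers:", round(high_mean, 2))
    print("mean Gf gain, low improvers: ", round(low_mean, 2))
```

Under this toy model the "high improvers" show a larger mean Gf gain than the "low improvers" even though training does nothing, which is the pattern a post-hoc split would need to rule out. A large `n` is used only to make the expected difference visible; with 16 children per subgroup the same effect would be much noisier.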

It's definitely legitimate to look at the whole n-back group vs. the whole active control group. Those results aren't impressive at all. I just can't give any credit for the post-hoc division because I don't know how to properly penalize it and it's clearly self-serving for Jaeggi. It's borderline deceptive that the graphs don't show the unsplit n-back population.

It's unsurprising (probably offering no evidence against my explanation) that the initial average n-back score for the low improvers is higher than the initial average for the high improvers; this is what you'd expect if you split a set of paired samples drawn from the same distribution with no change at all, for example.
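A small sketch of that regression-to-the-mean point, under assumed parameters: every simulated child has a fixed ability, and the "pre" and "post" scores are that ability plus independent measurement noise, so nothing changes between sessions. The function name and the unit variances are hypothetical choices for illustration.

```python
import random
import statistics


def split_initial_scores(n=20000, seed=1):
    """Median-split paired scores by 'improvement' when nothing really changed.

    Returns (mean initial score of low improvers, mean initial score of
    high improvers). Assumed model: pre and post are the same ability plus
    independent noise.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        ability = rng.gauss(0, 1)
        pre = ability + rng.gauss(0, 1)    # first measurement
        post = ability + rng.gauss(0, 1)   # second measurement, no true change
        pairs.append((pre, post))

    cut = statistics.median([post - pre for pre, post in pairs])
    low_pre = [pre for pre, post in pairs if post - pre <= cut]
    high_pre = [pre for pre, post in pairs if post - pre > cut]
    return statistics.mean(low_pre), statistics.mean(high_pre)


if __name__ == "__main__":
    low_initial, high_initial = split_initial_scores()
    print("initial mean, low improvers: ", round(low_initial, 2))
    print("initial mean, high improvers:", round(high_initial, 2))
```

Children who happened to score high on the first measurement have less room for a lucky gain, so they land disproportionately in the "low improver" half, and that half starts with the higher average.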

Also, on pg 2/6, I don't understand how the t statistics line up with the group sizes.

The groups are ((16 high improvement+16 low improvement)+30 control), so why is it (15), t(15), t(30), and then later t(16)? Does t(n) not mean that it's a t statistic over a population of n? I'm guessing so. I assume the t is an unpaired Student's t-test, which of course assumes the distributions compared are normal. I'm not sure if that's demonstrated, but it may be obvious to experts (it's not to me).

Disclaimer: I did dual n-back for a month or so, and got stuck at 5. I haven't resumed, though I may do so in the future.

The groups are ((16 high improvement+16 low improvement)+30 control), so why is it (15), t(15), t(30), and then later t(16)? Does t(n) not mean that it's a t statistic over a population of n?

Not usually. Numbers in brackets after a well-known statistic normally represent parameters for that statistic's distribution; in the case of a t-test the bracketed number would be the number of degrees of freedom, which might be one less than the sample size (for a one-sample t-test) or two less than the sum of sample sizes (for an equal variances two-sample t-test).
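As a quick arithmetic check of those two rules, here is a sketch using the group sizes quoted above (16, 16 and 30). Which test actually produced each reported statistic is not established here; the helper names are made up for illustration. Note that a paired t-test (pre vs. post within one group) is a one-sample test on the differences, so it also has n - 1 degrees of freedom.

```python
def df_one_sample(n):
    """Degrees of freedom for a one-sample (or paired) t-test on n observations."""
    return n - 1


def df_two_sample_equal_var(n1, n2):
    """Degrees of freedom for an equal-variances two-sample t-test."""
    return n1 + n2 - 2


if __name__ == "__main__":
    print(df_one_sample(16))                 # 15
    print(df_two_sample_equal_var(16, 16))   # 30
    print(df_two_sample_equal_var(16, 30))   # 44
```

So t(15) is what a within-group test on 16 children would give, and t(30) is what a comparison of two groups of 16 would give.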

(Disclaimer: I haven't read the paper.)

[Edited for unambiguity.]

Douglas_Knight: You are way too underconfident. If an intervention is equally likely to raise or lower the score with respect to the control group, without increasing variation, it does nothing. When you say that the aggregate results "aren't impressive," you imply that they are positive, but if I read table 1 correctly, the aggregate results are often negative.

N-back news: Jaeggi 2011, or, is there a psychologist/statistician in the house?

by gwern, 16th Jun 2011


Following up on the 2010 study, Jaeggi and University of Michigan people have run a Single N-back study on 60 or so children.

The abstract is confident and the mainstream coverage unquestioning of the basic claim. But reading it, the data did not seem very solid at all - I will forbear from describing my reservations exactly; I have been accused of being biased against n-backing, however, and I'd appreciate outside opinions, especially from people with expertise in the area.

(Background: Jaeggi 2011 in my DNB FAQ. Don't read it unless you can't render the above requested opinion, since it includes my criticisms.)