Low legibility of Cognitive Reflection Test dramatically improves performance?

by uzalud · 1 min read · 8th Nov 2011 · 26 comments



I'm reading Kahneman's Thinking, Fast and Slow and I've stopped on this:

90% of the students who saw the CRT in normal font made at least one mistake in the test, but the proportion dropped to 35% when the font was barely legible. You read this correctly: performance was better with the bad font.

This seems like an important finding, but I can't find references in the book (Kindle) or on the Web. Does anybody know of any real evidence for this claim? EDIT: I found the original paper.

Do you think that people could behave rationally with such a simple intervention?

simple intro to CRT

EDIT: fixed spelling in title


Maybe it's this paper: http://web.princeton.edu/sites/opplab/papers/Diemand-Yauman_Oppenheimer_2010.pdf

From the abstract:

Previous research has shown that disfluency – the subjective experience of difficulty associated with cognitive operations – leads to deeper processing. Two studies explore the extent to which this deeper processing engendered by disfluency interventions can lead to improved memory performance. Study 1 found that information in hard-to-read fonts was better remembered than easier to read information in a controlled laboratory setting. Study 2 extended this finding to high school classrooms. The results suggest that superficial changes to learning materials could yield significant improvements in educational outcomes.

Thanks! I've followed references and I think I have the original paper: http://pages.stern.nyu.edu/~aalter/intuitive.pdf

We recruited 40 Princeton University undergraduate volunteers at the student campus center to complete the three-item CRT (Frederick, 2005). Participants were seated either alone or in small groups, and the experimenter ensured that they completed the questionnaire individually. Those in the fluent condition completed a version of the CRT written in easy-to-read black Myriad Web 12-point font, whereas participants in the disfluent condition completed a version of the CRT printed in difficult-to-read 10% gray italicized Myriad Web 10-point font. Participants were randomly assigned to complete either the fluent or the disfluent version of the CRT (...) As predicted, participants answered more items on the CRT correctly in the disfluent font condition ... Whereas 90% of participants in the fluent condition answered at least one question incorrectly, only 35% did so in the disfluent condition.
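For anyone wondering how much weight 40 participants can carry, here is a rough sketch of the arithmetic behind the quoted percentages. It assumes the 40 participants were split evenly, 20 per condition; the excerpt does not state the group sizes, so that split is my assumption (90% and 35% of 20 do come out to whole numbers, 18 and 7). The helper `fisher_one_sided` is a hand-rolled one-sided Fisher exact test written for illustration, not anything taken from the paper.

```python
from math import comb

# Assumption: 20 participants per condition (not stated in the quoted passage).
N_PER_GROUP = 20
fluent_errors = round(0.90 * N_PER_GROUP)     # made at least one mistake, easy font
disfluent_errors = round(0.35 * N_PER_GROUP)  # made at least one mistake, hard font


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    With all row and column totals held fixed, this is the probability
    of seeing a top-left count of a or more.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(row2, col1 - k) for k in range(a, upper + 1))
    return tail / total


# Rows: fluent, disfluent. Columns: at least one error, no errors.
p_value = fisher_one_sided(
    fluent_errors, N_PER_GROUP - fluent_errors,
    disfluent_errors, N_PER_GROUP - disfluent_errors,
)
print(fluent_errors, disfluent_errors, p_value)
```

Under that assumed split, the printed p-value says how surprising an 18-versus-7 difference would be if font made no difference at all. It says nothing about whether the effect would replicate in a larger sample.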

Could an easy way to implement this hack, at least temporarily, be to read things upside down?

More on quick sensory disruptions later...

Excellent! I'll do my exams upside down.

ETA: It was a joke - I don't really intend to do this in my future exams.

I'd recommend testing to see if it helps before trying it in your exams.

It didn't say what it did to completion times. If you reliably finish your exams with much time to spare and make small errors that you have trouble catching simply using the balance of the time to check, this may be a good approach. Otherwise, there are probably better approaches.

I wonder how far this generalizes. If it only applies to sensory disfluency it raises some interesting tradeoffs for future investigation, but if it also applies to parsing it might go some way towards explaining the fashion for deliberate obscurantism in some branches of philosophy (and other academia).

(crossposting this forum post from July, I think it has some bearing on this idea)

So I was thinking about scientists, and how they frequently write things that laypeople can't begin to parse. Do they do this to look smart? All scientists live in a crippling fear that what they are doing is not important or groundbreaking, so like a squid they generate a defensive cloud of technical terminology to obscure that fact. For a time I thought this was the whole of the explanation, when I was at my nadir of cynicism. Then I thought, well, maybe scientists just don't write very well. The things they write are unreadable not because of some plot to make them unreadable, but because it is the natural state of text to be unreadable, and there's no optimization process pulling scientific writing away from that. Then I grew kinder to scientists, and thought, maybe it's not that they can't produce readable text, but that they don't bother, because the point of scientific writing is to communicate facts to other scientists efficiently, and to hell with laypeople.

I have a new theory.

Human beings are not good at science. Their brains do not naturally work by looking at evidence and updating probabilities based on it; they pattern-match the things they see to the things they've seen before and treat them the same. The fundamental operation of the human brain is a cache lookup, and to the extent that humans have rationality it is a high-level construct built on top of it. Scientific language is an attempt to force a cache miss, a page fault, so that humans are forced to actually bring their rationality to bear instead of assuming that they already know the answers because particle physics is basically like billiards and I'm good at billiards. Take "climate change" (please). When the scientific community started talking about "climate change" instead of "global warming", people called it a cynical political move. Through this lens, though, it looks like a desperate attempt to regain scientific neutrality for a topic that has flooded everyone's cache due to widespread popular contention. Akin to the euphemism treadmill and the dysphemism treadmill, we have here the formalism treadmill.

Well, I was thinking more of the language you see in (e.g.) continental philosophy than in science and math, but that might just be reflecting my skillset: I've got a much better compatibility mode for scientific language.

That aside, though, I think there probably is a formalism treadmill in science, but I suspect it'd be more prominent in fields that intersect broadly with the public than in disciplines or branches of disciplines that mostly talk within themselves (where your "to hell with laypeople" explanation seems to suffice). We can distinguish between the two by checking the stability of language: if preferred terms change rapidly as older ones enter the lay lexicon, there's probably a need for formalism. If they don't, there probably isn't -- even if popular (mis)use of (e.g.) Heisenberg's uncertainty principle tends to drive professionals in the discipline a little crazy.

Outside of contentious popular science topics, I'd say we tend to see that sort of unstable language in psychology and to a lesser extent in medicine. Makes sense; I can think of reasons for both to find cache misses useful when dealing with the public.

"On the Pedagogical Motive for Esoteric Writing", Arthur Melzer 2007:

What evidence and what arguments can be produced in support of the controversial suggestion, first made by Leo Strauss now over 65 years ago, that most earlier philosophers wrote esoterically and, what is more, that they did so, not merely from fear of persecution, but with an eye to enhancing their pedagogical effectiveness? I argue here that the inherent paradoxes of philosophical education combined with the inherent shortcomings of writing led many earlier thinkers to see the pedagogical necessity of something like the “Socratic method.” And esoteric writing—a rhetoric of riddling concealment—is the closest literary approximation to the Socratic method.

My opinion of the general Straussian suggestion has been heightened by the recent claims of finding musical structures in the dialogues of Plato, who had been one of the major proposed users of esotericism.

What is the proposed mechanism? Is it that they think harder about it or simply that they read more carefully? Test design criteria often specify a number of interventions to prevent mistaken readings (for example, using "NOT" rather than "not" or emphasizing queries in bold type after a long paragraph).

Author continues:

Cognitive strain, whatever its source, mobilizes System 2, which is more likely to reject the intuitive answer suggested by System 1.

System 1 is the impulsive, unconscious, eager but not very intelligent aspect of the mind. System 2 is slow, conscious and more thoughtful, but "lazy" and prone to accept suggestions from System 1. The theory is that inducing cognitive strain diverts more mental resources to System 2, which then tends to do a proper job of solving the test.

I think barely legible letters send the following message: "WTF? Someone is screwing with me, I must be more careful. I must double-check everything." Frankly, I think the only use of this test would be to make it a part of a larger test, to measure the test subject's effort level.


Idea: give people the test in a normal font, but with "BY THE WAY WE'RE SCREWING WITH YOU" written across the bottom.


See, I find it weird that on taking the CRT (first time), I got all the answers correct, but I also answered them all instinctively, off-the-cuff, and found that taking the time to think each one through reduced my confidence significantly -- but once I had all the answers confirmed (without explanations) it was easy to understand why that answer was correct.

The title of this post has a misspelling.

thanks, fixed

I assumed it was a demonstration of the principle you discuss ;)

I've used mirror writing in notes as a trick to pay attention in boring classes since elementary school. Once it becomes automatic, switch back to left-to-right writing for a few words or lines to scramble the brain. The more boring the material, the more one must switch to maintain focus. It doesn't come perfectly naturally for me; if it did, it wouldn't work.

Do you think that people could behave rationally with such a simple intervention?

A higher cognitive burden on comprehension tautologically requires that more thought occur per unit of information.

I don't know that it's a question of rational behavior so much as of necessary behavior.

I don't think it's self-evident that effort put in recognizing letters should translate into significant improvement in problem solving. For example, it could be expected that this lower-level burden would drain cognitive "energy" from higher functions trying to solve the mathematical problem.

For example, it could be expected that this lower-level burden would drain cognitive "energy" from higher functions trying to solve the mathematical problem.

In observing how people take tests, I've seen that people first 'extract' the information from the question and then move on to deriving its answer. Your point is valid, however.

Yes, I think the theory goes that, since you "fired up" the higher-level cognitive "engine" of your mind, you might as well use it to solve the problem. Perhaps it's a sunk-cost type of thinking, where you feel that you should justify your efforts in understanding the problem by solving the problem properly. Or, perhaps the lower-level, less intelligent mind agents are not triggered by the slower process of understanding the problem.

Well -- there's also the human habit of skimming over text to extract the "useful" information -- especially in timed tests or where we believe the text extraneous to the actual function. Word problems are pretty much always an exercise in "these words are an obstacle between me and the formula". So it stands to reason -- superficially that is -- that making it harder to read the font (without increasing the difficulty of the language) would act as a "counterbalance" to the impetus to get done as quickly as possible with the verbal and on to the mathematical.

In other words, I'm hypothesizing that this illustrates an underlying mechanism in how test-takers read their examination questions.

Is it "more rational" to spend more time on the exam question? Perhaps. (Almost definitely, since doing so increases their scores as shown here.) But then we have to ask what the actual goal of test-takers is at the time of taking the exam. Is it truly to "get the highest score"? Or is it "avoid the greatest amount of anxiety this exam produces in me {where 'me'='person taking the test'}"? Very often I have found the latter to be the case -- but I would suspect that this is hardly irrational; those individuals frequently aren't much invested in higher exam scores than are necessary to achieve a passing score. Getting the exam done quicker without falling below that score, then, is the rationally optimum resolution.

Hence my doubt as to whether it would be called a "rational" behavior over a "necessary" one.

I guess I'm confused about your use of the word "necessary".

But you're right. What is the motivation of the test-taker? How much are they trying to get the answers right and how much do they want to "just get it over with"? At least part of the cognitive system is lazy/avoidant, but it doesn't seem that test-takers consciously think "I'll just write down the first answer that comes to mind".

But the real question is this: when they read the smaller text, do they feel less anxiety? Probably not. Then, maybe solving the problem requires less effort once you have spent more time reading the question. But take a look at the CRT: to me, it seems that the problems are clear any way you read them.

At least part of the cognitive system is lazy/avoidant, but it doesn't seem that test-takers consciously think "I'll just write down the first answer that comes to mind".

True, but it does tend -- if true -- to imply that the test-taker would have a desire to "minimize all extraneous functions".

Then, maybe solving the problem requires less effort once you have spent more time reading the question.

My hypothesis here is that making the font harder to read causes test-takers to invest more cognition in properly reading the questions than their drive to "optimize" that labor to its bare minimum would otherwise allow.