I appreciate an attempt to identify a new construct, but the thinking here is still too amorphous to benefit much from peer review.
I think the name for this construct needs help. "[Statistical literacy] is not about knowledge of statistical tools and techniques," so why not call it something else? Contrast with literacy, graph literacy, numeracy, or lay rationalism. The construct definition is also missing:
This captures many of the things I think are covered by statistical literacy:
- It is not...
- It emphasises...
- It suggests...
- It tells...
Given that the construct itself hasn't been clearly stated, any propositions involving it have the wind taken out of their sails, but equally, any attempted critiques are just tilting at windmills.
All I can say is that whatever construct you're working on, it might align with whatever Dan Luu does, which he doesn't describe in any detail.
Thanks for the honest feedback! It is probably too early in my hobby research to share this, yes. My main hope is that it would resonate with someone else who might be more clear on what it is, and maybe even inspire some sort of measurement.
I am convinced there exists something we can call statistical literacy. Unfortunately, I don’t yet know exactly what it is, so it is hard to write about.
One thing is clear: it is not about knowledge of statistical tools and techniques. Most of the statistically literate people I meet don’t know a lick of formal statistics. They just picked up statistical literacy from … somewhere. They don’t know the definition of a standard deviation, but they can follow a statistical argument just fine.
The opposite is also possible: a few years ago I had a formidable toolbox of statistical computations I was able to do, but I would be very confused by a basic statistical argument outside the narrow region of techniques I had learned.
In other words, it is not about calculations. I think it is about an intuitive sense for process variation, and how sources of variation compare to each other.
Please excuse my ignorance
Content warning: this is the most arrogant article I’ve written in a long time. I ask you to bear with me, because I think it is an important observation to discuss. Unfortunately, I lack the clarity of mind to make it more approachable: the article is arrogant because I am dumb, not because of the subject matter itself.
Hopefully, someone else can run with this and do a better job than I.
Shipping insurance before and after statistics
It’s hard to write directly about something when one doesn’t know what it is, so we will proceed by analogy and example.
Back in the 1500s shipping insurance was priced under the assumption that if you just knew enough about the voyage, you could tell for certain whether it would be successful or not, barring the will of God. Thus, when asked to insure a shipment, the underwriter would thoroughly investigate things like the captain’s experience, ship maintenance status, size of crew rations, recency of navigational charts, etc. After much research, they would conclude either that the shipment ought to be successful, or that it ought not to be. They arrived at a logical, binary conclusion: either the shipment will make it (based on all we know) or it will not. Then they quoted a price based on whether or not the shipment would make it.
This type of logical reasoning leads to a normative perspective of what the future ought to look like. Combined with the idea that every case is unique, this is typical of a lack of statistical literacy. The statistically illiterate predict what the future will look like based on detailed knowledge and logical sequences of events. Given that we hadn’t yet invented statistics in the 1500s, it is not surprising our insurer would think that way.
Of course, even underwriters at the time knew that sometimes ships that ought to make it run into a surprise storm and sink. Similarly, ships that ought not to make it are sometimes lucky and arrive safely. To the 1500s insurer, these are expressions of the will of God, and are incalculable annoyances, rather than factors to consider when pricing.
This is similar to how a gambler in the 1500s could tell you that dice were designed to land on each number equally often – but would refuse to give you a probability for the next throw, because the outcome of any given throw is “not uncertain, just unknown”: God has predetermined a specific number for each throw, and we have no way of knowing how God makes that selection. This distinction between the uncertain and the unknown is still drawn by the statistically illiterate today.
The revolution in mindset that happened in the 1600s and 1700s was that one could ignore most of what made a shipment unique and instead price the insurance based on what a primitive reference class of shipments had in common, inferring general success propensities from that. Insurers that did this outprofited those that did not, in part because they were able to set a more accurate price on the insurance, and in part because they spent less on investigating each individual voyage.
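To make the contrast concrete, here is a minimal sketch of reference-class pricing. Everything in it is an assumption made for illustration: the voyage record is invented, and the function name `reference_class_premium` and the `loading` markup are hypothetical, not taken from any historical practice. The point is only that the price comes from what past voyages have in common, not from investigating the one being insured.

```python
def reference_class_premium(outcomes, cargo_value, loading=0.2):
    """Price a policy from past voyages in the same broad class.

    outcomes: list of booleans, True if a past voyage was lost.
    cargo_value: value insured on the new voyage.
    loading: hypothetical markup covering expenses and profit.
    """
    # Share of comparable voyages that were lost (the base rate).
    loss_rate = sum(outcomes) / len(outcomes)
    # Average payout if many such policies were written.
    expected_loss = loss_rate * cargo_value
    return expected_loss * (1 + loading)


# Invented record: 6 losses among 100 comparable voyages.
past_voyages = [True] * 6 + [False] * 94
premium = reference_class_premium(past_voyages, cargo_value=1000)
print(round(premium, 2))  # 72.0
```

Nothing about the captain, the charts, or the rations enters the calculation. The 1500s underwriter would instead have quoted one of two prices depending on a yes-or-no verdict.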
Two changes in the spirit of men
I like the mid-1800s quote from Lecky commenting on the rise of rationalism, saying
I believe we are now in the early days of a similar movement, namely the rise of empiricism. Borrowing Lecky’s words, we could use almost the same passage to describe this change.
This captures many of the things I think are covered by statistical literacy:
If my idea of statistical literacy is accurate, my readership should fall roughly into three categories in their reactions to the above:
The first category (“Yes!”) will consist of people who are statistically literate. The third category (“No!”) will attract people who are not statistically literate. I don’t know about the middle ground – I think it could attract open-minded but not yet very statistically literate people.
Statistical literacy as a developmental milestone
The devious thing about statistical literacy is that people who don’t have it seem to not know they don’t have it – not even when someone points out that statistical literacy is a thing that not all people have. To someone who is not statistically literate, statistical reasoning sounds like the ramblings of someone confused and illogical.
To be clear: I’m not knocking anyone here. As I’ve previously admitted, I wasn’t statistically literate until fairly recently. I didn’t become statistically literate because I tried to. I mean, how could I? I didn’t even know it was a thing. It just happened by accident when I read lots of books on varied topics inside and outside statistics. Out of nowhere, I discovered I had this new lens through which I could look at the world and see it all differently.[1]
The whole thing reminds me of the idea Scott Alexander proposed about missing developmental milestones. This certainly seems like one of them: either someone taught you to think statistically, and it seems like second nature, or you never learned it, and then you don’t know what’s missing.
The problem is training
This leads into another important point: I’m certainly not claiming any one person is incapable of statistical literacy. I think it’s generally within reach of most people I meet. But, as formal operational thought is described in Scott Alexander’s article, statistical literacy
Our culture has yet to develop techniques for training large numbers of people in statistical literacy. Our elementary school teachers know how to train students in reading, writing, and basic logical reasoning. But I believe most of them are not statistically literate. This means
And if the teachers do not see these things, if the teachers are not statistically literate, how on Earth are they going to teach it to their students?
I suspect this will improve with time. Statistical reasoning wasn’t even invented 400 years ago. Unlike verbal language and art, it is not an innately human thing to do. Like logical reasoning, it will take time for it to spread, and it will do so at first slowly, then suddenly. I think it will, in the next few centuries, become as important a marker of civilisation as actual literacy and numeracy are today.
Statistical literacy is required for data-driven decisions
Once one starts looking for it, differences in statistical literacy pop up everywhere. Dan Luu writes that he is “looking at data” better than others (my emphasis):
I know the term for it: statistical literacy. Dan, you are practicing your statistical literacy.
When the data are difficult and uncooperative, statistical literacy is needed to look at them in a way that improves decisions – or at least does not make them worse. Dan Luu goes further and notes that most people who are not statistically literate don’t even bother collecting the data in the first place – they haven't yet established the supremacy of studying the outcome, but are instead using assumption as the motive of duty:
When people attempt to data-drive despite lacking statistical literacy, they often end up flailing about and making things worse, eventually giving up on the idea and reverting to decisions based on logic and/or faith.
All of this happens in a world that is turning increasingly statistical. Many of our productivity-enhancing technologies these days incorporate statistical reasoning to make decisions when presented with wobbly information. Our obsession with determinism in software systems is, I think, a temporary fad, just as it was in science.
Improving statistical literacy
What finally prompted this article, after months of thinking on my end, was Cedric Chin over at Commoncog publishing an article on Becoming Data Driven, From First Principles. That is an excellent article which just might help nudge a predisposed organisation into statistical literacy.
As I mentioned, I have also read a lot of books that nudged me in the right direction, but I’m not yet at a point where I can make a concrete recommendation. I hope to re-read and review some of them over the coming year, which would hopefully put me in a better spot to recommend.
I’m also hoping that I can carve out some time to try to measure people’s statistical literacy, which would help me pinpoint exactly what it is about, and thus allow for the construction of an effective curriculum.
More research is needed
All of these words are meaningless in the sense that they are just a wild man’s speculation. I have not gone through the trouble Lecky did when he chronicled the rise of rationalism.
On the flip side, the hypothesis fuzzily outlined in this article should be testable. If I’m correct about statistical literacy, it should be possible to design a questionnaire with psychometric reliability and validity, with diverse questions that all seem to measure a construct that sounds like statistical literacy.
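As a rough illustration of what checking reliability could look like, the sketch below computes Cronbach's alpha, one common internal-consistency statistic. This is an assumption on my part about which statistic one might use, not a description of any survey actually run; the responses are invented, and the function name `cronbach_alpha` is hypothetical. Alpha speaks only to whether items move together, not to whether they measure statistical literacy, so validity would need separate evidence.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a table of questionnaire scores.

    responses: list of rows, one per respondent, each row holding that
    respondent's score on every item (same number of items per row).
    """
    k = len(responses[0])                      # number of items
    items = list(zip(*responses))              # one column per item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Invented scores: five respondents, three items.
responses = [
    [1, 1, 1],
    [2, 2, 3],
    [3, 3, 3],
    [4, 4, 5],
    [5, 5, 5],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

Values near 1 mean the items rise and fall together across respondents. Diverse questions that still produce a high alpha would be one piece of evidence that they tap a single underlying construct.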
I don’t know exactly what the items in the questionnaire would be. I have some ideas and I’ve run a few trial surveys (massive thanks to my incredibly helpful test subjects!), but I have not arrived at anything concrete yet. If someone would donate large amounts of money to me, I would love to actively research this subject. In the meantime, I can only think about it in my spare time and sometimes write about it online.
I am still polishing this lens and discovering more ways in which it can be useful.
From Ecclesiastes 9:11. In the King James Bible, this is phrased as “I returned, and saw under the sun, that the race is not to the swift, nor the battle to the strong, neither yet bread to the wise, nor yet riches to men of understanding, nor yet favour to men of skill; but time and chance happeneth to them all.” I noticed that in the International Children’s Bible, it starts with “I also realized something else here on earth that is senseless: The fast runner does not always win the race. […]” Hah! Senseless! Statistical illiteracy!
I think it was Deming who, as a rule, gave a passing grade to everyone in his class, because as long as he was doing a passable job of teaching, any problems in learning are unlikely to rest with factors the student can control. He used tests not as a way to fail students, but as a way to calibrate how well he was teaching the material. He used students as measuring devices for his teaching skill, acknowledging that individual students sometimes don’t fairly represent his abilities, but in aggregate they will.