A while back I read How to Measure Anything and found it fascinating. In my day job, I spend quite a bit of time trying to make sense of the world by looking at dashboards of requests, latencies, error rates, etc. (software systems).

After finishing the book and taking copious notes, I understood that it gave me a prepackaged process that I could apply as-is, but I found it very difficult to adapt to everyday situations. I don't think I picked up a good intuition about stats, in other words.

I'm looking to change that. Specifically, I want to learn to apply stats in these two situations:

  • measuring things. Mostly software systems, but open to little experiments. Dan Luu used to measure a lot of fun things.
  • understanding how others measure things. I'd like to be able to judge if claims made in a paper about covid spread or social media addiction are backed up by the math/data in the paper.

The challenge I'm facing is that I know a bunch of techniques, but not how they relate to each other or to the problems they're meant to solve. To illustrate what I mean: I know how to get percentiles and calculate means, but until this morning I didn't know why averaging percentiles is usually a bad idea. I'm missing the map.
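To make the percentile point concrete, here is a minimal sketch in Python. The latencies are made up, the two hosts (`host_a`, `host_b`) are hypothetical, and the nearest-rank `percentile` helper is just one of several common percentile definitions, chosen here for simplicity.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: the smallest value with at least p% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical latencies in milliseconds (made-up numbers for illustration).
host_a = [10] * 980 + [500] * 20  # busy host: 1000 requests, 2% of them slow
host_b = [10] * 9 + [2000]        # quiet host: 10 requests, one very slow

p99_a = percentile(host_a, 99)
p99_b = percentile(host_b, 99)

# Averaging the per-host p99s gives the quiet host the same weight as the busy one.
averaged_p99 = (p99_a + p99_b) / 2

# Pooling the raw requests first weights every request equally.
pooled_p99 = percentile(host_a + host_b, 99)

print("per-host p99:", p99_a, p99_b)
print("average of p99s:", averaged_p99)
print("p99 of pooled requests:", pooled_p99)
```

With these numbers the average of the two p99s lands far above the p99 of the pooled requests, because a host that served only 10 requests counts as much as one that served 1000. A percentile is not a linear summary the way a sum or a count is, so it generally has to be recomputed from the raw data (or from mergeable histograms) rather than averaged.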

I've seen these books recommended as a good way to start:

  • Statistics, 4th Edition, by Freedman, Pisani, and Purves
  • Probability Theory: The Logic of Science, by Jaynes
  • An Introduction to Error Analysis: The Study of Uncertainties in Physical Measurements, by Taylor
  • Think Stats, by Downey

But I also wanted to ask someone familiar with the field:

  • Is it best to start with an introductory textbook and branch out from there?
  • Are there specific subfields / topics I should be focusing on (or avoiding)?
  • Is what I'm looking to learn labeled in some way? For example, I can't tell if this is data analytics or data science or X.


2 Answers

I'm assuming you want to learn about something by measuring one or more of its attributes and then using statistics to extract information from the measurements, i.e., you are interested in hands-on application. If so, books I found useful include:

Statistics for Experimenters, by Box, Hunter, and Hunter

Design and Analysis of Experiments, by Montgomery

Thanks! This is really helpful--I think this is exactly what I'm trying to do.

Are these texts part of a specific academic track, degree, or field of study? It sounds like something an engineering student would spend a semester on, but also like something someone could spend a career studying.

Being able to accurately assess a paper's claims is, unfortunately, a very high bar. A large proportion of scientists fall short of it. See: https://statmodeling.stat.columbia.edu/2022/03/05/statistics-is-hard-etc-again/

Most people with a strong intuition for statistics have taken courses in probability. It is foundational material for the discipline.

If you haven't taken a probability course, and if you're serious about wanting to learn stats well, I would strongly recommend starting there. I think Harvard's intro probability course is good and has free materials: https://projects.iq.harvard.edu/stat110/youtube

I've taught out of Freedman, but not the other texts. It's well written, but it is targeted at a math-phobic audience. It's a fine choice if you do not wish to embark on the long path.

Thanks! I'll look this over.

Out of curiosity,

"Most people with a strong intuition for statistics have taken courses in probability. It is foundational material for the discipline."

Do some people learn statistics without learning probability? Or, what's different for someone who learns only stats and not probability?

(I'm trying to grasp what shape/boundaries are at play between these two bodies of knowledge)

mikes (7mo):
Statistics is trying to "invert" what probability does. Probability starts with a model, and then describes what will happen given the model's assumptions. Statistics goes the opposite direction: it is about using data to put limits on the set of reasonable/plausible models. The logic is something like: "if the model had property X, then probability theory says I should have seen Y. But, NOT Y. Therefore, NOT X." It's invoking probability to get the job done. Applying statistical techniques without understanding the probability models involved is like having a toolbox without understanding why any of the tools work. It all goes fine until the tools fail (which happens often, and often silently) and then you're hosed. You may fail to notice the problems entirely, or may have to outsource judgments to others with more experience.
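As a rough illustration of that "if X then Y, but NOT Y, therefore NOT X" logic, here is a minimal Python sketch. Everything specific in it is an assumption made for the example: the coin-flip setting, the observed count of 65 heads in 100 flips, the three candidate models, and the conventional but arbitrary 0.05 cutoff. It only looks at the upper tail, so it is a one-sided check and not a full analysis.

```python
from math import comb

def binomial_tail(n, k, p):
    """Probability of at least k heads in n flips when each flip lands heads with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

N_FLIPS = 100        # made-up experiment size
OBSERVED_HEADS = 65  # made-up observation
CUTOFF = 0.05        # conventional, arbitrary threshold for "too surprising"

# Probability direction: for each candidate model, how likely is data at least this extreme?
tail_by_model = {
    p: binomial_tail(N_FLIPS, OBSERVED_HEADS, p) for p in (0.4, 0.5, 0.6)
}

# Statistics direction: keep only the models under which the data is not too surprising.
plausible = [p for p, tail in tail_by_model.items() if tail >= CUTOFF]

for p, tail in tail_by_model.items():
    print(f"P(heads)={p}: P(at least {OBSERVED_HEADS} heads) = {tail:.4g}")
print("models not ruled out:", plausible)
```

Under the fair-coin model, 65 or more heads would be very unlikely, so that model gets ruled out, while a coin with heads probability 0.6 stays on the list. The probability calculation is the forward step; choosing which models survive is the inverse step described above.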
matto (7mo):
Thanks, this is incredibly useful. I think I understand enough to put together a curriculum to delve into this topic, starting with the Harvard course you recommended.
1 comment

Pattern-match the real problems or their parts to the problems in the textbook. That will help you figure out what to do.