There are 5 strong contenders for the best Bayesian statistics book.

TLDR

  • Statistical Rethinking, henceforth SR[1]
    • Up to speed fast, no integrals, very intuitive approach.
  • Doing Bayesian Data Analysis, henceforth The Dog Book[2]
    • This is the easiest book. If your goal is only to create simple models and you aren't interested in understanding the details, then this is the book for you.
  • A Student’s Guide to Bayesian Statistics, henceforth Student's Guide[3]
    • This book has the opposite focus of The Dog Book. Here the author slowly goes through the philosophy of Bayes with an intuitive mathematical approach.
  • Regression and Other Stories, henceforth ROS[4]
    • Good if you want a slower, more thorough approach where you also learn the frequentist perspective.
  • Bayesian Data Analysis, henceforth BDA[5]
    • The most advanced text, very math heavy, best as a second book after reading one or two of the others, unless you are already a statistician.

Regardless of which one you pick, watching the YouTube lectures for either SR or Student's Guide is very helpful.

Short review of each Book

The Dog Book: This is the easiest book. If you do not strive for deep understanding but simply want to quickly reach a skill level where you can develop a not-too-fancy model, then this is a great book. It is also a good reference book, as each chapter is based on a specific link function and regression variable type. Thus, if you want to do a Bayesian ANOVA, you simply look for chapters named something like "categorical predictor with metric outcome".

Student's Guide: This book has the opposite focus of The Dog Book. Here the author slowly goes through the philosophy of Bayes with an intuitive mathematical approach. It looks like a very good reference book. A Bayesian professor has recommended it as one of the best introductions to Stan. Chapter 8 is also very good. It starts with a graph of the relationships between all the likelihoods and then goes through EACH and EVERY ONE with an intuitive example and some nice plots. I would recommend that everyone read the chapter and/or use it as a reference whenever you have a few 'candidate' likelihoods in your head. Reading Chapter 12 in SR will subsequently teach you to create mixtures of these likelihoods if you need further hacks such as zero inflation. Remember that the YouTube series explains the math very well, so use it as a supplement!
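To make "mixtures of likelihoods" concrete, here is a minimal sketch of a zero-inflated Poisson, written from the textbook definition rather than taken from either book. It assumes each observation is a structural zero with probability `zero_prob` and otherwise a Poisson count with mean `rate`; the function and parameter names are hypothetical.

```python
import math

def zip_pmf(k, zero_prob, rate):
    """Probability of observing count k under a zero-inflated Poisson.

    Assumption: with probability zero_prob the process produces a
    structural zero; otherwise the count is Poisson(rate).
    """
    poisson = math.exp(-rate) * rate**k / math.factorial(k)
    if k == 0:
        # A zero can come from either mixture component.
        return zero_prob + (1 - zero_prob) * poisson
    # Positive counts can only come from the Poisson component.
    return (1 - zero_prob) * poisson

# The mixture is still a proper distribution: probabilities sum to 1.
total = sum(zip_pmf(k, zero_prob=0.3, rate=2.0) for k in range(60))
print(round(total, 6))
```

In a real model you would put priors on `zero_prob` and `rate` and let Stan or a similar tool do the sampling; the point here is only that the likelihood is a weighted sum of two simpler ones.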

ROS: This book is a 'normal' statistics book written by Bayesians, so it teaches both philosophies and has great intuitive mathematical examples. It is a treasure trove of very intuitive considerations about model building, such as: 25 ± 10 is statistically significantly different from 0, but hardly even 1 standard error away from 10 ± 10, because variances are additive, so the uncertainty of the 15 difference is √(10² + 10²) ≈ 14. The same is true for interaction terms, as they have two sources of error, so we should a priori expect those to have wider posteriors! The slow approach makes it immensely readable for people like me 'who already know this', as half of the book is basically 'there be dragons' explanations of everything that can go wrong when you are doing a regression, with the authors doing the analysis twice to show the difference between the two philosophies.
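The point about additive variances can be checked with a few lines of arithmetic. The sketch below assumes two independent estimates, 25 and 10, each with standard error 10 (illustrative numbers chosen so that the difference is 15); the function name is hypothetical.

```python
import math

def difference_with_se(est_a, se_a, est_b, se_b):
    """Difference of two independent estimates and its standard error.

    Assumption: the estimates are independent, so their variances add.
    """
    diff = est_a - est_b
    se_diff = math.sqrt(se_a**2 + se_b**2)
    return diff, se_diff

diff, se_diff = difference_with_se(25, 10, 10, 10)

z_a = 25 / 10             # first estimate vs. zero
z_b = 10 / 10             # second estimate vs. zero
z_diff = diff / se_diff   # difference between the two estimates vs. zero

print(f"difference = {diff}, standard error = {se_diff:.1f}")
print(f"z-scores: a = {z_a:.2f}, b = {z_b:.2f}, difference = {z_diff:.2f}")
```

One estimate is 2.5 standard errors from zero and the other only 1, yet the difference between them is barely more than 1 standard error from zero, so "significant" versus "not significant" is not itself a significant difference.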

SR: This is another great book, and it uses a level of math that is easier than ROS and Student's Guide but more rigorous than The Dog Book. This means that you get up to speed much faster, and it has you building quite advanced models by skipping the math and by heavily developing your intuition. Until Chapter 11 it is great, but after that it starts introducing advanced concepts, and the material in Chapters 14-16 is not covered in the other 4 books. So it can also be bought as an "advanced supplement" to any of the other 4 books. Also, the lectures are phenomenal and track well with the chapters, so I advise watching them prior to reading a chapter to get a big-picture overview before going deeper.

BDA: This book used to be the bible. It's very mathy compared to the other 4 and seems to be written by field experts. Parts 1 and 2 seem 'coherent' and are actually quite good for understanding the math that the other books use but don't explain well. Parts 3 and 4 are mathy versions of what is superficially covered in the other 4 books, and most of Part 5 is simply state-of-the-art Bayesian modelling expressed using only math. I think it might be slightly more intuitive than reading the source papers - but only slightly.

Recommendations / Extra considerations

Causal Inference: The books SR and ROS put extra emphasis on causality, so if you have observational data and want to predict the outcome of an intervention, these books are 'extra' good. SR emphasizes Judea Pearl's graph-based approach, which is superior when doing fancy models. Given that both are introductory, though, I think ROS actually teaches you to guard against more causal errors, so it's hard to declare a winner.

Math:

Dog Book << SR < ROS < Student's Guide << BDA

  • I have taken less than 15 ECTS of math (studied biology or sociology): The Dog Book or SR
  • I have taken some math (Engineering or Econometrics): Student's Guide or ROS
  • I have an undergrad degree in math: BDA

I want to make cool models before page 200:

  • SR or Dog Book

I am a patient learner:

  • ROS or Student's Guide

I want a good reference book: all are decent. SR is the worst because it has 'playful' chapter titles such as "Ulysses’ Compass" and "The Golem of Prague", which serve as helpful mnemonics while you are reading but become a pain a year later when you need to look things up.

My Experience with the books

I have not read all the books from cover to cover; here is my experience with each one:

  • Doing Bayesian Data Analysis (The Dog Book)
    • Read cover to cover
  • Statistical Rethinking (SR)
    • Read Chapter 12 and half of 14
      • Mixture Likelihoods and Covariance Models.
    • Watched all 20 lectures
    • Solved assignments via study groups
  • Regression and Other Stories (ROS)
    • Read Chapters 1-9
    • Read a lot of Andrew Gelman's blog
  • Bayesian Data Analysis (BDA)
    • Read Chapters 1-13
  • A Student’s Guide to Bayesian Statistics (Student's Guide)
    • Skimmed the earlier chapters.
    • Read Chapter 8
    • Watched about 10-20 hours of his YouTube lectures

  1. Richard McElreath, "Statistical Rethinking"

  2. John Kruschke, "Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan"

  3. Ben Lambert, "A Student’s Guide to Bayesian Statistics"

  4. Andrew Gelman, Jennifer Hill, and Aki Vehtari, "Regression and Other Stories"

  5. Andrew Gelman, John Carlin, Hal Stern, David Dunson, Aki Vehtari, and Donald Rubin, "Bayesian Data Analysis"

6 comments

This would be useful as a comment on https://www.lesswrong.com/posts/xg3hXCYQPJkwHyik2/the-best-textbooks-on-every-subject, not sure of best way to update existing lists though.

I will write a post shilling for myself, thanks. I was waiting for the post to be 'liked', if it got -10 karma then there would be no use in shilling for it :)

I almost didn't open it because it looked like you were asking a question, not giving an answer, and there were 0 (now 2) comments. Title change?

Good point!

original: Applied Bayesian Statistics - Which book to read?

  1. Applied Bayesian Statistics - Which book should you read?
  2. Literature Review of 5 Applied Bayesian Statistics Books.
  3. Book Review of 5 Applied Bayesian Statistics Books.

I picked 3. If other people have strong feelings, feel free to suggest other titles.

I'm reading BDA3 right now, and I'm on chapter 6. You described it well. It takes a lot of thinking to get through, but is very comprehensive. I like how it's explicitly not just a theory textbook. They demonstrate each major point by describing a real-world problem (measuring cancer rates across populations, comparing test-prep effectiveness), and attacking it with multiple models (usually frequentist to show limitations, and then their Bayesian model more thoroughly). It has a focus on learning the tools well enough to apply them to real-world problems.

I plan to start skimming soon. It seems the first two sections are pedagogical, and the remainder covers techniques which I would like to know about but don't need in detail.

Edit: One example I really enjoyed, and which felt very relevant to today, was on estimating lung-cancer hotspots in America. It broke the country down by county, and first displayed a map of the USA with counties in the top 10% of lung-cancer rates. Much of the highlighted region was in the rural southwest and Rocky Mountain region. It asked: what do you think makes these regions have such high rates? It then showed another map, this one of counties in the bottom 10% of lung-cancer rates, and the map focused on the same regions!

Turns out, this was mostly the result of these regions containing many low-population counties, which meant rare-event sampling could skew high very easily, just by chance. If the base rate is 5 per 10,000, and you have 2 cases in a county with 1,000 people, you look like a superfund site. But sample the next year and you might find 0 cases: a county full of young health-freaks.
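The numbers in that example can be checked directly. This is a minimal sketch that assumes yearly case counts in a county are Poisson-distributed around base rate times population (a standard approximation for rare events, not something taken from the book); the helper name is hypothetical.

```python
import math

def poisson_pmf(k, mean):
    """Probability of exactly k events when the expected count is mean."""
    return math.exp(-mean) * mean**k / math.factorial(k)

base_rate = 5 / 10_000      # cases per person, from the example above
population = 1_000          # a small county
expected_cases = base_rate * population   # half a case per year on average

p_zero = poisson_pmf(0, expected_cases)
p_two_or_more = 1 - p_zero - poisson_pmf(1, expected_cases)

print(f"P(0 cases)   = {p_zero:.3f}")
print(f"P(>=2 cases) = {p_two_or_more:.3f}")
```

Under these assumptions, roughly 61% of such counties report zero cases in a given year and about 9% report two or more (at least four times the base rate), purely by chance. With thousands of small counties, both tails of the map fill up with them.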

If you model lung-cancer rates hierarchically, with a distribution over county cancer rates, each county's rate drawn from that distribution, and cancer events then sampled from the county's specific rate, you get a Bayes-adjusted incidence rate for each county, which regresses small counties toward the mean.
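A rough sketch of that shrinkage, using a conjugate Gamma-Poisson shortcut instead of a full hierarchical fit. The prior here is an assumption made up for illustration: it is centered on 5 per 10,000 and carries the weight of 10,000 person-years of observation. In a real hierarchical model those two numbers would be estimated from all counties together, and the function name is hypothetical.

```python
def shrunken_rate(cases, population, prior_cases=5.0, prior_population=10_000.0):
    """Posterior mean rate for Poisson counts with a Gamma prior.

    Assumption: prior is Gamma(shape=prior_cases, rate=prior_population),
    so its mean is prior_cases / prior_population. The posterior mean
    simply adds the observed cases and population to the prior ones.
    """
    return (prior_cases + cases) / (prior_population + population)

# Two counties with the same raw rate of 20 per 10,000.
small = shrunken_rate(cases=2, population=1_000)
large = shrunken_rate(cases=2_000, population=1_000_000)

print(f"small county: {small * 10_000:.2f} per 10,000")
print(f"large county: {large * 10_000:.2f} per 10,000")
```

The small county is pulled most of the way back to the prior mean (about 6.4 per 10,000), while the large county barely moves (about 19.9 per 10,000), because its own data outweigh the prior.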

This made me read Covid charts that showed hot-spot counties very differently. I noticed that the counties they list are frequently small. Right now, all the counties on the NYTimes list, for example, have fewer than 20,000 people in them, which is, I believe, roughly in the bottom 25% of counties by size.

I loved that example as well. I have heard it described elsewhere as "the law of small numbers", where small subsets have higher variance and therefore more frequent extreme outcomes. I think it's particularly good because the most important part of the Bayesian paradigm is the focus on uncertainty.

The appendix on HMC is also a very good supplement to gain a deeper understanding of the algorithm after having read the description in another book first.