This is a linkpost for https://christophm.github.io/interpretable-ml-book/
Nice! This is very readable - a lot like a collection of blog posts rather than a dense book. If you want more density, you can hopefully find it in the references, though I haven't gotten into them.
Skimming it sparked some thoughts about translating some of the described interpretability methods from "shallow" AI, where the features aren't learned much, to "deep" AI, where some complicated features get learned, simply by swapping "input features" for "features in a latent space."
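A rough sketch of what that swap might look like (an illustration of the idea above, not something from the book): take a model-agnostic method the book covers, such as permutation feature importance, and run it over the dimensions of a latent space instead of over input columns. The encoder/head split, the weights, and the toy data below are all made-up placeholders.

```python
# A minimal sketch of permutation feature importance applied to latent
# features instead of input features. Everything here (encoder, head,
# weights, data) is a made-up placeholder, not code from the book.
import numpy as np

rng = np.random.default_rng(0)

# Toy "deep" model split into two stages: an encoder that maps inputs into
# a latent space, and a head that maps latent features to predictions.
W_enc = rng.normal(size=(10, 4))   # stand-in for pretrained encoder weights
w_head = rng.normal(size=4)        # stand-in for pretrained head weights

def encode(X):
    return np.tanh(X @ W_enc)      # latent features, shape (n_samples, 4)

def head(Z):
    return Z @ w_head              # predictions from latent features

# Toy regression data.
X = rng.normal(size=(500, 10))
y = head(encode(X)) + rng.normal(scale=0.1, size=500)

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

# Permutation importance over latent dimensions: shuffle one latent feature
# at a time and measure how much the prediction error grows.
Z = encode(X)
baseline = mse(y, head(Z))
for j in range(Z.shape[1]):
    Z_perm = Z.copy()
    Z_perm[:, j] = rng.permutation(Z_perm[:, j])
    print(f"latent feature {j}: error increase {mse(y, head(Z_perm)) - baseline:.3f}")
```

The only change relative to the book's tabular setting is the object being permuted: latent activations rather than raw input columns.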
This is a book by Christoph Molnar about interpretability. It covers a lot of traditional interpretability work from outside AI safety. Some recommendations for which chapters to read are in our summary of it in the interpretability starter resources.
Why share it here? It is a continuously updated, high-quality book on interpretability as seen from outside AI safety, which seems very relevant for understanding what the field in general looks like. It is widely cited and seems quite canonical. The first edition is from Apr 11, 2019, the second edition (the latest) from Mar 04, 2022, and it was last updated on Oct 13, 2022.
From the introduction of the book: