A comparison of causal scrubbing, causal abstractions, and related methods
by Erik Jenner, Adrià Garriga-Alonso, and Egor Zverev
Summary: We explain the similarities and differences between three recent approaches to testing interpretability hypotheses: causal scrubbing, Geiger et al.'s causal abstraction-based method, and locally consistent abstractions. In particular, we show that all of these methods accept some hypotheses rejected by some of the others.
Acknowledgements: Thanks to Dylan Xu...
Jun 8, 2023