Egor Zverev — LessWrong

A comparison of causal scrubbing, causal abstractions, and related methods

by Erik Jenner, Adrià Garriga-alonso, and Egor Zverev

Summary: We explain the similarities and differences between three recent approaches to testing interpretability hypotheses: causal scrubbing, Geiger et al.'s causal abstraction-based method, and locally consistent abstractions. In particular, we show that all of these methods accept some hypotheses rejected by some of the others. Acknowledgements: Thanks to Dylan Xu...

Jun 8, 2023 • 73