Pando: A Controlled Benchmark for Interpretability Methods — LessWrong