Machine Unlearning Evaluations as Interpretability Benchmarks