Explaining undesirable model behavior: (How) can influence functions help? — LessWrong