Interpretability Tools Are an Attack Channel — LessWrong