EIS IX: Interpretability and Adversaries — LessWrong