x
Why Not Just Train For Interpretability? — LessWrong