On Developing a Mathematical Theory of Interpretability — LessWrong