Reverse-engineering using interpretability — LessWrong