A new approach to interpretability: round-trip neural network compilation-decompilation — LessWrong