Extracting Performant Algorithms Using Mechanistic Interpretability — LessWrong