Louis Jaburi: Compact Proofs of Model Performance via Mechanistic Interpretability — LessWrong