Making Linear Probes Interpretable — LessWrong