x
Activation space interpretability may be doomed — LessWrong