x
Detecting collusion through multi-agent interpretability — LessWrong