Theoretical predictions on the sample efficiency of training policies and activation monitors — LessWrong