Classic Alignment-Faking Evaluations Measure Jailbreak Detection, Not Scheming [in some frontier models] — LessWrong