Deception Channeling: Training Models to Always Verbalize Alignment Faking — LessWrong