Alignment Faking in Large Language Models — LessWrong