Inducing Unprompted Misalignment in LLMs — LessWrong