Thought Crime: Backdoors & Emergent Misalignment in Reasoning Models — LessWrong