x
Natural Emergent Misalignment from Reward Hacking — LessWrong