Reward Mismatches in RL Cause Emergent Misalignment — LessWrong