Reward Hacking Without Egregious Misalignment in an RL-Only Setting — LessWrong