Training a Reward Hacker Despite Perfect Labels — LessWrong