Training on Documents About Reward Hacking Induces Reward Hacking — LessWrong