Reporting Tasks as Reward-Hackable: Better Than Inoculation Prompting? — LessWrong