Irrationality as a Defense Mechanism for Reward-hacking — LessWrong