The Perils of Optimizing Learned Reward Functions — LessWrong