Reward learning summary — LessWrong