Intuitive examples of reward function learning? — LessWrong