Reward function learning: the learning process — LessWrong