Reward function learning: the learning process — LessWrong