Reward/value learning for reinforcement learning — LessWrong