Reward/value learning for reinforcement learning — LessWrong