Reward IS the Optimization Target — LessWrong