Utility versus Reward function: partial equivalence — LessWrong