When is reward ever the optimization target? — LessWrong