The reward engineering problem — LessWrong