Reinforcement Learner Wireheading — LessWrong