Reinforcement Learning by AI Punishment — LessWrong