Reinforcement Learning by AI Punishment — LessWrong