What makes math problems hard for reinforcement learning: a case study — LessWrong