LQPR: An Algorithm for Reinforcement Learning with Provable Safety Guarantees — LessWrong