Some work on connecting UDT and Reinforcement Learning — LessWrong