Benign model-free RL — LessWrong