Counterfactual Oracles = online supervised learning with random selection of training episodes — LessWrong