Reinforcement learning - History — LessWrong