Reinforcement learning and linguistic convention - History — LessWrong