Why does generalization work? — LessWrong