Understanding Emergence in Large Language Models — LessWrong