Notes on the Mathematics of LLM Architectures — LessWrong