Brief Notes on Transformers — LessWrong