How to train your transformer — LessWrong