Implementing a Transformer from scratch in PyTorch - a write-up on my experience — LessWrong