Week One of Studying Transformers Architecture — LessWrong