GPT-2's positional embedding matrix is a helix — LessWrong