Positional kernels of attention heads — LessWrong