Positional kernels of attention heads — LessWrong