Simply reverse engineering gpt2-small (Layer 0, Part 1: Attention) — LessWrong