Transformer inductive biases & RASP — LessWrong