How many attention heads do you need to do XOR? — LessWrong