How does a toy 2-digit subtraction transformer predict the difference? — LessWrong