How well can the GPT architecture solve the parity task? — LessWrong