How well can the GPT architecture solve the parity task?

Suppose I give it bit strings and ask it to output 1 if the number of 1s in the string is odd and 0 if it's even.

e.g.

0 -> 0

1 -> 1

11 -> 0

101 -> 0

1101 -> 1

10101001 -> 0

111000101110 -> 1
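For reference, here is a minimal Python sketch of the labelling rule those examples follow (1 for an odd number of 1s, 0 for an even number). The helper names `parity` and `make_example` and the length cap of 12 are illustrative assumptions, not part of the question; the generator is only one way such sample data could be produced.

```python
import random


def parity(bits: str) -> int:
    """Return 1 if `bits` contains an odd number of '1' characters, else 0."""
    return bits.count("1") % 2


def make_example(rng: random.Random, max_len: int = 12) -> str:
    """Build one 'input -> label' line from a random bit string (hypothetical helper)."""
    length = rng.randint(1, max_len)
    bits = "".join(rng.choice("01") for _ in range(length))
    return f"{bits} -> {parity(bits)}"


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sample set is reproducible
    for _ in range(5):
        print(make_example(rng))
```

Running `parity` over the strings listed above gives the same labels as in the examples.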

How well does it do on this task? What if we finetune it on sample data?

Not well, sad to say. I tried space-separating each digit to work around the BPE tokenization issue, but its usual completion is just to copy the previous line. The log probs of the two possible completions generally come out at about 50:50 for 0/1, which suggests it isn't doing any parity counting.
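As a rough illustration of the space-separated format, a few-shot prompt could be assembled along these lines. The exact prompt used in the experiment above is not given, so the layout, the `->` separator, and the helper names `spaced` and `build_prompt` are all assumptions.

```python
def spaced(bits: str) -> str:
    """Put a space between digits so each digit is more likely to be its own token."""
    return " ".join(bits)


def build_prompt(solved: list, query: str) -> str:
    """Join solved examples and one unsolved query into a few-shot prompt (hypothetical layout)."""
    lines = [f"{spaced(b)} -> {b.count('1') % 2}" for b in solved]
    lines.append(f"{spaced(query)} ->")  # left open for the model to complete
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt(["0", "1", "11", "101", "1101"], "10101001"))
```

The final line ends at `->`, so the next token the model emits is its guess for the label, and the probabilities it assigns to 0 and 1 at that position can be compared.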