The gap in compute between AlexNet and AlphaZero arises because for AlexNet you are only counting the FLOPs spent on training, while for AlphaZero you are counting both training and self-play data generation (which runs 800 forward passes per move * ~200 moves to generate each game).
If you were to compare supervised training numbers for both (e.g. training on human chess or Go games), the two would come out much closer.
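To make the self-play overhead concrete, here is a rough back-of-the-envelope sketch. The 800 and ~200 figures are the ones quoted above; the "training costs about 3 forward passes per position" rule of thumb and the "each position is trained on once" default are my own assumptions for illustration, not numbers from the paper.

```python
# Back-of-the-envelope count of forward passes spent on AlphaZero self-play.
# 800 passes/move and ~200 moves/game are the figures quoted in the comment;
# the other defaults below are assumptions, labeled as such.

def forward_passes_per_game(sims_per_move: int = 800,
                            moves_per_game: int = 200) -> int:
    """Forward passes needed to generate one self-play game."""
    return sims_per_move * moves_per_game


def selfplay_to_training_ratio(sims_per_move: int = 800,
                               train_cost_in_forwards: float = 3.0,
                               times_each_position_trained: float = 1.0) -> float:
    """Self-play compute divided by training compute, per position.

    ASSUMPTIONS (not from the paper):
      - one training step on a position costs ~3 forward passes
        (forward plus a backward pass at roughly 2x the forward cost);
      - each generated position is trained on `times_each_position_trained` times.
    """
    training_cost = train_cost_in_forwards * times_each_position_trained
    return sims_per_move / training_cost


if __name__ == "__main__":
    print("forward passes per game:", forward_passes_per_game())
    print("self-play / training compute (assumed):",
          round(selfplay_to_training_ratio(), 1))
```

Under those assumptions self-play dominates the total by a couple of orders of magnitude, which is the point: a supervised-only comparison drops that term entirely.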
The TOPS numbers from the wiki page seem wrong. TPUv1 had 92 TOPS (uint8); for TPUv3 the "90 TOPS" refers to a single chip, but I'm fairly sure that when the paper says "8 TPUv3s" they mean 8 cards, as that's how they are available on Google Cloud (1 card = 4 chips).
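A quick sketch of how much the chip-vs-card reading matters. It only uses the figures mentioned above (90 TOPS per chip as quoted from the wiki page, 4 chips per card); the function name and the two interpretations are just for illustration.

```python
# How the "8 TPUv3s" figure changes depending on whether a "TPUv3" is a chip or a card.
# 90 TOPS/chip is the wiki figure quoted above; 1 card = 4 chips as stated above.

TOPS_PER_CHIP = 90
CHIPS_PER_CARD = 4


def total_tops(num_units: int, unit_is_card: bool) -> int:
    """Aggregate peak TOPS for `num_units` TPUv3 units.

    If `unit_is_card` is True, each unit is a 4-chip card;
    otherwise each unit is a single chip.
    """
    chips = num_units * (CHIPS_PER_CARD if unit_is_card else 1)
    return chips * TOPS_PER_CHIP


if __name__ == "__main__":
    as_chips = total_tops(8, unit_is_card=False)
    as_cards = total_tops(8, unit_is_card=True)
    print("8 chips:", as_chips, "TOPS")
    print("8 cards:", as_cards, "TOPS")
    print("ratio:", as_cards // as_chips)
```

So if "8 TPUv3s" does mean 8 cards, the peak compute is 4x what you get from reading it as 8 chips.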
Only Anakin actually runs the environment on the TPU, and this only works for pretty simple environments (basically: can you implement it in JAX?). Sebulba runs environments on the host, which is what would have been done for this paper too (though I have no idea whether they used Sebulba or had a different setup).
This doesn't really matter though, because for these simulated environments it's fairly simple to fully utilize the TPUs by running more (remote) environments in parallel.