Isn't GPT3 already almost at the theoretical limit of the scaling law from the paper? This is what is argued by nostalgebraist in his blog and colab notebook. You also get this result if you just compare the 3.14E23 FLOP (i.e. 3.6k PFLOPS-days) cost of training GPT3 from the lambdalabs estimate to the ~10k PFLOPS-days limit from the paper.
(Of course, this doesn't imply that the post is wrong. I'm sure it's possible to train a radically larger GPT right now. It's just that the relevant bound is the availability of data, not of compu... (read more)