Jack Stone
Jack Stone has not written any posts yet.

I want to briefly preface this post by saying that this is my first on LW and I have no professional background in machine learning—I monitor the AI page as a sort of early-warning radar for AGI—so bear that in mind and feel free to correct any errors here.
With that out of the way, I did a back-of-the-envelope calculation for the amount of compute used to train the model, based on the numbers given in the NVIDIA post:
280 DGX A100 servers x 8 GPUs/server x 126 teraFLOP/GPU/second x 60.1 seconds/batch = 1.70 x 10^7 teraFLOP/batch = 17,000 petaFLOP/batch
271 billion tokens / 2048 tokens/sequence / 1920 sequences/batch = 68,919 batches
17,000 petaFLOP/batch x 68,919...
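
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the two steps above. The input figures are the ones quoted in the comment; the constant and function names are hypothetical, chosen only for illustration. The last function simply multiplies the two intermediate results, which is what the truncated line begins to do. Since the rest of the comment is cut off, this should not be read as the author's final figure or as including any later conversions.

```python
# Back-of-the-envelope training compute, using the figures quoted in the comment.
# All names are illustrative; the numbers come from the comment, not from any
# independent check of the NVIDIA post.

SERVERS = 280                   # DGX A100 servers
GPUS_PER_SERVER = 8
TFLOP_PER_GPU_PER_SECOND = 126  # teraFLOP/GPU/second, as quoted
SECONDS_PER_BATCH = 60.1

TOKENS = 271e9                  # training tokens, as quoted
TOKENS_PER_SEQUENCE = 2048
SEQUENCES_PER_BATCH = 1920


def tflop_per_batch() -> float:
    """Compute per batch in teraFLOP: servers x GPUs x throughput x time."""
    return SERVERS * GPUS_PER_SERVER * TFLOP_PER_GPU_PER_SECOND * SECONDS_PER_BATCH


def num_batches() -> float:
    """Number of batches needed to consume all training tokens."""
    return TOKENS / TOKENS_PER_SEQUENCE / SEQUENCES_PER_BATCH


def total_pflop() -> float:
    """Assumed final step: per-batch compute (in petaFLOP) times batch count."""
    return tflop_per_batch() / 1000 * num_batches()


if __name__ == "__main__":
    print(f"teraFLOP per batch: {tflop_per_batch():.3e}")
    print(f"batches:            {num_batches():,.0f}")
    print(f"total petaFLOP:     {total_pflop():.3e}")
```

The first two printed values should match the 1.70 x 10^7 teraFLOP/batch and 68,919 batches given above, up to rounding.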
Thanks—this is very interesting