Estimating efficiency improvements in LLM pre-training
TL;DR: GPT-3 was trained on 10,000 NVIDIA V100s for 14.8 days. Based on my - very uncertain - estimates, OpenAI should now be able to train a comparably capable model on an equivalent cluster in under an hour. This implies a drop in hardware requirements of more than 400x. Factoring out...
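As a quick sanity check on the headline numbers (a sketch using only the figures stated above, not OpenAI's actual data), 14.8 days on the same cluster is 355.2 hours, so a speedup of more than 400x would mean training in under roughly 0.9 hours:

```python
# Back-of-the-envelope check of the claimed speedup, using the figures above.
gpt3_days = 14.8                      # reported GPT-3 training time on 10,000 V100s
gpt3_hours = gpt3_days * 24           # 355.2 hours
claimed_speedup = 400                 # the ">400x" figure
implied_hours = gpt3_hours / claimed_speedup

print(f"{gpt3_hours:.1f} h originally; >400x implies under {implied_hours:.2f} h")
# → 355.2 h originally; >400x implies under 0.89 h
```

This is consistent with the "under an hour" claim: any training time below about 53 minutes on an equivalent cluster yields a speedup above 400x.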
Jan 19, 2024