Thanks for the great comment!

Do we know if distributed training is expected to scale well to GPT-6-size models (100 trillion parameters) trained across something like 20 data centers? How does the communication cost scale with the size of the model and the number of data centers? Linearly in both?

After reading this for 3 minutes:
"Google Cloud demonstrates the world’s largest distributed training job for large language models across 50000+ TPU v5e chips" (Google, November 2023). It seems that scaling works efficiently at least up to 50k chips (GPT-6 would need something like 2.5M GPUs). There is also a surprising linear increase in start time with the number of chips: 13 min for 32k chips. What is the SOTA?
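
To get a feel for what a linear start-time trend would mean at the scale guessed above, here is a rough sketch. It assumes start time is strictly proportional to chip count (no fixed overhead), which the source does not confirm, and it reuses the 2.5M figure, which is only a guess. The function name is hypothetical.

```python
# Figures taken from the comment; the proportional model is an assumption.
STARTUP_MIN_AT_32K = 13.0        # minutes of start time reported at 32k chips
CHIPS_MEASURED = 32_000
CHIPS_HYPOTHETICAL = 2_500_000   # the comment's rough guess for a GPT-6-scale run


def extrapolated_startup_minutes(n_chips: int) -> float:
    """Naive proportional extrapolation: start time grows linearly from zero."""
    return STARTUP_MIN_AT_32K * n_chips / CHIPS_MEASURED


minutes = extrapolated_startup_minutes(CHIPS_HYPOTHETICAL)
print(f"{minutes:.0f} min (~{minutes / 60:.1f} h)")
```

Under these assumptions the start time would be roughly 17 hours, which is why it matters whether the trend really stays linear.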

Scaling of AI training runs will slow down after GPT-5

Maxime Riché1d40

The title is clearly an overstatement. It reflects that I updated in that direction more than that I am confident in it.

Also, since learning from other comments that decentralized training is likely solved, I am now even less confident in the claim: I put only about a 15% chance on it happening in the strong form stated in the post.

Maybe I should edit the post to make it even clearer that the claim is retracted.

The longest training run

Maxime Riché2d10

This is actually corrected on the Epoch website but not here (https://epochai.org/blog/the-longest-training-run)

The longest training run

Maxime Riché2d10

"We could also combine this with the rate of growth of investments. In that case we would end up with a total rate of growth of effective compute equal to $g_{H} + g_{I} + g_{S}$. This results in an optimal training run length of $L = 1 / (g_{H} + g_{I} + g_{S}) \approx 0.21$ years, i.e. $2.52$ months."

Why is $g_{I}$ here 3.84, while above it is 1.03?
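
To show how much the choice of investment growth rate matters, here is a small sketch built on the quoted formula. Only the rounded 0.21 years and the two investment growth rates come from the text. The sum of the other two rates is not given, so it is back-computed from the rounded length and should be read as an inferred, approximate value.

```python
# Values quoted in the comment above; the g_H / g_S split is NOT given there.
L_QUOTED_YEARS = 0.21   # optimal run length quoted from the post (rounded)
G_I_HERE = 3.84         # investment growth rate used in this passage
G_I_ABOVE = 1.03        # investment growth rate used earlier in the post


def optimal_length_years(total_growth_rate: float) -> float:
    """Optimal training run length L = 1 / (total growth rate), in years."""
    return 1.0 / total_growth_rate


# Total growth rate implied by the rounded L, and the implied g_H + g_S (inferred).
implied_total = 1.0 / L_QUOTED_YEARS
implied_g_h_plus_g_s = implied_total - G_I_HERE

# Same formula with the earlier g_I swapped in (hypothetical comparison).
alt_length_years = optimal_length_years(implied_g_h_plus_g_s + G_I_ABOVE)

print(f"implied total growth rate: {implied_total:.2f} per year")
print(f"implied g_H + g_S: {implied_g_h_plus_g_s:.2f} per year")
print(f"L with g_I = {G_I_ABOVE}: {alt_length_years * 12:.1f} months")
```

With the smaller rate, the same formula would give an optimal length of about 6 months instead of 2.52, so the discrepancy is not a rounding issue.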

Dangers of Closed-Loop AI

Maxime Riché1mo30

Are memoryless LLMs with a limited context window significantly open-loop? (That is, they can't use summarization between calls or access previous prompts.)

We need a Science of Evals

Maxime Riché3moΩ370

FYI, the "Evaluating Alignment Evaluations" project of the current AI Safety Camp is studying and characterizing alignment (propensity) evaluations. We hope to contribute to the science of evals, and we will contact you next month. (Somewhat deprecated project proposal)

An illustrative model of backfire risks from pausing AI research

Maxime Riché5mo10

Interesting! I will see if I can correct that easily.

AI Timelines

Maxime Riché6mo79

Thanks a lot for the summary at the start!

AI Alignment Breakthroughs this week (10/08/23)

Maxime Riché7mo30

I wonder if the result depends on the type of OOD.

If you are OOD because there is less extractable information, then the results are intuitive.
If you are OOD because the extractable information is extreme or misleading, then the results are unexpected.

Oh, I just read their Appendix A: "Instances Where “Reversion to the OCS” Does Not Hold"
Outputting the average prediction is indeed not the only OOD behavior. It seems that there are different types of OOD regimes.

Expectations for Gemini: hopefully not a big deal

Maxime Riché7mo10

This comes from OpenAI saying they didn't expect ChatGPT to be a big commercial success. It was not a top-priority project.
