Interestingly, after a certain layer, the first principle component becomes identical to the mean difference between harmful and harmless activations.

Do you think this can be interpreted as the model having its focus entirely on "refusing to answer" from layer 15 onwards? And if it can be interpreted as the model not evaluating other potential moves/choices coherently over these layers. The idea is that it could be evaluating other moves in a single layer (after layer 15) but not over several layers since the residual stream is not updated significantly.

Especially can we interpret that as the model not thinking coherently over several layers about other policies, it could choose (e.g., deceptive policies like defecting from the policy of "refusing to answer")? I wonder if we would observe something different if the model was trained to defect from this policy conditional on some hard-to-predict trigger (e.g. whether the model is in training or deployment).

Reply

Scaling of AI training runs will slow down after GPT-5

Maxime Riché17d10

Thank for the great comment!

Do we know if distributed training is expected to scale well to GPT-6 size models (100 trillions parameters) trained over like 20 data centers? How does the communication cost scale with the size of the model and the number of data centers? Linearly on both?

After reading for 3 min this:
Google Cloud demonstrates the world’s largest distributed training job for large language models across 50000+ TPU v5e chips (Google November 2023). It seems that scaling is working efficiently at least up to 50k GPUs (GPT-6 would be like 2.5M GPUs). There are also some surprising linear increases in start time with the number of GPUs, 13min for 32k GPUs. What is the SOTA?

Reply

Scaling of AI training runs will slow down after GPT-5

Maxime Riché18d40

The title is clearly an overstatement. It expresses more that I updated in that direction, than that I am confident in it.

Also, since learning from other comments that decentralized learning is likely solved, I am now even less confident in the claim, like only 15% chance that it will happen in the strong form stated in the post.

Maybe I should edit the post to make it even more clear that the claim is retracted.

Reply

The longest training run

Maxime Riché18d10

This is actually corrected on the Epoch website but not here (https://epochai.org/blog/the-longest-training-run)

Reply

The longest training run

Maxime Riché18d10

We could also combine this with the rate of growth of investments. In that case we would end up with a total rate of growth of effective compute equal to . This results in an optimal training run length of $L = 1 / (g_{H} + g_{I} + g_{S}) \approx 0.21$ years, ie $2.52$ months.

Why is g_I here 3.84, while above it is 1.03?

Reply

Dangers of Closed-Loop AI

Maxime Riché2mo30

Are memoryless LLMs with a limited context window, significantly open loop? (Can't use summarization between calls nor get access to previous prompts)

Reply

We need a Science of Evals

Maxime Riché4moΩ390

FYI, the "Evaluating Alignment Evaluations" project of the current AI Safety Camp is working on studying and characterizing alignment(propensity) evaluations. We hope to contribute to the science of evals, and we will contact you next month. (Somewhat deprecated project proposal)

Reply

An illustrative model of backfire risks from pausing AI research

Maxime Riché5mo10

Interesting! I will see if I can correct that easily.

Reply

AI Timelines

Maxime Riché6mo79

Thanks a lot for the summary at the start!

Reply

AI Alignment Breakthroughs this week (10/08/23)

Maxime Riché7mo30

I wonder if the result is dependent on the type of OOD.

If you are OOD by having less extractable information, then the results are intuitive.
If you are OOD by having extreme extractable information or misleading information, then the results are unexpected.

Oh, I just read their Appendix A: "Instances Where “Reversion to the OCS” Does Not Hold"
Outputting the average prediction is indeed not the only behavior OOD. It seems that there are different types of OOD regimes.

Reply