Ethan Caballero

See Section 5.3, "Reinforcement Learning", of https://arxiv.org/abs/2210.14891 for more RL scaling laws with number of model parameters on the x-axis, as well as RL scaling laws with training compute or training dataset size on the x-axis.

re: youtube estimates

You'll probably find some of this Twitter discussion useful:

https://twitter.com/HenriLemoine13/status/1572846452895875073

OP will find this paper useful:

https://arxiv.org/abs/2210.14891

I give a crisp definition from 6:27 to 7:50 of this video:

> Re: "Extrapolating GPT-N performance" and "Revisiting ‘Is AI Progress Impossible To Predict?’" sections of google doc

Read Section 5.6, titled "The Limit of the Predictability of Scaling Behavior", of the "Broken Neural Scaling Laws" paper:

https://arxiv.org/abs/2210.14891v4

One other goal / theme of mechanistic interpretability research imo:

twitter.com/norabelrose/status/1588571609128108033

When f of the next break is sufficiently large (f as in Equation 1 of the paper, https://arxiv.org/abs/2210.14891, not the video), it gives you the ability to predict when that next break will occur, though the number of seeds needed to get that predictive ability is **very** large. When f of the next break is sufficiently small (and nonnegative), it gives you no ability to predict when that next break will occur.

Play around with f in this code to see what I mean:

https://github.com/ethancaballero/broken_neural_scaling_laws/blob/main/make_figure_1__decomposition_of_bnsl_into_power_law_segments.py#L25-L29
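For readers who just want the intuition without opening the repo, here is a minimal standalone sketch (not the code from the linked script). It assumes the BNSL functional form y = a + b·x^(-c0)·∏(1 + (x/d_i)^(1/f_i))^(-c_i·f_i); the function names and all parameter values are illustrative assumptions. It measures the log-log slope two decades before a single break at d = 1e6, once with a small f (sharp break) and once with a large f (smooth break).

```python
import math

def bnsl(x, a, b, c0, breaks):
    """Assumed BNSL form: a + b*x^(-c0) * prod_i (1 + (x/d_i)^(1/f_i))^(-c_i*f_i).

    `breaks` is a list of (c_i, d_i, f_i) tuples, one per break.
    """
    y = b * x ** (-c0)
    for c_i, d_i, f_i in breaks:
        y *= (1.0 + (x / d_i) ** (1.0 / f_i)) ** (-c_i * f_i)
    return a + y

def local_slope(x, a, b, c0, breaks, eps=1e-4):
    """Central-difference estimate of d log(y - a) / d log(x) at x."""
    lo, hi = x * math.exp(-eps), x * math.exp(eps)
    y_lo = bnsl(lo, a, b, c0, breaks) - a
    y_hi = bnsl(hi, a, b, c0, breaks) - a
    return (math.log(y_hi) - math.log(y_lo)) / (2 * eps)

# Illustrative (made-up) parameters: one break at d = 1e6 that steepens
# the slope from -c0 = -0.1 to -(c0 + c1) = -0.6.
a, b, c0, c1, d = 0.0, 1.0, 0.1, 0.5, 1e6
x_before = 1e4  # two decades before the break

slope_sharp = local_slope(x_before, a, b, c0, [(c1, d, 0.05)])  # small f
slope_smooth = local_slope(x_before, a, b, c0, [(c1, d, 2.0)])  # large f

print("slope before break, f=0.05:", slope_sharp)
print("slope before break, f=2.0: ", slope_smooth)
```

With these values the small-f slope is about -0.100 (indistinguishable from -c0, so nothing before the break hints at it), while the large-f slope is about -0.145 (the curve is already bending). The sketch is noiseless; with real, noisy runs, picking up that small deviation is where the large number of seeds comes in.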

Sections 3.1 and 6.6 titled "Ossification" of the "Scaling Laws for Transfer" paper (https://arxiv.org/abs/2102.01293) show that current training of current DNNs exhibits high path dependence.

arxiv.org/abs/2210.14891