See Section 5.3, "Reinforcement Learning," of https://arxiv.org/abs/2210.14891 for more RL scaling laws with the number of model parameters on the x-axis (and also RL scaling laws with training compute and with training dataset size on the x-axis).

Re: YouTube estimates

You'll probably find some of this Twitter discussion useful:
https://twitter.com/HenriLemoine13/status/1572846452895875073

I give a crisp definition from 6:27 to 7:50 of this video: 

> Re: "Extrapolating GPT-N performance" and "Revisiting ‘Is AI Progress Impossible To Predict?’" sections of google doc


Read Section 5.6, "The Limit of the Predictability of Scaling Behavior," of the "Broken Neural Scaling Laws" paper:
https://arxiv.org/abs/2210.14891v4

When f of the next break (in Equation 1 of the paper, https://arxiv.org/abs/2210.14891, not the video) is sufficiently large, you can predict when that next break will occur; also, the number of seeds needed to get such predictive ability is very large. When f of the next break is sufficiently small (and nonnegative), you cannot predict when that next break will occur.

Play around with f in this code to see what I mean:
https://github.com/ethancaballero/broken_neural_scaling_laws/blob/main/make_figure_1__decomposition_of_bnsl_into_power_law_segments.py#L25-L29
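For intuition (separate from the repo's script linked above), here is a minimal, self-contained sketch of the single-break case of Equation 1; the function name and all parameter values below are illustrative assumptions, not values from the paper:

```python
import numpy as np

def bnsl_single_break(x, a, b, c0, c1, d1, f1):
    # Equation 1 of arXiv:2210.14891 with a single break (n = 1):
    # a: limit, b: offset, c0: initial slope, c1: slope change at the break,
    # d1: break location on the x-axis, f1: break sharpness (smaller f1 = sharper break).
    return a + b * x ** (-c0) * (1.0 + (x / d1) ** (1.0 / f1)) ** (-c1 * f1)

# Illustrative (made-up) parameter values; only f1 varies across the loop.
a, b, c0, c1, d1 = 0.1, 1.0, 0.3, 0.4, 1e3
x_probe = 1e2  # a point one decade before the break at d1

for f1 in (0.05, 0.5, 2.0):
    y = bnsl_single_break(x_probe, a, b, c0, c1, d1, f1)
    first_segment_only = a + b * x_probe ** (-c0)  # first power-law segment alone, no break
    # Large f1: y has already bent away from the first segment well before d1,
    # so data collected before the break carries a signal about where it is.
    # Small f1: y is numerically indistinguishable from the first segment until
    # x is very close to d1, so the break's location cannot be extrapolated in advance.
    print(f"f1={f1:>4}: y={y:.4f}  first-segment-only={first_segment_only:.4f}")
```

With the large-f1 setting, the curve deviates from the first power-law segment a full decade before d1, which is the kind of advance signal that makes the break's location extrapolatable; with the small-f1 settings there is essentially no such signal.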

Sections 3.1 and 6.6 ("Ossification") of the "Scaling Laws for Transfer" paper (https://arxiv.org/abs/2102.01293) show that current training of current DNNs exhibits high path dependence.
