Power Laws Are Not Enough
This is a linkpost for work done as part of MATS 9.0 under the mentorship of Richard Ngo. Loss scaling laws are among the most important empirical findings in deep learning. This post synthesises evidence that loss scaling per se, though important in practice, is a straightforward consequence of very low-order...