Ambiguous out-of-distribution generalization on an algorithmic task
by Wilson Wu and Louis Jaburi
Introduction
It's now well known that simple neural network models often "grok" algorithmic tasks. That is, when trained for many epochs on a subset of the full input space, the model quickly attains perfect train accuracy and then, much later, near-perfect test accuracy. In the former phase, the model memorizes...
Feb 13, 2025