Self-prediction acts as an emergent regularizer
by Cameron Berg, Kvee, Mike Vaiana, Diogo de Lucena, florin_pop, and Trent Hodgeson
TL;DR: In our recent work with Professor Michael Graziano (arXiv, thread), we show that adding an auxiliary self-modeling objective to supervised learning tasks yields simpler, more regularized, and more parameter-efficient models. Across three classification tasks and two modalities, self-modeling consistently reduced model complexity, as measured by a lower real log canonical threshold (RLCT) and a narrower weight distribution. This restructuring effect...
Oct 23, 2024
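To make "adding an auxiliary self-modeling objective" concrete, here is a minimal sketch in NumPy. It assumes that self-modeling means the network gets extra output units that try to predict its own hidden-layer activations, and that the two losses are combined as a weighted sum. The summary above does not spell out these details, so the function names (`init_params`, `combined_loss`), the layer sizes, and the `aux_weight` parameter are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def init_params(n_in, n_hidden, n_classes, seed=0):
    """Create a one-hidden-layer MLP (hypothetical sizes, for illustration).

    The output layer has n_classes logits plus n_hidden extra units that
    are used to predict the network's own hidden activations.
    """
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.1, size=(n_in, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, size=(n_hidden, n_classes + n_hidden))
    b2 = np.zeros(n_classes + n_hidden)
    return W1, b1, W2, b2


def combined_loss(params, x, labels, n_classes, aux_weight=1.0):
    """Return (total, task_loss, self_loss) for a batch.

    total = cross-entropy on the class logits
            + aux_weight * MSE(self-prediction, actual hidden activations)
    The weighted-sum form and aux_weight are assumptions of this sketch.
    """
    W1, b1, W2, b2 = params
    hidden = np.maximum(0.0, x @ W1 + b1)  # ReLU hidden activations
    out = hidden @ W2 + b2
    logits, self_pred = out[:, :n_classes], out[:, n_classes:]

    # Primary task: numerically stable softmax cross-entropy.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    task_loss = -log_probs[np.arange(len(labels)), labels].mean()

    # Auxiliary task: the network predicts its own hidden activations.
    self_loss = np.mean((self_pred - hidden) ** 2)

    return task_loss + aux_weight * self_loss, task_loss, self_loss
```

Under these assumptions, the auxiliary term is small when the hidden activations are easy to predict from themselves, which is one intuition for why such an objective could push a network toward simpler internal structure. Training would minimize `total` with any gradient-based optimizer; that step is omitted here.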