Self-prediction acts as an emergent regularizer
TL;DR: In our recent work with Professor Michael Graziano (arXiv, thread), we show that adding an auxiliary self-modeling objective to a supervised learning task yields simpler, more strongly regularized, and more parameter-efficient models. Across three classification tasks and two modalities, self-modeling consistently reduced model complexity, as measured by a lower real log canonical threshold (RLCT) and a narrower weight distribution. This restructuring effect...
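
To make the setup concrete, here is a minimal sketch of what an auxiliary self-modeling objective can look like in PyTorch: the network gets an extra head that predicts its own hidden activations, and that prediction error is added to the usual classification loss. The architecture, hidden size, and `aux_weight` below are illustrative assumptions for a toy classifier, not the exact configuration used in the paper.

```python
# Minimal sketch of an auxiliary self-modeling objective (illustrative only;
# layer sizes, loss weighting, and which activations are predicted are
# assumptions, not the paper's exact setup).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfModelingClassifier(nn.Module):
    def __init__(self, in_dim=784, hidden_dim=256, n_classes=10):
        super().__init__()
        self.encoder = nn.Linear(in_dim, hidden_dim)
        self.classifier = nn.Linear(hidden_dim, n_classes)
        # Auxiliary head: predicts the network's own hidden activations.
        self.self_model = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, x):
        h = F.relu(self.encoder(x))          # internal state to be self-modeled
        logits = self.classifier(h)          # primary task output
        h_pred = self.self_model(h)          # self-prediction of activations
        return logits, h_pred, h

def combined_loss(logits, h_pred, h, targets, aux_weight=0.1):
    task_loss = F.cross_entropy(logits, targets)
    # MSE between predicted and actual hidden activations. Gradients flow into
    # the rest of the network as well, so the network is also pushed to make
    # its own activations easier to predict.
    self_model_loss = F.mse_loss(h_pred, h)
    return task_loss + aux_weight * self_model_loss

# Usage: one training step on a dummy batch.
model = SelfModelingClassifier()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
logits, h_pred, h = model(x)
loss = combined_loss(logits, h_pred, h, y)
opt.zero_grad()
loss.backward()
opt.step()
```

The point of the sketch is the combined loss: the self-prediction term gives the network an incentive to keep its internal activations predictable, which is the hypothesized route to the simpler, lower-complexity solutions we observe.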