x

LESSWRONG

LW

John Schulman — LessWrong

John Schulman

John Schulman

Message

484

Ω

186

1

32

6y

John Schulman

484

Ω

186

6y

Scaling Laws for Reward Model Overoptimization

by leogao, John Schulman, and Jacob_Hilton

TL;DR: Reward model (RM) overoptimization in a synthetic-reward setting can be modelled surprisingly well by simple functional forms. The coefficients also scale smoothly with scale. We draw some initial correspondences between the terms of the functional forms and the Goodhart Taxonomy. We suspect there may be deeper theoretical reasons behind...

Oct 20, 2022•103

Frequent arguments about alignment

Here, I’ll review some arguments that frequently come up in discussions about alignment research, involving one person skeptical of the endeavor (called Skeptic) and one person advocating to do more of it (called Advocate). I mostly endorse the views of the Advocate, but the Skeptic isn't a strawman and makes...

Jun 23, 2021•104