x

LESSWRONG

LW

myyycroft — LessWrong

myyycroft

myyycroft

Message

12

1

2

4y

myyycroft

12

4y

Interpreting Gradient Routing’s Scalable Oversight Experiment

by makataomu and myyycroft

%TLDR. We discuss the setting that Gradient Routing (GR) paper uses to model Scalable Oversight (SO). The first part suggests an improved naive baseline using early stopping which performs on-par with GR. In the second part, we compare GR’s setting to SO and Weak-to-Strong generalization (W2SG), discuss how it might...

Critique of machine unlearning

Wikipedia (as of January 2026): > Machine unlearning is a branch of machine learning focused on removing specific undesired element, such as private data, wrong or manipulated training data, outdated information, copyrighted material, harmful content, dangerous abilities, or misinformation, without needing to rebuild models from the ground up. A blog...