Interpreting Gradient Routing’s Scalable Oversight Experiment
%TLDR. We discuss the setting that Gradient Routing (GR) paper uses to model Scalable Oversight (SO). The first part suggests an improved naive baseline using early stopping which performs on-par with GR. In the second part, we compare GR’s setting to SO and Weak-to-Strong generalization (W2SG), discuss how it might...
Apr 512