Scaling Laws for Scalable Oversight
Subhash, Josh, and David were co-first authors on this project. The full paper is here, and our code can be found here.
TLDR: We empirically study the success of weak-to-strong oversight as we scale the intelligence of the weak overseer model (Guard) and the strong adversary model (Houdini) in four oversight...
Apr 30, 2025

