David Baek

Scaling Laws for Scalable Oversight

Subhash, Josh, and David were co-first authors on this project. The full paper is here, and our code can be found here. TLDR: We empirically study the success of weak-to-strong oversight as we scale the intelligence of the weak overseer model (Guard) and the strong adversary model (Houdini) in four oversight...

Apr 30, 2025 · 38

Scoping LLMs

Emile Delcourt, David Baek, Adriano Hernandez, and Erik Nordby, with advising from Apart Lab Studio. Introduction & Problem Statement: Helpful, Harmless, and Honest ("HHH", Askell 2021) is a framework for aligning large language models (LLMs) with human values and expectations. In this context, "helpful" means the model strives to assist users...

Apr 10, 2025 · 4

LESSWRONG
LW