David Baek

PhD student @ MIT EECS

35

2y

David Baek — LessWrong

Scaling Laws for Scalable Oversight

by Subhash Kantamneni, Josh Engels, David Baek, and Max Tegmark

Subhash, Josh, and David were co-first authors on this project. The full paper is here, and our code can be found here. TL;DR: We empirically study the success of weak-to-strong oversight as we scale the intelligence of the weak overseer model (Guard) and strong adversary model (Houdini) in four oversight...

Apr 30, 2025•38

Scoping LLMs

by erik, David Baek, emile delcourt, and 4gate

Emile Delcourt, David Baek, Adriano Hernandez, Erik Nordby, with advising from Apart Lab Studio. Introduction & Problem Statement: Helpful, Harmless, and Honest ("HHH", Askell 2021) is a framework for aligning large language models (LLMs) with human values and expectations. In this context, "helpful" means the model strives to assist users...

Apr 10, 2025•4