Noah Siegel — LessWrong

39

7y

On scalable oversight with weak LLMs judging strong LLMs

by zac_kenton, Noah Siegel, janos, Jonah Brown-Cohen, Samuel Albanie, David Lindner, and Rohin Shah

Abstract: Scalable oversight protocols aim to enable humans to accurately supervise superhuman AI. In this paper we study debate, where two AIs compete to convince a human judge; consultancy, where a single AI tries to convince a human judge who asks questions; and compare to a baseline of direct question-answering,...

Jul 8, 2024•49