Scalable Oversight
This page is a stub.
Posts tagged Scalable Oversight
Most Relevant
2
107
Discriminating Behaviorally Identical Classifiers: a model problem for applying interpretability to scalable oversight
Ω
Sam Marks
9mo
Ω
10
2
85
Scalable oversight as a quantitative rather than qualitative problem
Ω
Buck
7mo
Ω
11
2
31
Inference-Only Debate Experiments Using Math Problems
Ω
Arjun Panickssery, Abhimanyu Pallavi Sudhir, JacksonKaunismaa
6mo
Ω
0
2
21
AXRP Episode 35 - Peter Hase on LLM Beliefs and Easy-to-Hard Generalization
Ω
DanielFilan
5mo
Ω
0
1
157
Gradient Routing: Masking Gradients to Localize Computation in Neural Networks
Ω
cloud, Jacob G-W, Evzen, Joseph Miller, TurnTrout
2mo
Ω
12
1
49
On scalable oversight with weak LLMs judging strong LLMs
Ω
zac_kenton, Noah Siegel, janos, Jonah Brown-Cohen, Samuel Albanie, David Lindner, Rohin Shah
7mo
Ω
18
1
27
Human-AI Complementarity: A Goal for Amplified Oversight
Ω
rishubjain, Sophie Bridgers
1mo
Ω
3
1
27
NYU Code Debates Update/Postmortem
David Rein
8mo
4
1
5
Reinforcement Learning from Information Bazaar Feedback, and other uses of information markets
Abhimanyu Pallavi Sudhir
4mo
1
1
1
Automated monitoring systems
hiki_t
2mo
0