Scalable Oversight (LessWrong)
This page is a stub.
Posts tagged Scalable Oversight (sorted by most relevant)
37 karma · Scaling Laws for Scalable Oversight · Subhash Kantamneni, Josh Engels, David Baek, Max Tegmark · 7mo · 1 comment
115 karma · Discriminating Behaviorally Identical Classifiers: a model problem for applying interpretability to scalable oversight Ω · Sam Marks · 2y · 10 comments
92 karma · Learning the prior Ω · paulfchristiano · 5y · 28 comments
86 karma · Scalable oversight as a quantitative rather than qualitative problem Ω · Buck · 1y · 11 comments
46 karma · Gradient routing is better than pretraining filtering · Cleo Nardo · 3mo · 3 comments
31 karma · Inference-Only Debate Experiments Using Math Problems Ω · Arjun Panickssery, Abhimanyu Pallavi Sudhir, JacksonKaunismaa · 1y · 0 comments
21 karma · AXRP Episode 35 - Peter Hase on LLM Beliefs and Easy-to-Hard Generalization Ω · DanielFilan · 1y · 0 comments
18 karma · Rational Animations' video about scalable oversight and sandwiching · Writer · 5mo · 0 comments
197 karma · [Research Note] Optimizing The Final Output Can Obfuscate CoT Ω · lukemarks, jacob_drori, cloud, TurnTrout · 4mo · 22 comments
169 karma · Gradient Routing: Masking Gradients to Localize Computation in Neural Networks Ω · cloud, Jacob G-W, Evzen, Joseph Miller, TurnTrout · 1y · 14 comments
88 karma · Prover-Estimator Debate: A New Scalable Oversight Protocol Ω · Jonah Brown-Cohen, Geoffrey Irving · 6mo · 18 comments
49 karma · On scalable oversight with weak LLMs judging strong LLMs Ω · zac_kenton, Noah Siegel, janos, Jonah Brown-Cohen, Samuel Albanie, David Lindner, Rohin Shah · 1y · 18 comments
27 karma · NYU Code Debates Update/Postmortem · David Rein · 2y · 4 comments
27 karma · Human-AI Complementarity: A Goal for Amplified Oversight Ω · rishubjain, Sophie Bridgers · 1y · 4 comments
22 karma · Is weak-to-strong generalization an alignment technique? Q Ω · cloud · 10mo · 1 comment