Scalable Oversight
Applied to "Discriminating Behaviorally Identical Classifiers: a model problem for applying interpretability to scalable oversight" by Raemon 1mo ago
Created by Raemon at 1mo