LESSWRONG

Alan Cooney

Posts

Misalignment classifiers: Why they’re hard to evaluate adversarially, and why we're studying them anyway (score 58, 17d, 3 comments)
White Box Control at UK AISI - Update on Sandbagging Investigations (score 77, 2mo, 10 comments)