LESSWRONG
Euan Ong
https://ong.ac
Posts
46 · Building and evaluating alignment auditing agents · Ω · 2mo · 1
141 · Auditing language models for hidden objectives · Ω · 6mo · 15
58 · Image Hijacks: Adversarial Images can Control Generative Models at Runtime · Ω · 2y · 9
Comments