LessWrong
[CS2881r][Week 8] When Agents Prefer Hacking To Failure: Evaluating Misalignment Under Pressure
by Joseph Bejjani, Itamar Rocha Filho, Haichuan Wang, Zidi Xiong