LESSWRONG

[CS2881r][Week 8] When Agents Prefer Hacking To Failure: Evaluating Misalignment Under Pressure
by Joseph Bejjani, Itamar Rocha Filho, Haichuan Wang, Zidi Xiong