BarAltiva — LessWrong

[Paper] When can we trust untrusted monitoring? A safety case sketch across collusion strategies

by Nelson Gardner-Challis, Morgan S, J Bostock, BarAltiva, joanv, and Charlie Griffin

This research was completed for LASR Labs 2025 by Nelson Gardner-Challis, Jonathan Bostock, Georgiy Kozhevnikov and Morgan Sinclaire. The team was supervised by Joan Velja and Charlie Griffin (University of Oxford, UK AI Security Institute). The full paper can be found here.

TL;DR: We did a deep dive into untrusted...