[Paper] When can we trust untrusted monitoring? A safety case sketch across collusion strategies
This research was completed for LASR Labs 2025 by Nelson Gardner-Challis, Jonathan Bostock, Georgiy Kozhevnikov and Morgan Sinclaire. The team was supervised by Joan Velja and Charlie Griffin (University of Oxford, UK AI Security Institute). The full paper can be found here.
Tl;dr: We did a deep dive into untrusted...
Mar 10