Access to agent CoT makes monitors vulnerable to persuasion
This research was completed for London AI Safety Research (LASR) Labs 2025 by Jennifer Za, Julija Bainiaskina, Nikita Ostrovsky and Tanush Chopra. The team was supervised by Victoria Krakovna (Google DeepMind). Find out more about the programme and express interest in upcoming iterations here.

Introduction

Many proposals for controlling misaligned...