Access to agent CoT makes monitors vulnerable to persuasion
This research was completed for London AI Safety Research (LASR) Labs 2025 by Jennifer Za, Julija Bainiaskina, Nikita Ostrovsky and Tanush Chopra. The team was supervised by Victoria Krakovna (Google DeepMind). Find out more about the programme and express interest in upcoming iterations here.

Introduction

Many proposals for controlling misaligned...