In this post, we offer a conceptual framework for evaluation awareness, designed to clarify the different ways in which models can respond to evaluations. Key ideas we introduce through the lens of this framework include leveraging model uncertainty about eval type and awareness-robust consistency. We hope the framework helps delineate existing research directions and inspires future work.
This work was done in collaboration with Jasmine Li in the first two weeks of MATS 9.0 under the mentorship of Victoria Krakovna. Thanks to (in alphabetical order) David Africa, Lovkush Agarwal, Claude, Giles Edkins, Jannes Elstner, Shawn Hu, Tim Hua, Igor Ivanov, Victoria Krakovna, Jasmine Li, Martin Listwan, Mary Phuong, Daniel Tan, and Alex Turner (plus others I’ve no doubt forgotten to name!) for discussions and thoughts over the last couple weeks. We’re excited to continue working on this topic throughout the programme, so if you have suggestions or takes, please share liberally in the comments!
Introduction
Evaluation awareness describes the phenomenon of an LLM inferring from various cues that it is under evaluation. Perhaps the most prominent example is [Sonnet 4.5] (see §7.2), from just a few months ago. As frontier models continue to improve, we expect cases of evaluation awareness to become more frequent, raising concerns about the enduring validity of our evaluations.
Why should we be concerned? Model evaluations are designed to elicit realistic, authentic behaviour from models in safe, pre-deployment settings. Their results might inform additional post-training runs we need to conduct, or even whether we choose to deploy a model at all. However, like human test-takers, a model that recognises an evaluation and reasons about its scoring criteria or the consequences of its scores can act strategically to obtain a better outcome. We call this behaviour evaluation gaming.
Some previous blogposts and papers have addressed evaluation awareness and