Training Deliberative Monitors for Black-Box Scheming Detection
Paper: https://arxiv.org/abs/2605.29601
Thread: https://x.com/aksh_n0/status/2062568855814193497

TL;DR: Training small open-weight monitors provides a cost-effective alternative to prompted frontier monitors. Applying our training recipe to Qwen3.5-27B results in a monitor better at scheming detection than all smaller prompted monitors (Gemini 3.1 Flash-Lite, GPT-5.4 Nano, Claude Haiku 4.5) and Gemini 2.5 Pro, while achieving...