[paper] Training on Documents About Monitoring Leads to CoT Obfuscation
Authors: Reilly Haskins*, Bilal Chughtai**, Joshua Engels**
* primary contributor
** advice and mentorship

This is the updated version of our earlier preliminary results post, covering the final results from our paper. The paper extends our preliminary work to eight models, a harder agentic task, CoT controllability analysis, and RL...
May 27