Training on Documents About Monitoring Leads To CoT Obfuscation
Authors: Reilly Haskins*, Bilal Chughtai**, Joshua Engels** * primary contributor ** advice and mentorship Summary [Note: This is a research update sharing preliminary results as part of ongoing work] Will future models obfuscate their CoT when they learn during pretraining that their CoT is being monitored? We investigate this question...
Mar 1863