Basic Legibility Protocols Improve Trusted Monitoring
This blog post accompanies our paper, Basic Legibility Protocols Improve Trusted Monitoring. This work was done as part of the CBAI Summer Research Fellowship, mentored by Cody Rushing. Here, we’ll go over the main results and focus more on the engineering and methodological takeaways than...
Feb 128