Blind deep-deployment evals for control & sabotage
Thanks to Ezra Newman for initial ideation and various people at Apollo Research for feedback. This short personal piece does not necessarily reflect the views of Apollo Research.

AI labs are preparing to automate their internal staff over the next year. Right now, control and sabotage evals try to estimate...