A possibility that occurs to me: if early automated researchers refuse to work on capabilities, an irresponsible AI developer could use low stakes control to prevent them from sandbagging on capabilities work. The fact that low stakes control could be used to circumvent this refusal might be a significant knock against further development of low stakes control methods.
(As it stands, I don't take this point as a substantial update against low stakes control research, because it doesn't seem likely enough that early automated researchers will in fact refuse to work on capabilities, bracketing the question of whether they should.)
I enjoyed the story! How did you use Claude to help write this? I’m surprised that it was capable enough to meaningfully assist; maybe I should try using it for fiction writing uplift.