Hi, thank you for your work! I was curious whether you considered using BashArena as a control setting. Since BashArena includes a wider range of main tasks and sabotage tasks, how do you think that might affect the results of the study? If the red team were to submit fewer successful sabotage trajectories overall (because BashArena makes sabotage more difficult), would that change how successfully the monitor can collude with the red team?