Evaluating Collaborative AI Performance Subject to Sabotage — LessWrong