Training Agents to Self-Report Misbehavior — LessWrong