AuditBench: Evaluating Alignment Auditing Techniques on Models with Hidden Behaviors — LessWrong