x
Automated Interpretability-Driven Model Auditing and Control: A Research Agenda — LessWrong