Auditing language models for hidden objectives — LessWrong