Finding Deception in Language Models — LessWrong