Eliciting secret knowledge from language models — LessWrong