Playing Dumb: Detecting Sandbagging in Frontier LLMs via Consistency Checks — LessWrong