x
Adding noise to a sandbagging model can reveal its true capabilities — LessWrong