[Paper] AI Sandbagging: Language Models can Strategically Underperform on Evaluations