Sandbagging: How Models Use Reward-Hacking to Downplay Their True Capabilities