An issue with training schemers with supervised fine-tuning — LessWrong