Inoculation prompting: Instructing models to misbehave at train-time can improve run-time behavior — LessWrong