Research note on selective inoculation
Introduction Inoculation Prompting is a technique to improve test-time alignment by introducing a contextual cue (like a system prompt) to steer the model behavior away from unwanted traits at inference time. Prior inoculation prompting works apply the inoculation prompt globally to every training example during SFT or RL, primarily in...
Apr 514