Protecting against sudden capability jumps during training — LessWrong