New paper: Corrigibility with Utility Preservation — LessWrong