Improvement on MIRI's Corrigibility — LessWrong