Information bottleneck for counterfactual corrigibility — LessWrong