Corrigibility Via Thought-Process Deference — LessWrong