x
Toward Corrigibility: Interrogating AGI’s Instrumentally Convergent Preferences via Existential Threat Draft — LessWrong