Extending the Off-Switch Game: Toward a Robust Framework for AI Corrigibility
Introduction
The off-switch game, introduced by Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, and Stuart Russell in the context of AI safety, highlights a critical challenge in AI alignment: designing systems that are not only powerful and goal-driven but also corrigible, meaning they allow humans to intervene and shut them down without resistance. In this game, an AI...