LESSWRONG

OwenChen — LessWrong

OwenChen

Extending the Off-Switch Game: Toward a Robust Framework for AI Corrigibility

Introduction

The off-switch game, introduced by Hadfield-Menell, Dragan, Abbeel, and Russell in the context of AI safety, highlights a critical challenge in AI alignment: designing systems that are powerful and goal-driven yet corrigible, meaning they allow humans to intervene and shut them down without resistance. In this game, an AI...

Sep 25, 2024•4