Reward Fuse: Shutdown as a Reward Attractor...
Shutdown mechanisms are usually adversarial external interventions; past efforts have aimed at indifference and lack of resistance. But what if a shutdown mechanism is internally enacted by the system itself?
Mechanism sketch: once a verified tripwire flags a malignant state, the active reward regime switches so that shutdown becomes the highest-reward action.
The key difference: shutdown is not enforced or just tolerated—it becomes instrumentally optimal under the post-trigger reward landscape. A “kill switch” is resisted; maximal reward is pursued.
(Note: This proposal makes no claims on how malignancy is detected or how tripwires are secured; that remains outside the scope of this Quick Take. )
Question: Is there prior art that treats shutdown as a dominant policy under a verified failure mode?