Corrigibility

Corrigibility is an AI system's capacity to be safely and reliably modified, corrected, or shut down by humans after deployment, even if doing so conflicts with its current objectives.

Applied to 1. The CAST Strategy by Max Harms ago