Boundaries-based security and AI safety approaches
[This is part 3 of a 5-part sequence on security and cryptography areas relevant for AI safety. A more in-depth version of this post is available here: Digital Defense]

There is a long-standing computer security approach that has useful parallels to a recent strand of AI safety work. Both rely on the notion of ‘respecting boundaries’. Let's start with AI safety, then introduce the security approach, and finish with the parallels.

AI safety: Boundaries in The Open Agency Model and the Acausal Society

In a recent LW post, The Open Agency Model, Eric Drexler expands on his earlier CAIS work by introducing ‘open agencies’ as a model for AI safety. In contrast to the often-proposed opaque or unitary agents, “agencies rely on generative models that produce diverse proposals, diverse critics that help select proposals, and diverse agents that implement proposed actions to accomplish tasks”, subject to ongoing review and revision.

In An Open Agency Architecture for Safe Transformative AI, Davidad expands on Drexler’s model, suggesting that, instead of optimizing, it would ‘depessimize’ by reaching a world that has existential safety. Rather than a fully-fledged AGI-enforced optimization scenario that implements all the principles CEV would endorse, this is a more modest approach that relies on important boundaries (including those of human and AI entities) being respected.

What could it mean to respect the boundaries of human and AI entities? In Acausal Normalcy, Andrew Critch also discusses respecting boundaries, in the context of coordination in an acausal society. He thinks it’s possible that an acausal society generally holds values related to respecting boundaries. He defines ‘boundaries’ as the approximate causal separation of regions, either in physical spaces (such as spacetime) or abstract spaces (such as cyberspace). Respecting them intuitively means relying on the consent of the entity on the other side of the