
Boundaries (membranes) for AI safety by Chipmonk
Tags: Boundaries / Membranes [technical], Computer Security & Cryptography, Cyborgism, Intelligence Amplification, Security Mindset, Tool AI, AI

Protecting agent boundaries

by Chris Lakin
25th Jan 2024

6 comments, sorted by top scoring
the gears to ascension (2y, 3 karma)

solid, but I still think you're missing structure that makes this approach less effective than it seems on the face:

in full generality, what's a "threat"?

in full generality, what's a "dangerous" collision?

I worry that the current failure mode of attempting to empower in order to defend is that the defense is actually used to strike inside another's boundary, as has been the case for ~all weapons

Chris Lakin (2y, 1 karma)

in full generality, what's a "threat"?

in full generality, what's a "dangerous" collision?

Hm I'm not immediately sure how to define these

Chris Lakin (2y, 1 karma)

is that the defense is actually used to strike inside another's boundary, as has been the case for ~all weapons

Yeah, I am worried about this. 

This is notably not the case for infosec and encryption, where defensive capability doesn't imply offensive capability. However, I'm unsure whether this is also true for any physical interventions. (E.g., vaccines? No, bioweapons… Nanotech? No…)

That said, physical interventions do seem to be defense-dominant when there is coordination among a sufficiently large portion of society/power.

the gears to ascension (2y, 2 karma)

I don't think I'm convinced physical interactions are defense-dominant. The easiest-to-formally-certify defense is to enclose something in a hunk of impenetrable matter, and that can only be certified up to a given impact energy level. Above that energy level, the defense will simply be stripped away. Only mutually assured destruction (MAD) seems able to be game-theoretically durable, and certifying that a MAD situation will endure requires proving it through a simulation of the opposition.

VojtaKovarik (2y, 1 karma)

Might be obvious, but perhaps seems worth noting anyway: Ensuring that our boundaries are respected is, at least with a straightforward understanding of "boundaries", not sufficient for being safe.
For example:

  • If I take away all food from your local supermarkets (etc etc), you will die of starvation --- but I haven't done anything with your boundaries.
  • On a higher level, you can wipe out humanity without messing with our boundaries, by blocking out the sun.
Chris Lakin (2y, 2 karma)

Yes, see Agent membranes/boundaries and formalizing “safety” and davidad's comment. 

(Also, I'm not necessarily agreeing that your examples are not violations of boundaries. The first one isn't a violation of the end person's boundary, although it probably is a violation of the farmer's. The second one could be.)

If the preservation of an agent's boundary is necessary for that agent's safety, how can that boundary/membrane be protected?

How agent boundaries get violated

In order to protect boundaries, we must first understand how they get violated.

Let’s say there’s a cat, and it gets stabbed by a sword. That’s a boundary violation (a.k.a. membrane piercing). In order for that to have happened, three conditions must have been met:

  1. There was a sword.
  2. The cat and the sword collided.
  3. The cat wasn’t strong enough to resist penetration from the sword.

More generally, in order for any existing membrane to be pierced, three conditions must have all been met:

  1. There was a potential threat. (E.g., a sword, or a person with a sword.)
  2. The moral patient and the threat collided.
  3. The victim failed to adequately defend itself. (Because if the cat was better at self-defense — if its skin was thicker or if it was able to dodge — then it would not have been successfully stabbed.)
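
One way to read the three conditions is as a logical AND: a membrane is pierced only if all three hold, so negating any single one is enough to prevent the violation. The sketch below is illustrative only. The `Encounter` type and its field names are assumptions made up for this example, not part of any formalism in this sequence.

```python
from dataclasses import dataclass


@dataclass
class Encounter:
    """Hypothetical record of one interaction between a moral patient and the world."""
    threat_present: bool  # condition 1: a potential threat exists
    collided: bool        # condition 2: the patient and the threat collided
    defense_held: bool    # negation of condition 3: the patient defended itself


def membrane_pierced(e: Encounter) -> bool:
    """A violation requires all three conditions to hold at once."""
    return e.threat_present and e.collided and not e.defense_held


# The cat-and-sword example, plus one variant per negated condition.
stabbed_cat = Encounter(threat_present=True, collided=True, defense_held=False)
no_sword = Encounter(threat_present=False, collided=False, defense_held=False)
sword_far_away = Encounter(threat_present=True, collided=False, defense_held=False)
thick_skinned_cat = Encounter(threat_present=True, collided=True, defense_held=True)

for name, e in [
    ("stabbed_cat", stabbed_cat),
    ("no_sword", no_sword),
    ("sword_far_away", sword_far_away),
    ("thick_skinned_cat", thick_skinned_cat),
]:
    print(name, membrane_pierced(e))
```

Only the first case counts as a piercing. Each of the other three removes exactly one condition, which is the structure the rest of this post relies on.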

Protecting agent boundaries

Each of these three conditions then implies ways of preventing boundary violations (a.k.a. membrane piercing):

1. There was a potential threat.

  • → Minimize potential threats

2. There was a collision.

  • → Minimize dangerous collisions
    • → Predict and prevent collisions before they occur.
    • → Prevent collisions by putting distance between threats and moral patients.
    • → Prevent premeditated collisions by pre-committing to retribution. 

3. The victim failed to defend itself.

  • → Empower the membranes of humans and other moral patients to be better at self-defense.
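
If we additionally assume that each condition can be given a probability (an assumption of this sketch, not a claim made above), the chain rule gives P(violation) = P(threat) × P(collision | threat) × P(defense fails | threat, collision). Each of the three strategies then shrinks one factor of the product. The function name and the numbers below are made up purely for illustration.

```python
def violation_probability(
    p_threat: float,
    p_collision_given_threat: float,
    p_defense_fails_given_collision: float,
) -> float:
    """Chain-rule product of the three conditions, treated as probabilities."""
    for p in (p_threat, p_collision_given_threat, p_defense_fails_given_collision):
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must lie in [0, 1]")
    return p_threat * p_collision_given_threat * p_defense_fails_given_collision


# Illustrative numbers only; they are not estimates of anything real.
baseline = violation_probability(0.5, 0.5, 0.5)          # 0.125
fewer_threats = violation_probability(0.25, 0.5, 0.5)    # 0.0625
fewer_collisions = violation_probability(0.5, 0.25, 0.5)  # 0.0625
better_defense = violation_probability(0.5, 0.5, 0.25)   # 0.0625

print(baseline, fewer_threats, fewer_collisions, better_defense)
```

Under this toy model, halving any one factor halves the overall risk, and driving any one factor to zero eliminates it. Which factor is cheapest to reduce in practice is a separate question.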

How human societies already try to solve this problem

As a helpful analogy, here are some examples of how modern human societies try to solve this problem:

Minimize potential threats

  • Restrict access to weapons (e.g., nukes, bioweapons, etc.) 
  • Minimize potential perpetrators (e.g., some fictional societies predict and eliminate potential psychopaths).

Minimize dangerous collisions

  • Protect high-risk individuals, e.g. put them in witness protection.
  • Prevent collisions before they occur, e.g. predictive policing, traffic lights.
  • Police crimes after they occur, which deters premeditated collisions.

Empower membranes to be better at self-defense

  • Infosec defense: Use good security practices and strong encryption.
  • Biological defense: Develop and use beneficial vaccines.
  • Manipulation defense: Reduce unhelpful cognitive biases and emotional insecurities.

How this applies to AI safety

Minimize potential AI threats

(this is obvious/boring so I'm omitting it) 

Minimize dangerous AI collisions

(this is obvious/boring so I'm omitting it) 

Empower membranes to be better at self-defense

Empower the membranes of humans and other moral patients to be more resilient to collisions with threats. Examples:

  • Manipulation defense: You have an AI assistant that filters potentially-adversarial information for you.
  • Crime defense: Police have AI assistants that help them predict, deduce, investigate, and prevent crime.
  • Physical threat defense: (If nanotech works out) You have an AI assistant that shields you from physical threats.
  • Biological defense: Faster, better vaccines, personal antibody printers, etc.
  • Cybersecurity defense: Good security practices and strong encryption. Encryption in software can, in principle, be made arbitrarily strong against brute-force search by using longer keys.
    • cf. writing about this from Foresight Institute: (1), (2), (3)…
  • Legal defense: personal AI assistants for e.g. interfacing with contracts and the legal system.
  • Bargaining: personal AI assistants for negotiation.
  • Human intelligence enhancement
  • Cyborgism 
  • Mark Miller and Allison Duettmann (Foresight Institute) outline more ideas in the form of “Active Shields” here: 7. DEFEND AGAINST PHYSICAL THREATS | Multipolar Active Shields. Cf. Engines of Creation by Eric Drexler.
  • Related: We have to Upgrade – Jed McCaleb 
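
To make the cybersecurity bullet above a bit more concrete, here is a rough sketch of why longer keys favor the defender. It assumes the attacker can do no better than exhaustively guessing a uniformly random key, and it ignores cryptanalytic shortcuts, implementation bugs, and side channels. The function names and the guessing rate are hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def expected_guesses(key_bits: int) -> int:
    """Roughly half the key space must be searched on average."""
    if key_bits < 1:
        raise ValueError("key_bits must be at least 1")
    return 2 ** (key_bits - 1)


def years_to_search(key_bits: int, guesses_per_second: float) -> float:
    """Expected brute-force time in years at a fixed (assumed) guessing rate."""
    return expected_guesses(key_bits) / guesses_per_second / SECONDS_PER_YEAR


# Each extra key bit doubles the attacker's expected work.
for bits in (64, 128, 256):
    # 1e12 guesses per second is an arbitrary illustrative rate.
    print(bits, years_to_search(bits, guesses_per_second=1e12))
```

The defender's cost of using a longer key grows slowly, while the attacker's brute-force cost doubles with every added bit. That asymmetry is one sense in which this kind of defense does not automatically hand the defender an offensive capability.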