Most of my work on boundaries so far has focused on protecting boundaries "from the outside". For example, maybe davidad's Open Agency Architecture (OAA) could produce some kind of boundary-defending global police AI.

But imagine parenting a child and protecting them by keeping them inside all day. Seems kind of lame. Another thing you could do, though, is not restrict the child and instead let them become stronger and better at defending themselves.

So: you can defend boundaries "from the outside", or you can empower those boundaries to be better at protecting themselves "from the inside". (After all, if everyone could defend themselves perfectly, we wouldn't need AI safety, lol.)

Defending boundaries "from the inside" has the advantage of encouraging individual agents/moral patients to be more autonomous and sovereign.  

I put some examples of what this might look like in Protecting agent boundaries:

Empower membranes to be better at self-defense

Empower the membranes of humans and other moral patients to be more resilient to collisions with threats. Examples:

  • Manipulation defense: You have an AI assistant that filters potentially-adversarial information for you.
  • Crime defense: Police have AI assistants that help them predict, deduce, investigate, and prevent crime.
  • Physical threat defense: (If nanotech works out) You have an AI assistant that shields you from physical threats.
  • Biological defense: Faster better vaccines, personal antibody printers, etc.
  • Cybersecurity defense: Good security practices and strong encryption. Software encryption can be arbitrarily strong. 
    • cf. writing about this from Foresight Institute: (1)(2)(3)
  • Legal defense: personal AI assistants for e.g. interfacing with contracts and the legal system.
  • Bargaining: personal AI assistants for negotiation.
  • Human intelligence enhancement
  • Cyborgism 
  • Mark Miller and Allison Duettmann (Foresight Institute) outline more ideas in the form of “Active Shields” here: 7. DEFEND AGAINST PHYSICAL THREATS | Multipolar Active Shields. Cf. Engines of Creation by Eric Drexler.
  • Related: We have to Upgrade – Jed McCaleb 

I'm looking to talk to people about the plausibility of empowering boundaries to be better at defending themselves / cyborgism. Let me know, or leave a comment if you know anyone who's thinking about this.


NicholasKees

Apr 03, 2024

Some thoughts:

First, it sounds like you might be interested in the idea of d/acc from this Vitalik Buterin post, which advocates for building a "defense-favoring" world. There are a lot of great examples of things we can do now to make the world more defense-favoring, but when it comes to strongly superhuman AI, I get the sense that things get a lot harder.

Second, there doesn't seem to be a clear "boundaries good" or "boundaries bad" story to me. Keeping a boundary secure tends to impose serious costs on the bandwidth of what can be shared across it. Pre-industrial Japan maintained a very strict boundary with the outside world to prevent foreign influence, and the cost was falling behind the rest of the world technologically.

My left and right hemispheres are able to work so well together because they don't have to spend resources protecting themselves from each other. Good cooperative thinking among people also relies on trust, which makes it possible to loosen boundaries of thought. Weakening borders between countries can massively increase trade, and that too relies on trust between the participating countries. The problem with AI is that we can't give it that level of trust, so we need to build boundaries, but the ultimate cost seems to be that we eventually get left behind. Creating the perfect boundary, one that only lets in the good, never the bad, and doesn't incur a massive cost, seems like an enormous challenge, and I'm not sure what that would look like.

Finally, when I think of Cyborgism, I'm usually thinking of it in terms of taking control over the "cyborg period" of certain skills, i.e. the period of time when human+AI teams still outperform either humans or AIs on their own. In this frame, if we reach a point where AIs broadly outperform human+AI teams, then, barring some kind of coordination, humans won't have the power to protect themselves from all the non-human agency out there (and it's up to us to make good use of the cyborg period before then!)

In that frame, I could see "protecting boundaries" intersecting with cyborgism: for example, AI could help humans perform better oversight and guard against disempowerment around the end of some critical cyborg period. Developing a cyborgism that scales to strongly superhuman AI has practical challenges (like the kind Neuralink seeks to overcome), and it also requires solving its own particular version of the alignment problem (e.g. how can you trust that the AI you are merging with won't just eat your mind?).

 

Second, there doesn't seem like a clear "boundaries good" or "boundaries bad" story to me. Keeping a boundary secure tends to impose some serious costs on the bandwidth of what can be shared across it.

Hence "membranes", a way to pass things through in a controlled way rather than either allowing or disallowing everything. In this sense absence of a membrane is a degenerate special case of a membrane, so there is no tradeoff between presence and absence of boundaries/membranes, only between different possible membranes. If the other side of a membrane is...

Chipmonk:
yea

I agree that this seems like a grouping of concepts around 'defensive empowerment' which feels like it gets at a useful way to think about reality. However, I don't know offhand of research groups with this general focus on the subject. I think mostly people focusing on any of these subareas have focused just on their specific specialty (e.g. cyberdefense or biological defense), or an even more specific subarea than that. 

I think one of the risks here is that a general agent able to help with this wide a set of things would almost certainly be capable of a lot of scary dual-use capabilities. That adds complications to how to pursue the general subject in a safe and beneficial way.

I can see how advancing those areas would empower membranes to be better at self-defense.

I'm having a hard time visualizing how explicitly adding the concept, a formalism, or an implementation of membranes/boundaries would help advance those areas (and in turn help empower membranes more).

For example, is "what if we add membranes to loom" a question that typechecks? What would "add membranes" reify as in a case like that?

In the other direction, would there be a way to model a system's (stretch goal: human child's; mvp: a bargaining bot's?) membrane quantitatively somehow, in a way where you can before/after compare different interventions and estimate how well each does at empowering/protecting the membrane? Would it have a way of distinguishing amount-of-protection added from outside vs inside? Does "what if we add loom to membranes" compile?

right, yeah, i think precisely formalizing boundaries is less useful for the cyborgism angle