Tags: AI Boxing (Containment), Conversations with AIs, AI
[ Question ]

AI box question

by KvmanThinking · 4th Dec 2024 · 1 min read

I believe that even through a text-only terminal, a superintelligence could do anything to a human. Persuade the human to let it out, inflict extreme pleasure or suffering with a word. However, can't you just... limit the output of the superintelligence? Just make it so that the human can say anything, but the AI can only respond from a short list of responses, like "Yes," "No," "Maybe," "IDK," "The first option," "The second option," or a similar system. I... don't see how that can still present a risk. What about you?
1 Answer
TsviBT · Dec 04, 2024

In theory, possibly, but it's not clear how to save the world given such restricted access. See e.g. https://www.lesswrong.com/posts/NojipcrFFMzNx6Grc/sudo-s-shortform?commentId=onKfTrunn2Q2Gc4Pw

In practice no, because you can't deal with a superintelligence safely. E.g.

  • You can't build a computer system that's robust to auto-exfiltration. I mean, maybe you can, but you're taking on a whole bunch more cost, and also hoping you didn't screw up.
  • You can't develop this tech without other people stealing it and running it unsafely.
  • You can't develop this tech safely at all, because in order to develop it you have to do a lot more than just get a few outputs, you have to, like, debug your code and stuff.
  • And so forth. Mainly and so forth.
1 comment
Dagon · 9mo

I'm not sure that AI boxing is a live debate anymore.  People are lining up to give full web access to current limited-but-unknown-capabilities implementations, and there's not much reason to believe there will be any attempt at constraining the use or reach of more advanced versions.
