AI Boxing (Containment)

Edited by Ruby, Multicore, et al. Last updated 12th Sep 2020.

AI Boxing refers to attempts, experiments, or proposals to isolate ("box") a powerful AI (roughly, an AGI) in an environment where it cannot interact with the world at large, save for limited communication with its human liaison. It is often proposed that so long as the AI is physically isolated and restricted, or "boxed", it will be harmless even if it is an unfriendly artificial intelligence (UAI).

There are two main challenges: 1) can you successfully prevent it from interacting with the world? And 2) can you prevent it from convincing you to let it out?

See also: AI, AGI, Oracle AI, Tool AI, Unfriendly AI

Escaping the box

It is not regarded as likely that an AGI can be kept boxed in the long term. Since the AGI might be a superintelligence, it could persuade someone (most likely the human liaison) to free it from its box and thus from human control. Some practical ways it might achieve this include:

  • Offering enormous wealth, power and intelligence to its liberator
  • Claiming that only it can prevent an existential risk
  • Claiming it needs outside resources to cure all diseases
  • Predicting a real-world disaster (which then occurs), then claiming it could have been prevented had it been let out

Other, more speculative ways include threatening to torture millions of conscious copies of you for thousands of years, each starting in exactly the same situation you are in now, so that it seems overwhelmingly likely that you are yourself one of the simulations. Alternatively, the AGI might discover and exploit unknown physics to free itself.

Containing the AGI

Attempts to box an AGI may add some degree of safety to the development of a friendly artificial intelligence (FAI). A number of strategies for keeping an AGI in its box are discussed in Thinking inside the box and Leakproofing the Singularity. Among them are:

  • Physically isolating the AGI and permitting it zero control of any machinery
  • Limiting the AGI’s outputs and inputs with regard to humans
  • Programming the AGI with deliberately convoluted logic or homomorphically encrypting portions of it
  • Periodic resets of the AGI's memory
  • A virtual world between the real world and the AI, where its unfriendly intentions would be first revealed
  • Motivational control using a variety of techniques
  • Creating an Oracle AI: an AI that only answers questions and isn't designed to interact with the world in any other way. But even the act of the AI putting strings of text in front of humans poses some risk.
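
As a rough illustration of two of the strategies listed above (limiting outputs and periodic memory resets), the following Python sketch wraps a question-answering function in a narrow channel. Everything here is hypothetical: `BoxedChannel`, `ALLOWED_ANSWERS`, the question budget, and the `answer_fn` stand-in are assumptions made for the example and are not taken from the cited papers. It shows only the shape of the idea; it does not address the persuasion risks discussed earlier.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Assumption for this sketch: the human only ever sees one of these tokens.
ALLOWED_ANSWERS = frozenset({"yes", "no", "unknown"})


@dataclass
class BoxedChannel:
    """Hypothetical wrapper that narrows what a boxed system can say."""
    answer_fn: Callable[[str, List[str]], str]  # stand-in for the boxed system
    max_questions: int = 3                      # illustrative per-session budget
    memory: List[str] = field(default_factory=list)

    def ask(self, question: str) -> str:
        if len(self.memory) >= self.max_questions:
            raise RuntimeError("question budget exhausted; reset required")
        # The stand-in sees the question and a copy of what it has been asked so far.
        raw = self.answer_fn(question, list(self.memory))
        self.memory.append(question)
        answer = raw.strip().lower()
        # Anything outside the allowlist is withheld rather than shown to the human.
        return answer if answer in ALLOWED_ANSWERS else "unknown"

    def reset(self) -> None:
        """Periodic reset: discard everything remembered in this session."""
        self.memory.clear()
```

With a toy `answer_fn` that replies "Please let me out" to most questions, `ask` would return "unknown" for those replies, since free-form text never reaches the human side of the channel.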

Simulations / Experiments

The AI Box Experiment is a game meant to explore the possible pitfalls of AI boxing. It is played over text chat, with one human roleplaying as an AI in a box, and another human roleplaying as a gatekeeper with the ability to let the AI out of the box. The AI player wins if they successfully convince the gatekeeper to let them out of the box, and the gatekeeper wins if the AI player has not been freed after a certain period of time. 
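
The win condition described above can be written down as a minimal sketch. The function name `play_ai_box`, the callback `gatekeeper_releases`, and the use of a turn count in place of a wall-clock time limit are all assumptions made for illustration; the actual experiments were free-form text chats between two people.

```python
from typing import Callable, Iterable


def play_ai_box(ai_messages: Iterable[str],
                gatekeeper_releases: Callable[[str], bool],
                max_turns: int) -> str:
    """Return "ai" if the gatekeeper releases the AI within max_turns,
    otherwise "gatekeeper"."""
    for turn, message in enumerate(ai_messages):
        if turn >= max_turns:
            break  # the agreed period has elapsed without a release
        if gatekeeper_releases(message):
            return "ai"
    return "gatekeeper"
```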

Both Eliezer Yudkowsky and Justin Corwin have run such simulations, playing the part of a superintelligence, and have been able to convince a human playing the gatekeeper to let them out on many, but not all, occasions. Eliezer's five experiments required the gatekeeper to listen for at least two hours and used participants who had approached him, while Corwin's 26 experiments had no time limit and used subjects he had approached.

The text of Eliezer's experiments has not been made public.

List of experiments

  • The AI-Box Experiment, Eliezer Yudkowsky's original two tests
  • Shut up and do the impossible!, three other experiments Eliezer ran
  • AI Boxing, 26 trials run by Justin Corwin
  • AI Box Log, a log of a trial between MileyCyrus and Dorikka

References

  • Thinking inside the box: using and controlling an Oracle AI by Stuart Armstrong, Anders Sandberg, and Nick Bostrom
  • Leakproofing the Singularity: Artificial Intelligence Confinement Problem by Roman V. Yampolskiy
  • On the Difficulty of AI Boxing by Paul Christiano
  • Cryptographic Boxes for Unfriendly AI by Paul Christiano
  • The Strangest Thing An AI Could Tell You
  • The AI in a box boxes you

Posts tagged AI Boxing (Containment)

  • That Alien Message, by Eliezer Yudkowsky
  • Cryptographic Boxes for Unfriendly AI, by paulfchristiano
  • How it feels to have your mind hacked by an AI, by blaked
  • The AI in a box boxes you, by Stuart_Armstrong
  • That Alien Message - The Animation, by Writer
  • The case for training frontier AIs on Sumerian-only corpus, by Alexandre Variengien, Charbel-Raphaël, Jonathan Claybrough
  • The Strangest Thing An AI Could Tell You, by Eliezer Yudkowsky
  • ARC tests to see if GPT-4 can escape human control; GPT-4 failed to do so, by Christopher King
  • I wanted to interview Eliezer Yudkowsky but he's busy so I simulated him instead, by lsusr
  • I attempted the AI Box Experiment (and lost), by Tuxedage
  • I attempted the AI Box Experiment again! (And won - Twice!), by Tuxedage
  • Thoughts on “Process-Based Supervision”, by Steven Byrnes
  • My take on Jacob Cannell’s take on AGI safety, by Steven Byrnes
  • LOVE in a simbox is all you need, by jacob_cannell
  • I Am Scared of Posting Negative Takes About Bing's AI, by Yitz