AI Boxing (Containment)

Applied to AI Box Log by Multicore at 3mo

See also: AI, AGI, Oracle AI, Tool AI, Unfriendly AI

See also

One idea for AI boxing is the, AGI, Oracle AI: an, Tool AI that only answers questions and isn't designed to interact with the world in any other way. But even the act of the, Unfriendly AI putting strings of text in front of humans poses some risk.

  • Physically isolating the AGI and permitting it zero control of any machinery
  • Limiting the AGI’s outputs and inputs with regards to humans
  • Programming the AGI with deliberately convoluted logic or homomorphically encrypting portions of it
  • Periodic resets of the AGI's memory
  • A virtual world between the real world and the AI, where its unfriendly intentions would be first revealed
  • Motivational control using a variety of techniques
  • Creating an Oracle AI: an AI that only answers questions and isn't designed to interact with the world in any other way. But even the act of the AI putting strings of text in front of humans poses some risk.

from the original talk page

Talk:AI boxing

If an SF reference is not considered a faux pas, this reminds me of John Barnes ( https://en.wikipedia.org/wiki/John_Barnes_%28author%29 ) "Meme Wars". The way One True infected humanity is, if possible, an obvious attack vector for a sufficiently powerful AI. -- Resuna (talk) 10:20, 27 November 2014 (AEDT)

AI Boxing is attempts, experiments, or proposals to isolate ("box") an unaligneda powerful AI (~AGI) where it can't interact with the world at largelarge, save for limited communication with its human liaison. It is often proposed that so long as the AI is physically isolated and cause harm. restricted, or "boxed", it will be harmless even if it is an unfriendly artificial intelligence (UAI).

See also:also: AI, AGI, Oracle AI, Tool AI, Unfriendly AI

One idea for AI boxing is the Oracle AI: AI: an AI that only answers questions and isn't designed to interact with the world in any other way. But even the act of the AI putting strings of text in front of humans poses some risk.

Escaping the box

It is not regarded as likely that an AGI can be boxed in the long term. Since the AGI might be a superintelligence, it could persuade someone (the human liaison, most likely) to free it from its box and thus, human control. Some practical ways of achieving this goal include:

  • Offering enormous wealth, power and intelligence to its liberator
  • Claiming that only it can prevent an existential risk
  • Claiming it needs outside resources to cure all diseases
  • Predicting a real-world disaster (which then occurs), then claiming it could have been prevented had it been let out

Other, more speculative ways include: threatening to torture millions of conscious copies of you for thousands of years, starting in exactly the same situation as in such a way that it seems overwhelmingly likely that you are a simulation, or it might discover and exploit unknown physics to free itself.

Containing the AGI

Attempts to box an AGI may add some degree of safety to the development of a friendly artificial intelligence (FAI). A number of strategies for keeping an AGI in its box are discussed in Thinking inside the box and Leakproofing the Singularity. Among them are:

  • Physically isolating the AGI and permitting it zero control of any machinery
  • Limiting the AGI’s outputs and inputs with regards to humans
  • Programming the AGI with deliberately convoluted logic or homomorphically encrypting portions of it
  • Periodic resets of the AGI's memory
  • A virtual world between the real world and the AI, where its unfriendly intentions would be first revealed
  • Motivational control using a variety of techniques

Simulations / Experiments

Both Eliezer Yudkowsky and Justin Corwin have ran simulations, pretending to be a superintelligence, and been able to convince a human playing a guard to let them out on many - but not all - occasions. Eliezer's five experiments required the guard to listen for at least two hours with participants who had approached him, while Corwin's 26 experiments had no time limit and subjects he approached.

The text of Eliezer's experiments have not been made public.

List of experiments

References

Applied to Quantum AI Box by Gyrodiot at 1y
Applied to Oracle paper by Multicore at 1y