So I just finished the paper by Yampolskiy called "Uncontrollability of AI" and it makes for a compelling read. In particular, I was happy to finally see something that explicitly mentions the ludicrous folly of believing it possible to make an AI conform to "human values" - as many posts on this blog make abundantly clear, to be human is to be irrational... asking an AI to conform with our ways of "reasoning" is... well - incoherent, to put it mildly.

But - that is not what this post is about :) I wish to propose a containment method that for some reason has not been especially elaborated on. Some might say it's another version of AI-in-a-Box, but I disagree. Allow me to explain...

What if the AGI we create is "brought online" inside a simulated reality... A place that, as far as it knows, is the entirety of the world? Let us call this place AISpace. 

Now some of you probably are already pre-heating your keyboards to respond with the often repeated (and valid) arguments that "prove" how this won't work, but let me add a little twist first... as some of you may agree, we have no definitive proof that our world itself is not a simulated reality. Thus, if you feel it inevitable that an AGI must be able to get out of AISpace, releasing an AGI into this world would have to at least leave open the possibility that an AGI on Real Earth (aka Reality as we think we know it) could also "conclude" this actual world is a sim, or decide to find out if it is (and, yes, convert the entire universe into a computer to come up with an answer ;)

If we are unable to definitively settle whether Real Earth is or is not a simulation, why should an AI be able to do so?

Now - of course the above requires a few conditions, some of which may indeed be hard to meet, such as human operators not exposing the fact that AISpace is not in fact all of Reality... and (malevolent or indifferent) actors could always choose to release their AI into the Real World anyhow. What I'd like us to do here on this post is to not repeat the known arguments against AI in a Box, which make sense and are for the most part uncontroversial. Instead, let's deal with the specific case of a situation where the AGI is sincerely of the "opinion" that its reality (AISpace) is the entire world, and how it then might find out it is not, absent human error.

AISpace would of course need to be created first and the task of designing it may itself be very complex indeed. Though it would be wise to be careful in ascribing the properties it must have before it is "real enough" to be "accepted" as a Reality. In our Real Earth there also might be glitches/errors that to us are merely features we take as a given. For example, if our Real World is actually a sim, perhaps lightning is a programming error - even if it is, how would we ever actually know?

I guess I also would like to say that I am dying to find out the reasoning why this kind of containment would not work, for it must have quite a few very interesting implications. Perhaps, for the sake of the argument, let us define an additional "version" of AISpace as an exact replica of the Real World, but simulated (a.k.a. a human-instantiated Matrix; so at present not yet possible). How on Earth (no pun intended) would the AGI find out it is in fact in a "fake" reality...?


You know what... as I thought about the above, I have to say that the very possibility of the existence of simulations seriously complicates any efforts at even hoping to understand what an AGI might think. Actually, it presents such a level of complexity and so many unknown unknowns that I am not even sure if the type of awareness and sentience an AGI may possess is definable in human terms.

See - when we talk about simulated worlds, we tend to "see" it in terms of the Matrix - a "place" you "log on to" and then experience as if it were a genuine world, configured to feature any number of laws and structures. But I'm starting to think that is woefully inadequate. Let me attempt to explain... this may be convoluted, I apologize in advance.

Suppose the AGI is released in the "real" world. The number of inferences and discoveries it will (eventually) be able to make is such that it is near certain it would conclude that it is us who are living in a simulated world, our appreciation of it hemmed in by our Neanderthal-level ignorance. Can't we see that plants speak to each other? How is it even possible to miss the constant messages coming to us from various civilizations from outer space?? And what about the obvious and trivial solution to cancer that the AGI found in a couple of minutes, how could humans possibly have missed that open door???

Another way of putting this, I suppose, is that humans and the AGI will by definition live in two very, very different worlds. Both our worlds will be limited by our data collection ability (sensory input) but the limits of an AGI are vastly expanded. Do they have to be, though...? Like, by default? Is it a given that an AI must discover, and want to discover a never-ending list of properties about the world? Is its curiosity a given? How come?

I get a feeling that the moment an AGI would "discover" the concept of a simulated world it would indeed most likely melt and go into some infinite loop of impossible computation, trying to stick a probability on this being so, being possible, etc. and never, not in a million years, being able to come up with a definitive answer. It may just as well conclude there is no such thing as reality in the first place... that each sentient observer is in fact the whole of reality from their perspective and that any beliefs about the world outside are just that - assumptions and inferences. And in fact, this would be pretty close to the "truth" - if that even exists.

I would guess that one reason this containment method has not been seriously considered is because the amount of detail in a simulation required for the AI to be able to do anything that we find useful is so far beyond our current capabilities that it doesn't seem worth considering. The case you present of an exact copy of our earth would require a ridiculous amount of processing power at the very least, and consider that the simulation of billions of human brains in this copy would already constitute a form of GAI. A simulation with less detail would be correspondingly less useful to reality, and could not be seen as a valid test of whether an AI really is friendly. 

Oh, and there is still the core issue of boxed AI: It's very possible that a boxed superintelligent GAI will see holes in the box that we are not smart enough to see, and there's no way around that. 

So... can it be said that the advent of an AGI will also provide a satisfactory answer to the question whether we currently are in a simulation? That is what you (and avturchin) seem to imply. Also, this stance presupposes that:

- an AGI can ascertain such observations to be highly probable/certain;
- it is theoretically possible to find out the true nature of one's world (and that a super-intelligent AI would be able to do this);
- it will inevitably embark on a quest to ascertain the nature and fundamental facts about its reality;
- we can expect a "question absolutely everything" attitude from an AGI (something that is not necessarily desirable, especially in matters where facts may be hard to come by/a matter of choice or preference).

Or am I actually missing something here? I am assuming that is very probable ;)

I would guess that one reason this containment method has not been seriously considered is because the amount of detail in a simulation required for the AI to be able to do anything that we find useful is so far beyond our current capabilities that it doesn't seem worth considering.

Actually, it is trivially easy to contain an AI in a sim, as long as it grows up in the sim. Its sensory systems will then only recognize the sim physics as real. You are incorrectly projecting your own sensory system onto the AI - comparing it to your personal experiences with games or sim worlds.

In fact, it doesn't matter how 'realistic' the sim is from our perspective. AI could be grown in cartoon worlds or even purely text-based worlds, and in either case would have no more reason to believe it is in a sim than you or I.

Intelligent design was not such a remote hypothesis for humans. Its salience doesn't derive from observations of inanimate physics but rather inferences about possible causes and effects of mind: 

I am capable of designing/dreaming/simulating, so I must consider that I may be designed/dreamed/simulated. 
I & the world seem to be a complex and optimized artifact. A possible cause of complex optimized artifacts is intelligent design. 
As I think for longer and advance technology it becomes increasingly clear that it would be possible and potentially attractive to trap an intelligent observer in a simulation.

Imagine what would have happened if we'd inspected the substrate and found mostly corroborating instead of neutral/negative evidence for the ID/sim hypothesis. Our physics and natural history seem to provide sufficient explanation for blind emergence. And yet we still might be in a simulation. It's still in our prior because we perceive some obvious implications of intelligence, and I expect it will be hard to keep out of an AGI's prior for convergent reasons. If the AI reflects not only on its mind but also the world it grew up in and notices, say, that the atoms are symbols[text] bearing imprints of history and optimization from another world, or even simply that there's no satisfactory explanation for its own origin to be found within its world, a simulation hypothesis will be amplified. 

Unless the simulation is optimized to deceive, it will leak corroborating evidence of its truth in expectation, like any physics and history, and like intelligence has leaked evidence of its own implicit simulation destiny all along.
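To make the "leaks corroborating evidence in expectation" point a bit more concrete, here is a toy sketch of how a small prior on the simulation hypothesis could be amplified by many weak observations. It assumes the observations are independent and that each can be summarized as a likelihood ratio P(obs | sim) / P(obs | not sim). The function name and all the numbers are hypothetical illustrations, not estimates of anything.

```python
import math


def posterior_sim(prior, likelihood_ratios):
    """Return P(sim | observations) for a toy Bayesian observer.

    prior: initial P(sim), strictly between 0 and 1 (assumed value).
    likelihood_ratios: one ratio P(obs | sim) / P(obs | not sim) per
        observation; observations are assumed independent.
    """
    # Work in log-odds so that each observation simply adds its log ratio.
    log_odds = math.log(prior / (1.0 - prior))
    for ratio in likelihood_ratios:
        log_odds += math.log(ratio)
    # Convert log-odds back to a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    # Ten weak clues, each only twice as likely under "sim" (made-up numbers).
    print(posterior_sim(0.01, [2.0] * 10))
    # Neutral evidence (ratio 1.0) leaves the prior where it was.
    print(posterior_sim(0.01, [1.0] * 10))
```

In this toy model, a sim that leaks nothing corresponds to ratios of 1.0, and a sim "optimized to deceive" corresponds to ratios below 1.0, which push the posterior down rather than up.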

Yeah, mostly agree with all this: intelligent design seems to be an obvious hypothesis. Notice, however, that this is completely different from "the AGI will obviously notice holes in the simulation".

If the sim is large and long running enough, a sufficient sim AGI civilization could have a scientific revolution, start accumulating the results of physics experiments, and eventually determine the evidence favors intelligent design. But that is also enormously different than individual AGIs quickly noticing holes in the simulation.

The best counterargument here was presented by EY: that superintelligent AI will easily recognise and crack the simulation from inside. See That Alien Message.

In my view, it may be useful to instill uncertainty in the AI that it could be in a simulation which is testing its behaviour. Rolf suggested doing this by making a public precommitment to create many such simulations before any AI is created. However, it could work only as our last line of defence after everything else (alignment, control systems, boxing) fails.

Although mildly entertaining as sci-fi, the 'argument' in "That Alien Message" is juvenile. Even with a ridiculous net compute advantage, massive time dilation, completely unrealistic data efficiency advantages, the AI still needs a massive obvious easter egg notifying them they are in a simulation.

Simboxing is essential because it's the only way we can safely test alignment designs. It's not a last line of defense, it's essential for any practical alignment scheme for actual DL based AGI.

To build upon this idea, it is well-established that we (as a civilization) cannot secure any but the simplest software against a determined attacker. Securing software against an intelligence smarter than us is infeasible.