What can go wrong with the following protocol for AI containment?

[-]HungryHobo10y70

There is such a thing as proving code but it's slow and very very expensive, on the order of hundreds of dollars per line of code, increasing with the complexity of the program.

Even with proven code, there have been cases where compiler optimizations have introduced security flaws.

So the odds are that intelligent entities living in a non-trivial simulation would eventually be able to find an exploitable flaw in your code and take over their own system or maybe just crash it.

You might also like this story: http://ttapress.com/553/crystal-nights-by-greg-egan/

You also might like Accelerando where it's implied that mega-lightyear-spanning civilizations (Kardashev level II or III) are attempting a side-channel timing attack on the virtual machine the universe is being run on.

With the specific example you give with merely near-human intelligence where you've got a concrete problem or set of problems you want the answer to the question. Not to chat with an AI afterwards.

So you could also have a number of Puzzle-walls with problems of increasing difficulty 1 to 100 where the "answer" is a general solution to a problem. The first 55 generate massive wealth when solved but the one you really want an answer to is 56, which when solved pauses/halts the simulation without warning and outputs the solution.

57 onward are only there so that nobody realizes that the universe will end when number 56 is solved.

Instead of 2 way communication you get a single answer.

2-way communication is far more risky.

[-]ZoltanBerrigomo10y00

Nice idea...I wrote an update to the post suggesting what seemed to me to be a variation on your suggestion.

About program checking: I agree completely. I'm not very informed about the state of the art, but it is very plausible that what we know right now is not yet up to task.

[-]HungryHobo10y00

I'm not sure it's just a matter of what we know right now, it's mathematically provable that you can't create a program which can find all security flaws or prove any provable code so bugs are pretty much inevitable no matter how advanced we become.

[-]ZoltanBerrigomo10y00

The theorem you cite (provided I understood you correctly) does not preclude the possibility of checking whether a program written in a certain pre-specified format will have bugs. Bugs here are defined to be certain undesirable properties (e.g., looping forever, entering certain enumerated states, etc).

Baby versions of such tools (which automatically check whether your program will have certain properties from inspecting the code) already exist.

[-]HungryHobo10y10

If the language and format you're using is Turing complete then you can't write a program which can guarantee to find all bugs.

If you limit yourself to a subset of features such that you are no longer writing in a format which is turing complete then you may be able to have a program capable of automatically proving that code reliably.

Static analysis code does exist but still doesn't guarantee 100% accuracy and is generally limited to the first couple of levels of abstraction.

Keep in mind that if you want to be 100% certain of no bugs you also have to prove the compiler, the checker, any code your program interacts with and the hardware on which the code runs.

[-]ZoltanBerrigomo10y00

If you limit yourself to a subset of features such that you are no longer writing in a format which is turing complete then you may be able to have a program capable of automatically proving that code reliably.

Right, that is what i meant.

[-]Richard_Kennaway10y40

Keep the AI in a box and don't interact with it.

The rest of your posting is about how to interact with it.

Don't have any conversations with it whatsoever.

Interaction is far broader than just conversation. If you can affect it and it can affect you, that's interaction. If you're going to have no interaction, you might as well not have created it; any method of getting answers from it about your questions is interacting with it. The moment it suspects what it going on, it can start trying to play you, to get out of the box.

I'm at a loss to imagine how they would take over the world.

This is a really bad argument for safety. It's what the scientist says of his creation in sci-fi B-movies, shortly before the monster/plague/AI/alien/nanogoo escapes.

[-]ZoltanBerrigomo10y00

These are good points. Perhaps I should not have said "interact" but chosen a different word instead. Still, its ability to play us is limited since (i) we will be examining the records of the world after it is dead (ii) it has no opportunity to learn anything about us.

Edit: we might even make it impossible for it to game us in the following way. All records of the simulated world are automatically deleted upon completion -- except for a specific prime factorization we want to know.

This is a really bad argument for safety.

You are right, of course. But you wrote that in response to what was a parenthetical remark on my part -- the real solution is to use program checking to make sure the laws of physics of the simulated world are never violated.

[-]Silver_Swift10y00

To be fair, all interactions described happen after the AI has been terminated, which does put up an additional barrier for the AI to get out of the box. It would have to convince you to restart it without being able to react to your responses (apart from those it could predict in advance) and then it still has to convince you to let it out of the box.

Obviously, putting up additional barriers isn't the way to go and this particular barrier is not as impenetrable for the AI as it might seem to a human, but still, it couldn't hurt.

[-]Jiro10y20

It is hard to argue, for example, that we (real human beings) have been harmed by being brought into existence in a universe without a God who is listening; almost all of us would prefer to be alive rather not. The same would go for them: surely, their simulated existence, imperfect as it is, is not worse than not having been brought into the world in the first place?

It is not that hard to argue this. This is the non-identity problem.

[-]Lukas_Gloor10y20

Ethically, I think one could justify all this. It is hard to argue, for example, that we (real human beings) have been harmed by being brought into existence in a universe without a God who is listening; almost all of us would prefer to be alive rather not. The same would go for them: surely, their simulated existence, imperfect as it is, is not worse than not having been brought into the world in the first place?

At least some of them will tell you they had rather not been born. But maybe you'll want to equip these orcs with an even stronger drive for existence, so they never choose death over life even if you torture them; would that make it more ok? I suspect not, so something with the "Do they complain to having been created?" approach seems flawed imo. Creating beings with a strong preference for existence would make it too easy to legitimize doing with them whatever you want.

How about imagining beings who at any moment are intrinsically indifferent to whether they exist or not? They only won't complain as long as they don't suffer. Perhaps that's too extreme as well, but if it's only simple/elegant rules you're looking for, this one seems more acceptable to me than the torture-bots above.

[-]ZoltanBerrigomo10y00

I guess I am willing to bite the bullet and say that, as long as entity X prefers existence to nonexistence, you have done it no harm by bringing it into being. I realize this generates a number of repulsive-sounding conclusions, e.g., it becomes ethical to create entities which will live, by our 21st century standards, horrific lives.

At least some of them will tell you they had rather not been born.

If one is willing to accept my reasoning above, I think one can take one more leap and say that statistically as long as the vast majority of these entities will prefer existing to never having been brought into being, we are in the clear.

[-]compartmentalization10y10

If you use the entities' preferences to decide what's ethical, then everything is (or can be), because you can just adjust their preferences accordingly.

[-]Manfred10y20

Even if the huge computing or algorithmic advances needed fell out of the sky tomorrow, this scheme still doesn't seem like it solves the problems we really want it to solve, because it does not allow the the agents to learn anything interesting about our world.

[-]Slider10y00

if I try to calculate 6 divided by 3 on a calculator and it answers "you need to exercise more" ahve I been served better because it answered a better question?

[-]Manfred10y00

Pretend that instead of "exercise more," the calculator gave you advice that was actually valuable. Then yes. Just because you expect it to be a calculator doesn't mean it can't be more valuable if it can do more valuable things.

[-]ZoltanBerrigomo10y00

When talking about dealing and (non)interacting with real AIs, one is always talking about a future world with significant technological advances relative to our world today.
If we can formulate something as a question about math, physics, chemistry, biology, then we can potentially attack it with this scheme. These are definitely problems we really want to solve.
Its true that if we allow AIs more knowledge and more access to our world, they could potentially help us more -- but of course the number of things that can go wrong has to increase as well. Perhaps a compromise which sacrifices some of the potential while decreasing the possibilities that can go wrong is better.

[-]Silver_Swift10y00

First off, I'm a bit skeptical about whether you can actually create a superintelligent AI by combining sped up humans like that, I don't think that is the core of your argument, though, so let's assume that you can and that the resultant society is effectively a superintelligence now.

The problem with superintelligences is that they are smarter than you. It will realize that it is in a box and that you are going to turn it off eventually. Given that this society is based on natural selection it will want to prevent that. How will it accomplish that? I don't know, I'm not a superintelligence and your scenario doesn't provide enough detail to figure out what you missed, but it is very, very difficult not to give a superintelligence any hints of how the physics of our world work. Maybe they notice minor fluctuations in the speed of the simulation based on environmental changes to the hardware or maybe they can reverse engineer things of our physiology from the way you wrote the simulation. That doesn't seem like much, but humans figured out an awful lot about (for instance) the events right after the big bang from seemingly absurdly tiny hints and we're just regular intelligences.

Even if they can't find a way out of their box in the time given to them they might try to convince you to run the simulation again with a longer running time. They could pretend to be unable to find the solution to prime factorization in time or they could convince you that the method they did find fails for very high primes (by making very subtle mistakes in its application). If it uses this method it also conveniently achieves a method of communicating with the outside world (through the investigators that inspect the dump after the simulation ends) and might be able to set up a better situation for itself the second time round.

[-]ZoltanBerrigomo10y10

is very, very difficult not to give a superintelligence any hints of how the physics of our world work.

I wrote a short update to the post which tries to answer this point.

Maybe they notice minor fluctuations in the speed of the simulation based on environmental changes to the hardware

I believe they should have no ability whatsoever to detect fluctuations in the speed of the simulation.

Consider how the world of world of warcraft appears to an orc inside the game. Can it tell the speed at which the hardware is running the game?

It can't. What it can do is compare the speed of different things: how fast does an apple fall from a tree vs how fast a bird flies across the sky.

The orc's inner perception of the flow of time is based on comparing these things (e.g., how fast does an apple fall) to how fast their simulated brains process information.

If everything is slowed down by a factor of 2 (so you, as a player, see everything twice is slow), nothing appears any different to a simulated being within the simulation.

[-]Silver_Swift10y10

You are absolutely correct, they wouldn't be able to detect fluctuations in processing speed (unless those fluctuations had an influence in, for instance, the rounding errors in floating point values).

About update 1: It knows our world very likely has something approximating newtonian mechanics, that is a lot of information by itself. but more than that, it knows that the real universe is capable of producing intelligent beings that chose this particular world to simulate. From a strictly theoretical point of view that is a crapton of information, I don't know if the AI would be able to figure out anything useful from it, but I wouldn't bet the future of humanity on it.

About update 2: That does work, provided that this is implemented correctly, but it only works for problems that can be automatically verified by non-AI algorithms.

[-]ZoltanBerrigomo10y00

but more than that, it knows that the real universe is capable of producing intelligent beings that chose this particular world to simulate.

Good point -- this undermines a lot of what I wrote in my update 1. For example, I have no idea if F = m d^3 x / dt would result in a world that is capable of producing intelligent beings.

I should at some point produce a version of the above post with this claim, and other questionable parenthetical remarks I made, deleted, or at least acknowledging that they require further argumentation; they are not necessary for the larger point, which is that as long as the only thing the superintelligence can do (by definition) is live in a simulated world governed by Newton's laws, and as long as we don't interact with it at all except to see an automatically verified answer to a preset question (e.g., factor "111000232342342"), there is nothing it can do to harm us.

[-]ZoltanBerrigomo10y00

I'm a bit skeptical about whether you can actually create a superintelligent AI by combining sped up humans like that,

Why not? You are pretty smart, and all you are is a combination of 10^11 or so very "dumb" neurons. Now imagine a "being" which is actually a very large number of human-level intelligences, all interacting...

[-]Silver_Swift10y00

Yeah, that didn't came out as clear as it was in my head. If you have access to a large number of suitable less intelligent entities there is no reason you couldn't combine them into a single, more intelligent entity. The problem I see is about the computational resources required to do so. Some back of the envelope math:

I vaguely remember reading that with current supercomputers we can simulate a cat brain at 1% speed, even if this isn't accurate (anymore) it's probably still a good enough place to start. You mention running the simulation for a million years simulated time, let's assume that we can let the simulation run for a year rather than seconds, that is still 8 orders of magnitude faster than the simulated cat.

But we're not interested in what a really fast cat can do, we need human level intelligence. According to a quick wiki search, a human brain contains about 100 times as many neurons as a cat brain. If we assume that this scales linearly (which it probably doesn't) that's another 2 orders of magnitude.

I don't know how many orcs you had in mind for this scenario, but let's assume a million (this is a lot less humans than it took in real life before mathematics took off, but presumably this world is more suited for mathematics to be invented), that is yet another 6 orders of magnitude of processing power that we need.

Putting it all together, we would need a computer that has at least 10^16 times more processing power than modern supercomputers. Granted, that doesn't take into account a number of simplifications that could be build into the system, but it also doesn't take into account the other parts of the simulated environment that require processing power. Now I don't doubt that computers are going to get faster in the future, but 10 quadrillion times faster? It seems to me that by the time we can do that, we should have figured out a better way to create AI.

[-]ZoltanBerrigomo10y00

Here is my attempt at a calculation. Disclaimer: this is based on googling. If you are actually knowledgeable in the subject, please step in and set me right.

There are 10^11 neurons in the human brain.

A neuron will fire about 200 times per second.

It should take a constant number of flops to decide whether a neuron will fire -- say 10 flops (no need to solve a differential equation, neural networks usually use some discrete heuristics for something like this)

I want a society of 10^6 orcs running for 10^6 years

As you suggest, lets let the simulation run for a year of real time (moving away at this point from my initial suggestion of 1 second). By my calculations, it seems that in order for this to happen we need a computer that does 2x10^25 flops per second.

According to this

http://www.datacenterknowledge.com/archives/2015/04/15/doe-taps-intel-cray-to-build-worlds-fastest-supercomputer/

...in 2018 we will have a supercomputer that does about 2x10^17 flops per second.

That means we need a computer that is one hundred million times faster than the best computer in 2018.

That is still quite a lot, of course. If Moore's law was ongoing, this would take ~40 years; but Moore's law is dying. Still, it is not outside the realm of possibility for, say, the next 100 years.

Edit: By the way, one does not need to literally implement what I suggested -- the scheme I suggested is in principle applicable whenever you have a superintelligence, regardless of how it was designed.

Indeed, if we somehow develop an above-human intelligence, rather than trying to make sure its goals are aligned with ours, we might instead let it loose within a simulated world, giving it a preference for continued survival. Just one superintelligence thinking about factoring for a few thousand simulated years would likely be enough to let us factor any number we want. We could even give it have in-simulation ways of modifying its own code.

[-]ZoltanBerrigomo10y00

I think this calculation too conservative. The reason is (as I understand it) that neurons are governed by various differential equations, and simulating them accurately is a pain in the ass. We should instead assume that deciding whether a neuron will fire will take a constant number of flops.

I'll write another comment which attempts to redo your calculation with different assumptions.

It seems to me that by the time we can do that, we should have figured out a better way to create AI.

But will we have figured a way to reap the gains of AI safely for humanity?

[-]ChristianKl10y00

I vaguely remember reading that with current supercomputers we can simulate a cat brain at 1% speed, even if this isn't accurate (anymore) it's probably still a good enough place to start.

The key question is what you consider to be a "simulation". The predictions such a model makes are far from the way a real cat brain works.

[-]Slider10y00

The ability of the simulated society to probe for outside reality goes up with simulated time not actual time passed. The phrasing of the text made it seem like comparison to human outside interventors.

Still inability to realise what you are doing seems rather dangerous. It is like "everythig is going to be fine if you just don't look down". Being reflective about what you are doing is often considered a virtue and not a vice.

[-]ZoltanBerrigomo10y00

Still inability to realise what you are doing seems rather dangerous.

So far, all I've done is post a question on lesswrong :)

More seriously, I do regret it if I appeared unaware of the potential danger. I am of course aware of the possibility that experiments with AI might destroy humanity. Think of my post above a suggesting a possible approach to investigate -- perhaps one with some kinks as written (that is why I'm asking a question here) but (I think) with the possibility of one day having rigorous safety guarantees.

[-]Slider10y00

I don't mean the writing of this post but in general the principle of trying to gain utility from minimising self-awareness.

Usually you don't make processes as opaque as possible to increase their possibility of going right. The opposite of atleast social political processes being transparent is seen as pretty important.

If we are going to create minilife just to calculate 42, seeing it get calculated should not be a super extra temptation. Preventing the "interrupt/tamper" decision by limiting options is rather backwards in doing that while it would be better to argue why it should not be chosen even if available.

LESSWRONG
LW

LESSWRONG
LW

0

What can go wrong with the following protocol for AI containment?

0

0