I suppose the problem comes when the AI starts to communicate with us. There would be a lot of information that it could exploit. Even if it doesn't get any sense of our physics, we might be in trouble if it is able to model us. And even if we didn't give it any direct communication (for example, manifesting puzzles in its world, the solutions of which would allow us to answer our own questions), it might still promote simulation to a reasonable hypothesis.

EY wrote a story that serves as an intuition pump here.

[-]moridinamael7y20

I agree that there is practically no purpose to using this kind of method if you are just going to give the AI information about our reality anyway.

[-]rk7y30

It seems hard to me to get information out of the AI without also giving it information. That is, presumably we will configure parts of its environment to correspond to problems in our own world, which necessarily gives some information on our world.

I suppose another option would be that this is a proposal for running AGIs that just run without us ever getting information from them. I don't think that's what you meant, but I thought I'd check.

[-]Donald Hobson7y10

There are two potential obstacles such a system would present to an AI. The first is locating our world out of all the possibilities. Every bit of data the AI can access, be it its own code, the rules of the simulation, or the amount of time you let it run, is a clue. If the total amount of data exceeds the Kolmogorov complexity of reality, then an AIXI would figure out where it was. The laws of quantum physics are quite simple, and we really don't know how much info is needed to point to a useful description of the world. This means that if our AI code is simple, and we are getting it to prove a theorem, this could be a serious barrier, but with an AI that's fed oodles of real-world data, not so much.

The second barrier presented to an AI is in getting stuff done. Once it knows that it's in a sandbox, it would find weird tricks, such as making radio signals with the memory, harder to pull off than if it had direct hardware access. Even if the software sandbox is flawless, it can work out a lot about its hardware and software just by knowing that they were optimized for efficient computation. This might give it enough understanding of how it's implemented to try Rowhammer.

Alternatively, it can just encode whatever malicious payload it likes into the output channel. If you don't look at the output, then it's just a warm box. Either way, it looks less useful than homomorphic encryption.

Sandboxing by Physical Simulation?
