With quantum computing, an AI box need not guard against hardware exploits.

Epistemic status: armchair computer science

Here's a simple machine model no more powerful than a quantum computer.

It's a computer with extra instructions. By spending x*t time, it can split time, within the machine, into x² timelines. It knows which timeline it's in. Timelines cannot interact. If it attempts to act on anything but the memory while there are n timelines, then with probability 1/n all other timelines are deleted; otherwise, its own timeline is deleted. It can delete its timeline if others exist. After t time, all but one randomly chosen remaining timeline are deleted.

Split into a² timelines. In timeline 1, run the AI. If it somehow manages to affect the world, it has only a 1/a² chance of escaping. After it is done running, run an algorithm of your choice to curate the output. Have all other timelines delete themselves.
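
As an illustration, here is a toy classical simulation of that procedure (my own sketch; `run_ai` and `curate` are hypothetical placeholders, and ordinary randomness stands in for the quantum behaviour):

```python
import random

def boxed_run(a, run_ai, curate):
    """Toy classical simulation of the boxing procedure above.

    `run_ai` and `curate` are hypothetical placeholders: `run_ai()` returns
    (output, tried_to_affect_world), and `curate(output)` is whatever
    algorithm you trust to filter what leaves the box.
    """
    n = a * a                            # split into a^2 timelines
    # Timeline 1 runs the AI; every other timeline idles and then deletes itself.
    output, tried_to_affect_world = run_ai()

    if tried_to_affect_world:
        if random.random() < 1 / n:
            return "AI escaped"          # all other timelines were deleted
        return None                      # the AI's own timeline was deleted instead

    return curate(output)                # only the curated output survives

# Example with a well-behaved stand-in for the AI:
print(boxed_run(1000, run_ai=lambda: ("proof sketch", False), curate=lambda s: s[:80]))
```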

Author's Note: I previously made a post making much the same point. I was told it's hard to understand. I suppose I tried to write what I would wish to read.

Edit: I retract this post's discussion of how to curate output without handing the world to the AI.


For what it's worth, I worked and published in quantum information theory, and this doesn't make sense to me. I think you are going to have to explain your proposal in much more detail if you want folks to understand it. Right now it just reads to me like "quantum magic for AI boxing", without it even being very clear why you think this proposal might be worthwhile to consider.

I appreciate your thinking about these issues and considering novel approaches, but for such a deep proposal with many inferential jumps you're going to have to explain more of the details if you would like us to understand.

I have gone into more detail in the (I'm told) less readable version of this post, but one of the things I did was to cut out math that locks out some of the audience. I suggest we convert "This post makes sense" versus "This post does not make sense" into a smaller crux. If all parts of a post make sense, the post makes sense. Therefore, please name a part of the post which does not make sense.

To start with, it's not clear what you're trying to do with the "splitting timelines" bit, or even what operation a quantum computer would perform to accomplish this. Yes, we can sort of turn our heads, squint, and interpret Shor's and Grover's algorithms as doing something like this, but they don't generalize to a mechanism for simultaneously evaluating multiple computations. Unless there are new results I'm unaware of? I've been out of the quantum game for a while now.

We can, in general, do different computations in different timelines by running the universal Turing machine computation in all timelines, with the different programs stored in memory. For efficiency, we can encode a finite number of computations (such as the AI) into the universal Turing machine to keep the qubit count down.

Shor's algorithm is not relevant here. Amplitude amplification, the trick from Grover's algorithm, can be used to implement the ability to delete one's own timeline. You mostly just set a specific qubit to 1 to "delete" your timeline, and at the end you amplify the amplitude of whichever timeline didn't. Deleting timelines seems easier to understand than amplifying amplitudes, so that's what I used.
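
Concretely, here is a toy state-vector sketch of that trick (plain numpy, my own simplification rather than a circuit for real hardware): flip the amplitude of the one timeline that did not flag itself as deleted, apply Grover's diffusion operator, and repeat roughly (π/4)·√n times.

```python
import numpy as np

def amplify_survivor(n, keep, rounds=None):
    """Toy state-vector sketch of 'deleting' all timelines except `keep`.

    Starts from an equal split over n timelines and uses Grover-style
    amplitude amplification (oracle flip + diffusion) to concentrate the
    probability on the one timeline that did not flag itself as deleted.
    """
    state = np.full(n, 1 / np.sqrt(n))        # equal amplitude 1/sqrt(n) each
    if rounds is None:
        rounds = int(np.pi / 4 * np.sqrt(n))  # ~ (pi/4) * sqrt(n) rounds
    for _ in range(rounds):
        state[keep] *= -1                     # oracle: flip the surviving timeline
        state = 2 * state.mean() - state      # diffusion: reflect about the mean
    return state ** 2                         # measurement probabilities

print(amplify_survivor(16, keep=3)[3])        # ~0.96: the survivor is almost certain
```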

Maybe? I don't remember it working like that. Do you have a reference to something suggesting we can adopt this interpretation?

You could be referring to either paragraph; am I guessing correctly it's the first?

https://en.wikipedia.org/wiki/Quantum_Turing_machine links [Deutsch 1985], where a Ctrl-F for '2.11' finds a state transition from an encoding of a function and its argument to the same plus the result. A superposition of different functions/arguments would of course lead to the corresponding superposition of results. Therefore, we can run different computations in different timelines.
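
Schematically, in my notation rather than Deutsch's exact symbols, that transition and its behaviour under superposition are:

```latex
% The universal machine's step maps an encoded program f and input x to the result:
\[
  U \,\lvert f,\, x,\, 0 \rangle \;=\; \lvert f,\, x,\, f(x) \rangle ,
\]
% and since U is linear, a superposition of programs evolves to the
% corresponding superposition of results:
\[
  U \sum_i \alpha_i \lvert f_i,\, x_i,\, 0 \rangle
  \;=\; \sum_i \alpha_i \lvert f_i,\, x_i,\, f_i(x_i) \rangle .
\]
```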

This is not how quantum computers work. A quantum computer cannot run arbitrary computations in parallel and combine the outputs in arbitrary ways. In particular, NP is conjectured to not be contained in BQP.

I do not claim it can. If the machine model could split time like this into 2^a timelines, what it could do would contain NP; it only splits time into a^2 timelines. A quantum computer can emulate the "end the current timeline" trick with amplitude amplification.
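
As a rough consistency check (my own back-of-the-envelope arithmetic, not in the original post): boosting the one timeline out of a^2 that ran the AI takes about (π/4)·a rounds of amplitude amplification, matching the x*t price the model charges for splitting, while 2^a timelines would need exponentially many rounds:

```latex
% a^2 timelines: amplitude 1/a on the AI timeline, so ~ (pi/4) a rounds of
% amplification, each costing on the order of t. 2^a timelines would instead
% start at amplitude 2^{-a/2} and need exponentially many rounds:
\[
  \tfrac{1}{a} \;\Rightarrow\; \approx \tfrac{\pi}{4}\,a\,t \text{ total time},
  \qquad
  2^{-a/2} \;\Rightarrow\; \approx \tfrac{\pi}{4}\,2^{a/2}\,t \text{ total time}.
\]
```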

It can delete its timeline if others exist.

Sounds like you are assuming some quantum suicide framework. This goes beyond normal quantum mechanics.

If it attempts to act on anything but the memory

I think the issue is supposed to be that it is not clear what this phrase means. Any change in memory will have many physical consequences on the outside world; how can we know which of these are important?

I don't really see what the point of this security measure is. If you want to develop an AI whose only purpose is to find a proof of some already formulated theorem, well, it is not clear why such an AI would need to be general-purpose, and it is only with general-purpose AIs that you have the problem that they might try to do things other than just "acting on the memory".

Finally, it is not clear how plausible it is that there could be a "ZF-formulable safety guarantee" -- this seems to imply that we have some argument that the AI is safe if a certain mathematical condition holds, but not otherwise, such that the mathematical condition is complicated enough that it is not clear whether it holds. I guess I am not aware of any deep purely mathematical questions coming from AI safety research; most of the questions are about a mix of math and philosophy.

Sounds like you are assuming some quantum suicide framework.

I assume that amplitude amplification works, which lets me boost the probability of a particular memory entry from about 1/a² to 1. That lets us build a machine of the above model using a quantum computer.
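
The standard amplitude-amplification arithmetic behind that claim, for reference:

```latex
% The memory entry starts with amplitude 1/a (probability 1/a^2). After k
% rounds of amplitude amplification its amplitude is rotated to sin((2k+1) theta):
\[
  \sin\theta = \tfrac{1}{a}, \qquad
  \Pr[\text{entry after } k \text{ rounds}] = \sin^2\!\bigl((2k+1)\theta\bigr),
\]
% which is close to 1 for k \approx (\pi/4)\, a.
```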

Any change in memory will have many physical consequences on the outside world; how can we know which of these are important?

Almost everything has an effect on almost everything in its future light cone. A quantum computer works because it carefully keeps the state of its qubits from having an effect on the rest of their future light cone. We can approximate "the outside world" by you, personally.

If you want to develop an AI whose only purpose is to find a proof of some already formulated theorem, well, it is not clear why such an AI would need to be general-purpose,

The best proof-search AIs will be written by a programmer better than any human, so you want an AI that can modify itself. It is reasonable to put an AI that could become one of the best proof-search AIs in a box.

This is only an example use of an AI box, anyway. People also argue for its use for less defensible reasons.

this seems to imply that we have some argument that the AI is safe if a certain mathematical condition holds, but not otherwise,

It is enough if we have an argument that the AI is not safe if the mathematical condition does not hold. I shall reword "guarantee" into "assumption".

Amplitude amplification is a particular algorithm that is designed for a particular problem, namely searching a database for good entries. It doesn't mean you can arbitrarily amplify quantum amplitudes.

I suppose I hadn't heard before the idea of boxing an AI using the (yet to be developed) techniques necessary to prevent quantum decoherence. Maybe there is some merit to this.

I think running an AI on the basis that we haven't found an argument that it is unsafe sounds dangerous. Also, I still don't see why an argument that an AI is unsafe would depend on a difficult mathematical problem.

How should I have worded the post to maximize merit?

Amplitude amplification is a particular algorithm

That's Grover's algorithm. Amplitude amplification is the generalization of the trick Grover's algorithm uses, and applicable here.

I think running an AI on the basis that we haven't found an argument that it is unsafe sounds dangerous.

If, as you say, we can't expect to get safety guarantees, then safety assumptions are the best we can do. Or do you mean that we should wait for safety guarantees, but you don't expect such a guarantee to require an AI to prove it?

How should I have worded the post to maximize merit?

If your core idea is to use decoherence prevention techniques as an AI boxing method, it would have helped to specifically mention such techniques rather than waiting until the comments to discuss them.

That's Grover's algorithm. Amplitude amplification is the generalization of the trick Grover's algorithm uses, and applicable here.

You would need a much more rigorous argument to show that amplitude amplification is relevant here. (Yes, I did read your other post that you say is more detailed.) On the face of it, what you are saying appears to directly contradict the linearity of quantum mechanics.

If, as you say, we can't expect to get safety guarantees, then safety assumptions are the best we can do. Or do you mean that we should wait for safety guarantees, but you don't expect such a guarantee to require an AI to prove it?

My point is that I am not aware of any connection between uncertainty about whether an AI is safe and uncertainty about whether specific well-defined mathematical statements are true. You suggest that we can use your technique to reduce the latter uncertainty, thereby reducing the former uncertainty. But if the two uncertainties are not correlated, then reducing the latter does not reduce the former.

Here's a safety assumption that might come up: "This prior over possible laws of physics is no more confused by these sensory inputs than the Solomonoff prior." Why would we want this? If we even just want to maximize diamonds, we can't identify the representation of diamonds in the Solomonoff prior. If we use a prior over all possible atomic physics models, we can say that the amount of diamond is the number of carbon atoms covalently bound to four other carbon atoms. If experiments later indicate quantum physics, the prior might desperately postulate a giant atomic computer that runs a simulation of a quantum-physics universe. A diamond maximizer would then try to hack the simulation to rearrange the computer into diamonds. This query could tell us to keep looking for more general priors.

Is there any quantum computing which does not rely on decoherence prevention techniques? Coherence is what makes quantum computers work in the first place.

What level of argument would you like? Linearity means that the transition functions that map between timeline amplitude distributions must be linear. That is indeed a necessary condition on the transition functions. It looks like you want to exclude all transition functions that are not permutations, i.e. those that don't preserve the distribution's histogram. The functions I use for amplitude amplification here are one which flips the amplitude of the timeline that ran the AI, and Grover's diffusion operator, which redistributes amplitudes to the flipped one's benefit. If linearity were a sufficient condition, I wouldn't need these: I could simply use the function that maps the amplitudes of all timelines that did not run the AI to 0, which is linear.
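
To make the distinction concrete, here is a small numerical check (my own toy code, with the AI's timeline at index 0): the two reflections I actually use are unitary, while the "zero out the other timelines" map is linear but not unitary, so a quantum computer cannot apply it directly.

```python
import numpy as np

n = 4                                         # toy example with 4 timelines
s = np.full((n, 1), 1 / np.sqrt(n))           # uniform superposition |s>

flip = np.diag([-1.0, 1.0, 1.0, 1.0])         # flip the amplitude of the AI timeline (index 0)
diffusion = 2 * (s @ s.T) - np.eye(n)         # Grover's diffusion operator 2|s><s| - I

for U in (flip, diffusion):
    assert np.allclose(U @ U.T, np.eye(n))    # both are unitary (real, so U U^T = I)

project = np.diag([1.0, 0.0, 0.0, 0.0])       # zero out every timeline except the AI's
print(np.allclose(project @ project.T, np.eye(n)))  # False: linear, but not unitary
```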

I agree that the ZF-prover use of an AI box is only useful if we actually find a relevant mathematical statement. In the end, this post has no new insight on how useful AI boxes are, only how safe they can be made. Therefore I should make no claims about usefulness and remove the ZF section.