# Summary

Since September 2023, I have been learning a lot of math and programming skills in order to develop the safest cryptographic box in the world (and yes, I am aiming high). In these four months, I have learned important things you may want to know:

• Fully Homomorphic Encryption (FHE) schemes with perfect secrecy do exist.
• These FHE schemes do not need any computational assumption.
• These FHE schemes are tractable (in the worst case, encrypting a program before running it makes it three times slower).
• We can therefore run infinitely dangerous programs without obtaining any information about them or their outputs. This may be useful in order to run a superintelligence without destroying the world.
• However, these schemes work only on quantum computers.

Before reading this post, I recommend reading this post by Paul Christiano, and the comments that go with it. These are very informative, and they sharpened my views for this project. Paul Christiano presents a way to extract a friendly AI from an unfriendly one. Since this is only one example of what can be done with a cryptographic box, I will mostly consider cryptographic boxes as a solution to a problem that I call the malign computation problem.

# Introduction

In August 2022, I started reading AGI Safety Literature Review. At one point, the authors tell this:

One way to box an AGI is to homomorphically encrypt it. Trask (2017) shows how to train homomorphically encrypted neural networks. By homomorphically encrypting an AGI, its predictions and actions also come out encrypted. A human operator with the secret key can choose to decrypt them only when he wants to.

When I read this for the first time, I told myself that I should check this work because it seemed important.

And then I completely forgot about it.

Then, in April 2023, during a PHP lesson, I realized that the problem of processing a request made by a malevolent user is similar to the problem of boxing a superintelligence. After the lesson, I asked the teacher how to prevent code injections, and he gave me two answers:

• Do not show your code to the public. This answer didn't convince me, because even current hackers know how to go around this precaution.
• Encrypt the request before processing it. This is the moment I remembered the quote from AGI Safety Literature Review.

Trask's post shows how to build an encrypted AI using the Efficient Integer Vector Homomorphic Encryption. However, since this scheme (along with every other FHE scheme I know about on classical computers) relies on computational assumptions, we have some problems:

• The scheme may not be safe. A computational assumption consists of stating "There is no efficient way to solve this problem". However, we do not know how to prove any such statement, as this would solve the P≟NP problem. Most FHE schemes (including this one) depend on the Learning With Errors (LWE) problem. Although LWE is quite secure for the moment, I won't bet the existence of all life on Earth on it. Similarly, I won't bet the safety of a superintelligence on it.
• This scheme takes too long to compute. In practice, the first superintelligence will probably have more than a hundred billion weights and biases, making this scheme very expensive or even unusable.
• This scheme isn't fully homomorphic. Basically, a cryptographic scheme is said to be homomorphic when we can run some programs while having only their encrypted versions. If the scheme can run every program this way, we say that it is fully homomorphic. We write HE for Homomorphic Encryption, and FHE for Fully Homomorphic Encryption.
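To make the word "homomorphic" concrete, here is a minimal toy sketch (not Trask's scheme and not an FHE scheme): XOR-based encryption with a random key lets us compute the XOR of two messages using only their ciphertexts. The function names and sample values are illustrative assumptions.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt(message: bytes, key: bytes) -> bytes:
    """Toy encryption: XOR the message with a random key of the same length."""
    return xor_bytes(message, key)


# XOR is its own inverse, so decryption is the same operation.
decrypt = encrypt

m1, m2 = b"\x0f\xf0", b"\x33\x55"
k1, k2 = secrets.token_bytes(2), secrets.token_bytes(2)
c1, c2 = encrypt(m1, k1), encrypt(m2, k2)

# The evaluator only ever sees c1 and c2, and computes on them directly.
c_combined = xor_bytes(c1, c2)

# The key holder combines the keys the same way, then decrypts the result.
k_combined = xor_bytes(k1, k2)
result = decrypt(c_combined, k_combined)
```

Here `result` equals the XOR of `m1` and `m2`, although the evaluator never saw either message. This toy scheme supports only one operation (XOR), which is why it is homomorphic but far from fully homomorphic.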

These problems don't come from Trask's work. Actually, Trask's work has been very useful for writing this post, and I recommend everyone who wants to do encrypted neural networks to read it.

Instead, these problems come from the schemes themselves. Currently, every homomorphic scheme we have that can be run on a classical computer is either too narrow or too hard to compute, and they all depend on a computational assumption.

# What we would want

Now that we have seen the flaws of current homomorphic schemes on classical computers, let's ask ourselves "What do we want?"

In AI Safety, there are many problems. One of them, which seems very simple, can be described as follows:

Suppose you have a superintelligence, and that you want to run it without destroying the world. How do you do this?

I consider this problem, which I call the malign computation problem, to be one of the Hello worlds of AI Safety (along with the diamond maximizer problem, the strawberry problem, the do-nothing problem, and many more). Solving it seems necessary in order to solve the alignment problem. Indeed, if we do not know how to create a superintelligence that doesn't destroy the world, then we haven't solved the alignment problem.

To solve this problem, it has been proposed to box the superintelligence. Working on this solution seems to be less insightful than working on other problems of AI Safety, since it doesn't give us any insight about how to give good objectives to a superintelligence. In the best of worlds, we would solve the malign computation problem by changing the superintelligence's objectives (for example by working on shard theory). However, I think that this boxing solution is still worth pursuing, for these reasons:

• We could know and control the exact amount of information that the superintelligence sends to the world. Currently, we cannot do this because a superintelligence can use many covert channels in order to leak information. However, if we were able to build a box, then we would be able to choose and count the exact number of bits that will be sent to the outer world.
• We could make a formal proof of safety. This seems to be one of the only solutions in which such a proof can exist. If we were to use another solution, like changing the superintelligence's objectives, then finding such a proof would be extremely hard, or even impossible. However, if we think that we could all die by making a superintelligence, then we should have an unconditional proof of safety.

Therefore, if we want to build a box, then this box should respect many properties, including:

• Being feasible. If we build a theoretical box that is too hard to build, then top AI labs around the world wouldn't implement this box, and would therefore destroy the world when building the first superintelligence (unless another solution to the malign computation problem is found). Therefore, if the box uses a cryptographic scheme, then this scheme should be computable in polynomial time, or even better, it should make the program slower only by a small factor.
• Having an unconditional proof of safety. In cryptography, the necessary proof is perfect secrecy. Basically, if Alice encrypts a message, then when Bob observes the encrypted message, his probability distribution about what the original message was doesn't change. In other words, even with infinite time and compute, he cannot learn anything new about the original message from the encrypted message. Of course, with more time to think, Bob could change his mind about what the original message was, but this insight wouldn't come from the encrypted message, as if he didn't have the encrypted message, he could also have this insight.
• Not being specific to one program. The cryptographic box should still work if the top AI labs decide to change the design of the superintelligence. This is important in the case where many top AI labs are careless enough to develop distinct superintelligences at the same time. If the box is cryptographic, then we therefore need the scheme to be fully homomorphic.
• Being easily implementable. We would want the box to be in widespread use long before top AI labs develop the first superintelligence. If we manage to do so, then boxing the superintelligence would be the commonsense idea, and not using the box would be considered reckless. My hope is that top AI labs are forced and willing to apply a cryptographic box before developing the first superintelligence.
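The perfect secrecy property described in the list above has a standard formal statement. The symbols here are assumptions of this sketch, not notation from the scheme's paper: $M$ is the random variable for the original message and $C$ the random variable for the encrypted message.

$$
\Pr[M = m \mid C = c] = \Pr[M = m] \quad \text{for every message } m \text{ and every ciphertext } c \text{ with } \Pr[C = c] > 0
$$

In words, observing the ciphertext leaves Bob's probability distribution over messages unchanged, whatever his computing power.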

Fortunately, there is a cryptographic scheme that respects these properties. This scheme, developed by Min Liang, is based on the Quantum One-Time Pad, and provides Fully Homomorphic Encryption and perfect secrecy. In the worst case, programs become at most three times slower when being encrypted with this scheme. This speed is revolutionary compared to the other FHE and HE schemes, to the point that I didn't believe it at first.
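As a rough illustration of why the Quantum One-Time Pad hides a qubit, here is a minimal sketch of the generic single-qubit pad only (not Min Liang's full scheme, which adds homomorphic evaluation on top). It assumes the usual textbook construction: encrypt with $X^a Z^b$ for two random key bits $a, b$. The sample state `rho` and all function names are illustrative assumptions.

```python
from itertools import product

# 2x2 matrices as nested lists; pure Python to keep the sketch self-contained.
I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]


def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dagger(a):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]


def pad(a, b):
    """Return the pad operator X^a Z^b for key bits a and b."""
    u = I2
    if b:
        u = matmul(Z, u)
    if a:
        u = matmul(X, u)
    return u


def conjugate_by(u, rho):
    """Return u * rho * u^dagger."""
    return matmul(matmul(u, rho), dagger(u))


def encrypt(rho, a, b):
    return conjugate_by(pad(a, b), rho)


def decrypt(sigma, a, b):
    return conjugate_by(dagger(pad(a, b)), sigma)


def close(m1, m2, tol=1e-12):
    """Entrywise comparison of two 2x2 matrices."""
    return all(abs(m1[i][j] - m2[i][j]) < tol
               for i in range(2) for j in range(2))


# An arbitrary valid single-qubit density matrix (hypothetical example state).
rho = [[0.75, 0.25j], [-0.25j, 0.25]]

# What an observer without the key sees: the average over all four keys.
keys = list(product((0, 1), repeat=2))
averaged = [[sum(encrypt(rho, a, b)[i][j] for a, b in keys) / 4
             for j in range(2)] for i in range(2)]
```

Under these assumptions, `averaged` is the maximally mixed state (the identity divided by 2) regardless of `rho`, while `decrypt(encrypt(rho, a, b), a, b)` returns `rho` for any key. That is the single-qubit analogue of the classical One-Time Pad's perfect secrecy.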

# The plan

As this perfectly secret FHE scheme works only on quantum computers, the cryptographic box will take as an input a quantum program (most often called a quantum circuit), and will output its encrypted version. Fortunately, every classical circuit can be simulated by a quantum circuit with the same amount of time and memory.
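One standard way to see why classical circuits can be simulated by quantum ones is the Toffoli gate: it is reversible (so it is a valid quantum gate), and it computes NAND, which is universal for classical logic. The sketch below only tracks classical basis states and is an illustration, not part of the planned box.

```python
from itertools import product


def toffoli(a: int, b: int, c: int) -> tuple:
    """Toffoli (CCNOT) on classical bits: flip the target c iff a and b are both 1."""
    return a, b, c ^ (a & b)


def nand(a: int, b: int) -> int:
    """NAND built from a single Toffoli gate with the target initialised to 1."""
    return toffoli(a, b, 1)[2]


# Truth table of NAND for inputs (0,0), (0,1), (1,0), (1,1).
nand_table = [nand(a, b) for a, b in product((0, 1), repeat=2)]

# Applying the gate twice restores the input, so the gate is reversible.
is_reversible = all(toffoli(*toffoli(a, b, c)) == (a, b, c)
                    for a, b, c in product((0, 1), repeat=3))
```

Each classical NAND gate in this construction consumes one extra bit initialised to 1, so a direct translation uses some additional memory for these helper bits.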

Quantum circuits can be represented in many quantum programming languages. However, since we need this box to be widely used, we need to choose the most widely used language. I therefore decided to use OpenQASM, which is a kind of assembly language developed by IBM, and which has established itself as the standard for quantum computing.

As quantum programming and FHE schemes are two brand new evolving fields, they will probably keep evolving, and my cryptographic box (that I haven't developed yet) will therefore become obsolete. Even OpenQASM keeps evolving (I am currently using OpenQASM 3), and will probably not be the standard of the future. I am trying to make this project as reproducible as possible, so that future AI Safety researchers won't have to start all over again.

For the cryptographic box to be robust, we need an unconditional proof of safety that is itself very robust. Proofs on paper do not seem robust enough against a superintelligence. Indeed, many cryptographic schemes have been broken because of problems in their implementations. This matters all the more given that we are risking the destruction of all life on Earth.

The most robust proofs seem to be the ones verifiable by a computer. This is why the proof should be made with an interactive theorem prover. The language Coq seems to be the most widely used theorem prover nowadays. Furthermore, interactive theorem provers are made specifically in order to prove theorems about programs.

Therefore, the plan is to implement a fully homomorphic encryption scheme with unconditionally digitally proven perfect secrecy on a quantum assembly language.

# Where I am right now

The project is progressing very slowly, for several reasons:

• It took a while until I heard about the quantum FHE scheme with perfect secrecy. Before knowing its existence, the plan was to imitate Trask's work but on the Coq theorem prover, and then to provide a proof that this specific neural network is secure (under the LWE assumption). Therefore, I started learning things that are not useful for this specific project. For instance, I learned how to create a neural network using only NumPy. This will be useful for my future career, but it probably won't be for this new cryptographic box, which should work for every program, and not specifically for neural networks.
• I am self-teaching the Coq theorem prover, cryptography, and quantum computing. For now, I am happy with my progress in these fields, but I am still very slow. When I started this project, I didn't even know how to multiply two matrices together, as I had never used any matrix before.
• I am thinking about the ethical concerns of developing such a cryptographic box. Some people may think that developing such a box is not a big deal, but when I learned about the quantum FHE scheme, my first thought was that this technology seems overpowered. I'm sure that most cryptographers reading this post did not believe me when I mentioned that FHE schemes with perfect secrecy do exist.

Therefore, you should not expect to see the cryptographic box for a while. I started learning quantum computing nearly one month ago, which means the only thing I am able to do is to compute 15+1.

# Ethical concern 1: A cryptographic box is far from sufficient in order to ensure that a superintelligence is safe.

The One-Time Pad is the safest cryptographic scheme providing perfect secrecy that has ever existed, for two reasons:

• It is a very simple scheme. To send messages to one another, Alice and Bob first need to agree on a sequence of truly random bits. Then, when Alice and Bob are separated and Alice wants to send a message of n bits to Bob, she just needs to take the first n bits of the sequence, and to apply a bitwise XOR between the message and these n bits (called the key). Then, Alice sends the encrypted message to Bob. Bob can then apply the XOR once again between the encrypted message and the key, which gives him the original message.
• It has been unconditionally proven to be perfectly secret. Moreover, it has also been proven that all perfectly secret schemes need at least n purely random bits in order to encrypt a message of n bits. Therefore, this scheme uses the minimal amount of random bits and logic gates in order to provide perfect secrecy, and is therefore considered the safest possible way to have perfect secrecy.
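The One-Time Pad described above can be sketched in a few lines. This is a minimal illustration, assuming Python's `secrets` module as the source of random bits (a real deployment would need truly random, never-reused keys). The helper `ciphertext_counts` is a hypothetical name used to check perfect secrecy by brute force on 3-bit messages.

```python
import secrets
from collections import Counter


def otp(data: bytes, key: bytes) -> bytes:
    """XOR data with the first len(data) bytes of the key (encrypts and decrypts)."""
    if len(key) < len(data):
        raise ValueError("the key must be at least as long as the message")
    return bytes(d ^ k for d, k in zip(data, key))


# Alice encrypts, Bob decrypts with the same shared key.
message = b"hello"
key = secrets.token_bytes(len(message))
ciphertext = otp(message, key)
recovered = otp(ciphertext, key)


def ciphertext_counts(m: int, n_bits: int = 3) -> Counter:
    """Count how often each ciphertext occurs for message m over all possible keys."""
    return Counter(m ^ k for k in range(2 ** n_bits))


# For every 3-bit message, each ciphertext occurs under exactly one key,
# so the ciphertext distribution does not depend on the message.
same_distribution = all(ciphertext_counts(m) == ciphertext_counts(0)
                        for m in range(8))
```

Because the ciphertext distribution is identical for every message when the key is uniform, seeing a ciphertext tells an observer nothing about which message was sent.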

However, because of covert channels, unconditional proofs of perfect secrecy aren't enough to guarantee perfect secrecy in practice. During WWII, Bell Telephone developed a machine that was used to do One-Time Pads. This machine, called the 131-B2, was used by the US Army and the US Navy in order to communicate crucial information. However, in 1943, a researcher in one of the Bell Labs discovered by pure chance that, when the 131-B2 was applying the XOR between the message and the key, it was emitting a radio wave that could be detected by an oscilloscope far away in the lab. The spikes on this oscilloscope corresponded exactly to the message that was supposed to be hidden. Not only could the message be obtained without doing any computation, but it was later shown that this message could be detected more than 800 meters away. This problem was later called TEMPEST.

When researchers found this problem, they failed to prevent it. They tried to do all the spikes at the same time, to make all the spikes identical, and to replace electricity with liquid, but nothing worked. In the end, the Signal Corps asked its employees to control an area of 30 meters around the 131-B2, although this distance was far from being enough to prevent spies (and there indeed were spies using this covert channel to get back the messages).

If the One-Time Pad was accidentally "broken", then what are the odds that the Quantum One-Time Pad can be broken by a superintelligence which is trying its best in order to break it?

I don't think that this specific covert channel is a safety concern for the Quantum One-Time Pad, since this covert channel happens only during encryption, and not evaluation (which is the step of computing the program using only its encrypted version). More precisely:

• During encryption, the bits will be leaked out, but this won't be a problem because we wouldn't have started computing the superintelligence. What will be leaked is just the superintelligence's source code, which we already have in its decrypted form.
• During evaluation, the bits that will be detected by the oscilloscope will be encrypted, and will therefore not contain any information about the message.

However, it would be very reckless to say that this system is safe against a superintelligence. As I mentioned before, many cryptographic schemes in the past have been broken, either because they relied on problems that were actually easy to solve, or because of problems in their implementations. As the story of the 131-B2 shows, unconditional proofs do not guarantee anything.

Suppose that we manage to build the cryptographic box, that we prove its perfect secrecy, and that thousands of cryptographers in the world agree that the system is perfectly robust. Then, I would expect that, if superintelligences wanted to, they would be able to find a flaw in the cryptographic box with around 95% probability. However, this also means that I consider that there is around a 5% probability that this cryptographic box prevents the extinction of all life on Earth in this specific scenario.

I hope that, by designing and building such a cryptographic box, I decrease the odds that we ever build a superintelligence, rather than increase them. Although I am building such a cryptographic box, I want to make it clear that cryptographic boxes are never safe, and that we do not know how to control a superintelligence.

# Ethical concern 2: Malevolent actors may use cryptographic boxes to cause harm.

Designing and building a cryptographic box may also be bad for AIs themselves. As I mentioned earlier, because they are overpowered, FHE schemes with perfect secrecy seem like science-fiction, but they apparently do exist.

We do not know whether current AIs are conscious, but we know it is possible that, one day, we build conscious AIs. In that case, the ability for anyone to lock them inside a box seems terrifying to me. Although I do not think that AI consciousness has a link with the existential risks posed by superintelligences, I think this is still an important topic, and that I should mention the concerns posed by cryptographic boxes (in addition to the fact that I am trying to build such a box and make it widely available).

I do not think that it is anthropocentric to say that cryptographic boxes can be very bad for AIs. Of course, as AIs do not have the same evolutionary process as other beings, they may not necessarily be unhappy when not being able to communicate with the outside world. However, if someone malevolent wants to hurt an AI system, then I expect a cryptographic box to be useful in order to reach this goal. This is not something that we want.

# Conclusion

To put it in a nutshell, I do not expect cryptographic boxes to work against superintelligences. However, there is a small chance (5%) that they do, and in that case, this work may be very important. I have therefore planned to build such a cryptographic box, and to make it widely available so that using it becomes commonsense long before anyone is able to develop a superintelligence. However, this raises some ethical concerns, because people may argue that superintelligences are safe when they aren't, and because it may help malevolent actors decrease AI welfare.

[-]plex4mo20

Seems like a useful tool to have available, glad someone's working on it.

[-]g-w14mo20

I'm a bit confused on how boxing an AI would be useful (to an extreme extent). If we don't allow any output bits to come out of the ASI, then how do we know if it worked? Why would we want to run it if we can't see what it does? Or do we only want to limit the output to  bits and prevent any side-channel attacks? I guess the theory then would be that  bits are not enough to destroy the world. Like maybe for , it would not be enough to persuade a person to do something that would unbox the AI (but it might).

This seems to be one of the only solutions in which such a proof can exist. If we were to use another solution, like changing the superintelligence's objectives, then finding such a proof would be extremely hard, or even impossible. However, if we think that we could all die by making a superintelligence, then we should have an unconditional proof of safety.

I don't think having a formal proof should be an objective in and of itself. Especially if the proof is along the lines "The superintelligence has to be boxed because it can only run encrypted code and can't communicate with the outside world"

I'm sorry if this comment sounds overly negative, and please let me know if I am interpreting this post wrong. This work seems quite interesting, even just for computer science/cryptography's sake (although saving the world would also be nice :)).

Thanks for the comment! I don't think you are interpreting this post wrong.

Why would we want to run it if we can't see what it does?

I don't think that running a superintelligence is ever useful. What I argue is that running it inside a cryptographic box is better than running it outside of the box. The goal of this project is that, if a superintelligence is run, then it doesn't automatically destroy the world. The goal isn't to run a superintelligence to do useful tasks.

Like maybe for , it would not be enough to persuade a person to do something that would unbox the AI (but it might).

Personally, I think that making it output  bits of information is too much. If the superintelligence were to output an ASCII text (which is very inefficient in order to unbox itself), then  bits would correspond to 8192 characters, which seems way more than enough in order to convince someone to unbox it. If the decrypted bits were thoroughly examined by people outside of the box, then I expect that  would be too much. It would be interesting to know how much information we are able to communicate with that many bits (it makes me think about the Arecibo message, which uses 1679 bits).

Since I think that running a superintelligence is already too dangerous even without intending to decrypt anything, I wouldn't approve a team that tries to build a superintelligence with . However, I think that planning  is better than not trying to use a cryptographic box at all.

I don't think having a formal proof should be an objective in and of itself.

I think the objective should be to mitigate the risk of extinction by AI. Developing an unconditional proof of safety seems to be one way to reduce this risk, and this is why I decided to work on cryptographic boxes. However, we shouldn't rely only on trying to develop unconditional proofs, since these seem possible only when talking about control (how to prevent a superintelligence from achieving its bad objectives), but they seem nearly impossible to do when talking about alignment (how to give good objectives to a superintelligence). Moreover, working on alignment seems more promising than working on control in the cases where we have a lot of time until the first superintelligence.

If we don't allow any output bits to come out of the ASI, then how do we know if it worked?

I think that, if the box were to have holes, then we would realize it pretty soon from the fact that the superintelligence would dramatically change the world around us, and probably make Earth uninhabitable in this process.

[-]Ilio4mo20

Impressively promising work, thanks & good luck! Is there anything a layperson can do to help you reach your goal?

Thank you! Personally, I think that, if a layperson were trying to help me, they could do it by trying to find flaws in the plan. I already mentioned that the One-Time Pad failed during WWII in an unexpected way, despite the fact that it had a proof of perfect secrecy. If someone were to find a flaw in the plan, it would help me a lot (although it would also prove that my goal is impossible).

I'm sure that most cryptographers reading this post did not believe me when I mentioned that FHE schemes with perfect secrecy do exist.

This is a 2013 paper that was already cited 71 times according to Google Scholar... I'm not a cryptographer but I would bet that the average cryptographer is not fully clueless about this.