I'm a 19 years old university student who tries to be a good AI Safety researcher.
Thank you! Personally, I think that, if a layperson were trying to help me, they could do it by trying to find flaws in the plan. I already mentioned that the One-Time Pad used to fail during WWII in an unexpected way, despite the fact that it had a proof of perfect secrecy. If someone were to find a flaw in the plan, it would help me a lot (although it would also prove that my goal is impossible).
Thanks for the comment! I don't think you are interpreting this post wrong.
I don't think that running a superintelligence is ever useful. What I argue is that running it inside a cryptographic box is better than running it outside of the box. The goal of this project is that, if a superintelligence is run, then it doesn't automatically destroy the world. The goal isn't to run a superintelligence to do useful tasks.
Personally, I think that making it output 216 bits of information is too much. If the superintelligence were to output an ASCII text (which is very inefficient in order to unbox itself), then 216 bits would correspond to 8192 characters, which seems way more than enough in order to convince someone to unbox it. If the decrypted bits were thoroughly examined by people outside of the box, then I expect that n=256 would be too much. It would be interesting to know how much information we are able to communicate with that many bits (it makes me think about the Arecibo message, which uses 1679 bits).
Since I think that running a superintelligence is already too dangerous even without intending to decrypt anything, I wouldn't approve a team that tries to build a superintelligence with n=0. However, I think that planning n=0 is better than not trying to use a cryptographic box at all.
I think the objective should be to mitigate the risk of extinction by AI. Developing an unconditional proof of safety seems to be one way to reduce this risk, and this is why I decided to work on cryptographic boxes. However, we shouldn't rely only on trying to develop unconditional proofs, since these seem possible only when talking about control (how to prevent a superintelligence from achieving its bad objectives), but they seem nearly impossible to do when talking about alignment (how to give good objectives to a superintelligence). Moreover, working on alignment seems more promising than working on control in the cases where we have a lot of time until the first superintelligence.
I think that, if the box were to have holes, then we would realize it pretty soon from the fact that the superintelligence would dramatically change the world around us, and probably make Earth uninhabitable in this process.