AI Transparency: Why it’s critical and how to obtain it.
AI Transparency Why it’s critical and how to obtain it Claim #1 No amount of thought experiments on AI safety will be sufficient to be confident in the safety of a non-transparent AI. There is almost certainly a vector of attack or mode of failure which no one has imagined which will be exploitable by a sufficiently intelligent system. There are far too many ways things can go wrong. Claim #2 Neural network based AI can not be proven to be safe except by exhaustive search over the input space. Since exhaustive search over input space is intractable, the safety of a NN based AI is intractable. As a thought experiment, consider a single weight can be the difference between a safe and an unsafe system, finding that weight and determining its safe ranges will require exhaustive search. It seems very unlikely that a mathematics or computation will be developed to solve this problem before we solve AGI. Claim #3: Figuring out how to build an aligned system from a black box is far more challenging than figuring out how to build an aligned system from an interpretable, transparent box. This is true because a transparent box is safely debuggable and testable assuming it is designed such that Claim #4 holds true. Claim #4 If you could read another person's (not AI) mind faster than they can think and disable them at the press of a button it would be impossible for that person to intentionally harm you without your consent. For as soon as the person starts thinking about harming you or tricking you… you can press the button and stop them. This holds true even if you are much less intelligent than the other person. Claim #5 Building a transparent AI such that it thinks like a human and Claim #4 holds true is not all that difficult. The first three claims are my personal intuitions and may very well turn out to be incorrect, but they seem quite grounded, and nothing exceptional is being claimed. Claims four and five are, in my opinion, the least obviou