I've just posted an analysis to MIRI's blog called Transparency in Safety-Critical Systems. Its aim is to explain a common view about transparency and system reliability, and then open a dialogue about which parts of that view are wrong, or don't apply well to AGI.
The "common view" (not universal by any means) explained in the post is, roughly:
Black box testing can provide some confidence that a system will behave as intended, but if a system is built such that it is transparent to human inspection, then additional methods of reliability verification are available. Unfortunately, many of AI’s most useful methods are among its least transparent. Logic-based systems are typically more transparent than statistical methods, but statistical methods are more widely used. There are exceptions to this general rule, and some people are working to make statistical methods more transparent.
Three caveats / open problems listed at the end of the post are:
- How does the transparency of a method change with scale? A 200-rule logic-based AI might be more transparent than a 200-node Bayes net, but what if we’re comparing 100,000 rules to 100,000 nodes? At least we can query the Bayes net to ask “what it believes about X,” whereas we can’t necessarily do so with the logic-based system.
- Do the categories above really “carve reality at its joints” with respect to transparency? Does a system’s status as a logic-based system or a Bayes net reliably predict its transparency, given that in principle we can use either one to express a probabilistic model of the world?
- How much of a system’s transparency is “intrinsic” to the system, and how much of it depends on the quality of the user interface used to inspect it? How much of a “transparency boost” can different kinds of systems get from excellently designed user interfaces?
The MIRI blog has only recently begun to regularly host substantive, non-news content, so it doesn't get much commenting action yet. Thus, I figured I'd post here and try to start a dialogue. Comment away!
I wonder if "transparency" as an instrumental goal is a lost purpose. Presumably your original goal is "safety". One way to ensure safety is validation, which is made easier by transparency because "then additional methods of reliability verification are available". Only it does not help for sufficiently complex systems (your caveat 1), no matter how transparently designed they are. So maybe it's worth looking for other instrumental sub-goals toward the terminal goal of safety. A couple off the top of my head:
- Multi-layer design with built-in verification in each layer (cf. unit testing / integration testing / black-box testing in programming practice), to avoid having to validate all possible combinations of elementary inputs.
- Design for easy validation (for example, sorting a list is harder than verifying that it is sorted).
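The asymmetry in the second point can be made concrete: checking that a list is sorted is a single O(n) pass, while producing the sorted list costs O(n log n) comparisons. A minimal sketch:

```python
def is_sorted(xs):
    """Verify sortedness in one O(n) pass over adjacent pairs --
    strictly cheaper than the O(n log n) work of sorting."""
    return all(a <= b for a, b in zip(xs, xs[1:]))

print(is_sorted([1, 2, 2, 5]))  # True
print(is_sorted([3, 1, 2]))     # False
```

The same pattern (cheap certificate checking vs. expensive computation) underlies much of complexity theory, which is why "design for easy validation" is a plausible sub-goal.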
Isn't that what transparent formal verification is? Being aware of the structure of the system, prove that each component does what it's supposed to, then prove that the combination does what it's supposed to given that the components do. You have to have transparent access to the structure of the system and the behavior of the sub components to do this.
Is it? I have never seen a software system that is as easy to verify as it is to build, let alone easier.
Re: formal verification, I wonder if FAI developers will ultimately end up having to resort to improved programming environments to reduce bug rates, as described by DJB:
HT Aaron Swartz. If this approach is taken, it probably makes sense to start developing the padded-wall programming environment with non-FAI test programming tasks well before the development of actual FAI begins, so more classes of bugs can be identified and guarded against. See also.
Such an environment would most likely be based on Haskell, if not something more esoteric.
Assuming performance is a concern, Haskell probably strikes the best balance between crazy-powerful type systems and compiler maturity.
As an anecdote, the exact string-escape issue you've mentioned is a (repeatedly) solved problem in that ecosystem, the type system being expressive enough to avoid most of the verbosity and non-genericity you'd get by trying the same thing in, say, C.
MIRI has mentioned (for example, in the 'Recommended Courses' post) the use of functional programming languages like Haskell in AI for proof-checking reasons.
That assumes that the net contains a node corresponding exactly to what we mean by "X", that we know which node corresponds exactly to "X", and that we know how we know it. With logical rules we can at least have a formal proof that "X" in some model is equivalent to a particular term X in the logical language, and then ask "What can we prove from the logical rules about X?"
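The question "what can we prove from the logical rules about X?" can be sketched with a toy forward-chaining engine over Horn rules (the rule base here is invented purely for illustration):

```python
def forward_chain(facts, rules):
    """Derive every statement provable from Horn rules of the
    form (body, head): if all of body is derived, derive head."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in derived and all(b in derived for b in body):
                derived.add(head)
                changed = True
    return derived

# Hypothetical rule base: robin(X) -> bird(X) -> can_fly(X).
rules = [({"robin(X)"}, "bird(X)"),
         ({"bird(X)"}, "can_fly(X)")]
facts = {"robin(X)"}
print(sorted(forward_chain(facts, rules)))
# ['bird(X)', 'can_fly(X)', 'robin(X)']
```

Every element of the returned set comes with an implicit derivation from the rules, which is exactly the kind of auditable trace the comment is pointing at.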
My intuition is that the ability to write down valid proofs of how a system behaves constitutes transparency, and the lack of that ability constitutes a black box.
Systems that are not amenable to formal proofs, or that have numerous edge cases, will have less transparency. Tools for building succinct, orthogonal, modular, formalized systems will probably result in much more transparent systems. The most amazing tool in the world for training artificial neural networks will still produce a black box (unless it also happens to provide a formal model of the network's behavior in an accessible and meaningful format, in which case why even bother running the ANN?).
Bayes nets can answer many queries not corresponding to any one node, most famously of the form P(A|B).
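As a toy illustration of this (the network and its numbers are invented), a two-node net Rain → WetGrass can answer P(Rain | WetGrass) by summing over the joint distribution, even though no single node stores that value:

```python
# Tiny hand-coded Bayes net: Rain -> WetGrass.  Probabilities are illustrative.
P_rain = {True: 0.2, False: 0.8}
P_wet_given_rain = {True:  {True: 0.9, False: 0.1},
                    False: {True: 0.2, False: 0.8}}

def joint(rain, wet):
    """Joint probability P(Rain=rain, Wet=wet) via the chain rule."""
    return P_rain[rain] * P_wet_given_rain[rain][wet]

# Query P(Rain=True | Wet=True) -- a conditional not stored at any node.
num = joint(True, True)
den = joint(True, True) + joint(False, True)
print(round(num / den, 3))  # 0.529
```

Inference by enumeration like this is exponential in general, which is why real systems use variable elimination or sampling, but the point stands: the queryable surface of a Bayes net is much larger than its node set.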
I for one can't agree that transparency does any good in security assessment when we consider the implementation of a complex system (design has its own rules, though). I believe you underestimate how broken the human mind really is.
Transparency == Priming
The team which does security review of the system will utterly fail the moment they get their hands on the source code, due to suggestion/priming effects.
If you truly consider removing all metadata from the code, the code loses half of its transparency already, so "transparency" doesn't quite apply. Any of that metadata will cause the security team to drop some testing effort. Those tests won't even be designed or written, not to mention "caring enough to make an actual effort". On the other hand, if a program passes serious (more below) black-box testing, it doesn't need to pass anything else. Transparency is simply unnecessary.
Hardcore solution to security-critical systems:
Yes, it's paranoid. Systems can be split into smaller parts though - built and tested separately - so it's not as monumental an effort as it seems.
I think you are thinking about transparency differently than OP.
You seem to be thinking of informal code review type stuff (hence the comments and function names gripe), and not formal, mechanical verification, which is what OP is talking about (I think).
The point is that black box testing can only realistically verify a tiny slice of input-output space. You cannot prove theorems involving universal quantification, for example, without literally checking every input (which may not fit in the known universe). So if the system has some esoteric failure mode that you didn't manage to test for, you don't catch it.
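A toy illustration of that slice problem (the function and its bad input are wholly invented): probing ten thousand inputs out of a ~four-billion input space leaves an esoteric single-input failure undetected.

```python
def mystery(x):
    """Black box with one esoteric failure mode (hypothetical example)."""
    if x == 1_234_567_891:        # the single bad input out of ~4 billion
        return -1                 # wrong answer
    return x + 1                  # correct everywhere else

# Black-box testing covers only a tiny slice of the 32-bit input space:
failures = sum(1 for x in range(10_000) if mystery(x) != x + 1)
print(failures)  # 0 -- the bug goes completely undetected
```

Only universal reasoning over the function's internal structure (or exhaustive checking, which is infeasible at scale) can rule out such a failure mode.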
On the other hand, "transparent" testing is where you give, e.g., a type checker access to the internal structure so it can immediately prove things like "nope, this function cannot match the spec, and will fail by adding a list to a number when fed input X".
As a serious, if trivial, example, imagine black-box testing a quicksort. You test it on 1000 large random lists and measure the average and worst case running time. You probably get O(n*log(n)) for both. You deploy the code, and someone disassembles it, and designs a killer input and pwns your system, because quicksort has rare inputs for which it goes O(n^2).
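The killer-input point can be made concrete with a deliberately naive quicksort that always picks the first element as pivot (a sketch, counting element comparisons): an already-sorted list forces the O(n^2) worst case.

```python
import random

def quicksort(xs):
    """Naive quicksort, first element as pivot.
    Returns (sorted_list, number_of_comparisons)."""
    if len(xs) <= 1:
        return xs, 0
    pivot, rest = xs[0], xs[1:]
    lo = [x for x in rest if x < pivot]
    hi = [x for x in rest if x >= pivot]
    sorted_lo, c_lo = quicksort(lo)
    sorted_hi, c_hi = quicksort(hi)
    # 2 * len(rest): the two partitioning passes over rest above.
    return sorted_lo + [pivot] + sorted_hi, 2 * len(rest) + c_lo + c_hi

random.seed(1)
rand_input = random.sample(range(300), 300)
_, c_rand = quicksort(rand_input)              # typical ~n log n behavior
_, c_killer = quicksort(sorted(rand_input))    # "killer" input: already sorted
print(c_rand, c_killer)  # the killer input costs n*(n-1) comparisons, far more
```

Black-box timing on random lists would never surface this; reading the pivot-selection code makes the killer input obvious in seconds.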
Transparency isn't only about reading the source code or not, it's also about whether you can do formal deduction or not.
Thus the design (i.e. "The Math") vs implementation (i.e. "The Code") division. I believe design verification suffers from the same problems as implementation verification, albeit perhaps less severely (though I have never worked with really complex, novel, abstract math; it would be interesting to see how many such designs, on average, are "proved" correct and then blow up).
Still, I would argue that the problem is not that black-box testing is insufficient where we are currently able to apply it, but rather that we have no idea how to properly black-box-test an abstract, novel, complex system!
PS. Your trivial example is also unfair and trivializes the technique. Black-box testing in no way implies randomizing all tests, and I would expect the quicksort to blow up very soon in serious testing.
I'm curious as to whether you would prefer comments, critiques etc here or at the MIRI blog.
How sure are you that a transparent program can be powerful enough to be an FAI?