I've just posted an analysis to MIRI's blog called Transparency in Safety-Critical Systems. Its aim is to explain a common view about transparency and system reliability, and then open a dialogue about which parts of that view are wrong, or don't apply well to AGI.
The "common view" (not universal by any means) explained in the post is, roughly:
Black box testing can provide some confidence that a system will behave as intended, but if a system is built such that it is transparent to human inspection, then additional methods of reliability verification are available. Unfortunately, many of AI’s most useful methods are among its least transparent. Logic-based systems are typically more transparent than statistical methods, but statistical methods are more widely used. There are exceptions to this general rule, and some people are working to make statistical methods more transparent.
Three caveats / open problems listed at the end of the post are:
- How does the transparency of a method change with scale? A 200-rules logical AI might be more transparent than a 200-node Bayes net, but what if we’re comparing 100,000 rules vs. 100,000 nodes? At least we can query the Bayes net to ask “what it believes about X,” whereas we can’t necessarily do so with the logic-based system.
- Do the categories above really “carve reality at its joints” with respect to transparency? Does a system’s status as a logic-based system or a Bayes net reliably predict its transparency, given that in principle we can use either one to express a probabilistic model of the world?
- How much of a system’s transparency is “intrinsic” to the system, and how much of it depends on the quality of the user interface used to inspect it? How much of a “transparency boost” can different kinds of systems get from excellently designed user interfaces?
The MIRI blog has only recently begun to regularly host substantive, non-news content, so it doesn't get much commenting action yet. Thus, I figured I'd post here and try to start a dialogue. Comment away!