AGI Chaining

Chaining God is Stuart Armstrong's term for his proposed method of maintaining control over a superhuman AGI. It involves a chain of AGIs, each more advanced than the one before it. The idea is that even though humans might not be able to understand the most sophisticated AGI well enough to trust it, they can understand and trust the first AGI in the chain, which will in turn verify the trustworthiness of the next AGI, and so on up the chain.

Armstrong mentions a number of considerations:

If an AGI at any level ever claims or is claimed to be untrustworthy, the chain should be instructed to gather diagnostic information, then start from scratch.
If the AGI chain passes integrity checks yet acts in an untrustworthy way, restart from scratch.
If the AGI chain refuses to be shut down and is able to prevent us from doing so, we are in trouble: we can only attempt to negotiate with it and hope for the best.
If after repeated attempts the chain continues to fail, or a level of intelligence is reached that claims the chain slows it down too much for further progress, at least some safe research has been conducted. We may choose to accept that limitation, or to simply accept an untrustworthy AGI.
If the AGI chain breaks invisibly, we're probably doomed.
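
The considerations above can be read as a simple control loop: humans check the first level, each checked level checks the next, and any flagged level halts the build with diagnostics so the chain can be restarted. The following is only a toy Python sketch of that reading, not anything Armstrong specifies. The names Level, Outcome, build_chain and the vouches_for callback are hypothetical, and the callback merely stands in for a trust check that nobody currently knows how to implement.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, List, Tuple


class Outcome(Enum):
    CHAIN_COMPLETE = auto()  # every level was vouched for by its predecessor
    RESTART = auto()         # a level was flagged: keep diagnostics, start over


@dataclass
class Level:
    name: str
    # Hypothetical stand-in for a real trust check: returns True if this
    # level judges the candidate (the next level up) to be trustworthy.
    vouches_for: Callable[["Level"], bool]


def build_chain(levels: List[Level],
                humans_trust_first: bool) -> Tuple[Outcome, List[str]]:
    """Walk the chain from the least to the most advanced level."""
    diagnostics: List[str] = []

    # Humans only ever verify the first, humanly comprehensible level.
    if not levels or not humans_trust_first:
        diagnostics.append("humans could not verify the first level")
        return Outcome.RESTART, diagnostics

    # Each already-verified level checks the level directly above it.
    for verifier, candidate in zip(levels, levels[1:]):
        if not verifier.vouches_for(candidate):
            diagnostics.append(f"{verifier.name} rejected {candidate.name}")
            return Outcome.RESTART, diagnostics

    return Outcome.CHAIN_COMPLETE, diagnostics
```

The sketch deliberately leaves out the two worst cases in the list. A chain that resists shutdown is outside the loop's control, and a chain that breaks invisibly corresponds to a vouches_for check that wrongly returns True, which this loop has no way to detect.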

This is a very conservative approach to AGI design, and it carries a large opportunity cost. Armstrong believes the chain approach would be unlikely to produce anywhere near the best possible future, since the AGI chain would only learn from present human values. Each improved layer of AGI would be limited in how far it could improve, so that its creator could still understand it. With supervision happening at each level, an AGI would take longer to develop, and after every restart the seed AI would again have to be humanly comprehensible. He believes an AGI chain is a simple way to create Friendly Artificial Intelligence, but enumerates a number of ways the concept might never work.

References

Armstrong, Stuart (2007) Chaining God: A qualitative approach to AI, trust and moral systems
A review of proposals toward safe AI by Joshua Fox

