I remember reading the argument in one of the Sequences articles, but I'm not sure which one. The essential idea is that any such rule just becomes a problem for the AI to solve, so relying on a superintelligent, recursively self-improving machine being unable to solve a problem is not a very good idea. (The exception, I suppose, would be a failsafe that is provably impossible to get around. But here we're pitting human intelligence against superintelligence, and I, for one, wouldn't bet on the humans.) The more robust approach seems to be to make the AI not want to do whatever the failsafe was designed to prevent in the first place, i.e. Friendliness.

Why not just write failsafe rules into the superintelligent machine?

by lukeprog · 1 min read · 8th Mar 2011 · 81 comments


Many people think you can solve the Friendly AI problem just by writing certain failsafe rules into the superintelligent machine's programming, like Asimov's Three Laws of Robotics. I thought the rebuttal to this was in "The Basic AI Drives" or one of Yudkowsky's major articles, but after skimming them, I haven't found it. Where can I find the arguments against this suggestion?