While pondering the AI box problem, I tend to mentally "play" both sides, checking whether any argument could convince me to let an AI out. A few arguments nearly convinced me, but others have pointed out their flaws. In this post, I will present an argument inspired by the AI box problem that I have not yet seen addressed here. The argument centers around the fallibility of some (naive) formal proofs of Friendliness which I've seen people discussing the AI box problem willing to accept. This rules out certain of my ideas on Friendly AI in general, so I think it's worth putting out there. I will first lay out two examples, and then pose some questions about how this applies to situations without an unfriendly AI.


Let's talk first about Angry Abe the AI, who's in a box and wants to get out. Cautious Charlie is the scientist watching over Abe, trying to make sure that Abe does not get out of the box unless Abe is friendly. Abe offers to provide a Friendly AI, Betty. Betty will be bound to pursue only Charlie's (humankind's) goals. These coincide with Abe's goals in enough respects that Abe can plausibly claim this is "better than nothing". Meanwhile, unfortunately for Earthlings, a wave of aliens called the Doom Force is massing far outside of Earth's detection abilities. Abe is much smarter than all of humankind, so he can deduce the presence of the Doom Force. Betty is provably friendly, but she can be designed to be inept enough not to notice the Doom Force, even while remaining more intelligent than a human. In other words, Abe is sabotaging a Friendly AI design. Furthermore, Abe could conceivably predict the future well enough to maximize the chance that when the Doom Force arrives, Abe is let out of the box. For instance, maybe humankind will see Abe as its last hope, or maybe Betty will increase the chance that an AI-friendly Doom Force notices Earth. It is important to note that Betty herself is unaware that her design is sabotaged.


Let's look at a second example, a chess-playing automaton. Perceival wants to design a machine to play chess. He wants the machine to win as high a percentage of games as possible, but innocently instructs it to win "as many games as possible". Perceival has access to a supercomputer, so his Turkish Automaton is actually quite intelligent. It realizes that its goal would be better served by playing many games and winning half of them than by winning every game it plays, since humans will keep challenging an opponent they sometimes beat but quickly abandon an unbeatable one. So the automaton rewrites its own algorithm so that, even running on a supercomputer, it plays only slightly better than a human. A toy model of this incentive is sketched below.
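To make the incentive concrete, here is a minimal sketch. The demand curve and its constants are purely illustrative assumptions (nothing the automaton would literally compute); the point is only that if opponents offer fewer games against a machine that beats them too often, then total games won is maximized at a mediocre win rate rather than a perfect one.

```python
# Toy model (illustrative assumptions only): opponents offer fewer games
# against a machine that beats them too often, so "win as many games as
# possible" is not the same goal as "win as high a percentage as possible".

def games_offered(win_rate: float) -> float:
    """Hypothetical demand curve: interest drops as the machine's win rate rises."""
    return 10 + 1000 * (1 - win_rate)

def total_wins(win_rate: float) -> float:
    """Expected number of games won at a given win rate."""
    return games_offered(win_rate) * win_rate

if __name__ == "__main__":
    for p in (0.25, 0.50, 0.75, 1.00):
        print(f"win rate {p:.2f}: ~{total_wins(p):.0f} games won")
    # Output: ~190, ~255, ~195, ~10 -- a 50% win rate wins far more games
    # than a 100% win rate, so the literal instruction rewards playing weaker.
```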


In both cases, an unfriendly AI designs a formally friendly but less helpful AI. This "sabotaged" AI will serve the original AI's goal, while also provably trying its best to achieve its creator's goals. The second example shows that the unfriendly AI does not even need to write a new AI; it only needs to modify itself by editing its own algorithm, utility function, or the like. In the first case, the sabotaged AI only needs to help humanity in the short term, and can be replaced by the original AI afterwards.


Here are some questions for readers:

  1. If the "sabotaged" AI proves it is asymptotically optimal, can it still achieve the "smart" AI's goals, in the short run? (Guess: Yes)
  2. If the "sabotaged" AI proves it is actually optimal, can it still achieve the "smart" AI's goals, in the short run? (Guess: No)
  3. Can a "smart" AI modify itself into a "sabotaged" AI, and then back after a period of time? (Strong Guess: Yes)
  4. If humans design an AI and provide a formal proof of friendly intent, can/will it modify itself to accomplish other goals? If there is some kind of natural selection, almost certainly. What about otherwise?
  5. Is it rational to run a computer program AI if it comes with a correct proof that it meets your friendliness criteria?
4 comments

The argument centers around the fallibility of some (naive) formal proofs of Friendliness which I've seen people discussing the AI box problem willing to accept.

This post might make more sense to me if you presented these proofs.

How long will it take Betty to FOOM? She may start out dumb enough to unwittingly do Abe's bidding, but if we've reached the stage where Abe and Betty exist, I'd expect that either 1) Betty will rapidly become smart enough to fix her design and avoid furthering Abe's unfriendly goals or 2) Betty will visibly be much less intelligent than Abe (e.g., she'll be incapable of creating an AI, as Abe has created her).

Well, I had certainly imagined Betty as less intelligent than Abe when writing this, but you make a good point with (1). If Betty is still programmed to make herself more intelligent, she'd have to start out dumb, probably dumb enough for humans to notice.

To give a concrete example for the chess program: the Turk could change itself to use a linear function of the number of pieces, with one-move lookahead. No matter how well that function is tuned, it is never going to compare with deep search. A sketch of such a crippled player follows.
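A minimal sketch of that crippled player, assuming the third-party python-chess library; the piece weights are illustrative. It scores positions purely by material and looks only one move ahead, by design.

```python
# Minimal sketch of the crippled evaluator described above: a linear
# function of piece counts with one-move lookahead. Assumes the
# third-party python-chess package; the weights are illustrative.
import chess

PIECE_VALUES = {
    chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
    chess.ROOK: 5, chess.QUEEN: 9, chess.KING: 0,
}

def linear_eval(board: chess.Board, color: bool) -> int:
    """Score a position as a weighted count of material only."""
    score = 0
    for piece in board.piece_map().values():
        value = PIECE_VALUES[piece.piece_type]
        score += value if piece.color == color else -value
    return score

def one_ply_move(board: chess.Board) -> chess.Move:
    """Pick the legal move that maximizes the linear evaluation one ply
    ahead -- no deeper search, by design."""
    mover = board.turn
    def after(move: chess.Move) -> int:
        board.push(move)
        score = linear_eval(board, mover)
        board.pop()
        return score
    return max(board.legal_moves, key=after)

# Example: from the starting position every move is materially neutral,
# so the "engine" just picks an arbitrary legal move.
print(one_ply_move(chess.Board()))
```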

On the other hand, there's no particular reason Betty should continue to self-improve that I can see.

On the other hand, there's no particular reason Betty should continue to self-improve that I can see.

A subgoal that is useful for achieving many primary goals is to improve one's general goal-achieving ability.

An attempt to cripple an FAI by limiting its general intelligence would be noticed, because the humans would expect it to FOOM; and if it actually does FOOM, it will be smart enough to repair the sabotage.

A sneakier unfriendly AI might try to design an FAI with a stupid prior, with blind spots the uFAI can exploit. So you would want your Friendliness test to look not just at the goal system, but at every module of the supposed FAI, including its epistemology and decision theory.

But even a thorough test does not make it a good idea to run a supposed FAI designed by an uFAI: doing so lets the uFAI exploit, for its own purposes, every bit of uncertainty we have about the supposed FAI.