Let's take for granted that pursuing FAI is the best strategy for researchers interested in the future of all humanity.  At the same time, let's assume that controlling unfriendly AI is not completely impossible.  I would like to see arguments on why FAI may or may not be the best strategy for AGI researchers who are solely interested in selfish values: e.g., personal status, curiosity, the well-being of their loved ones, etc.

I believe such a discussion is important because i) all researchers are to some extent selfish and ii) it may be unwise to ignore researchers who fail to commit to perfect altruism.  I myself do not know how selfish I would be if I were to become an AGI researcher in the future.


EDIT: Moved some of the original post content to a comment, since I suspect it was distracting from my main point.


the easier it is to develop FAI in comparison to unfriendly AI

What? What are your definitions of FAI and unfriendly AI?

Rather than unfriendly AI, I think he means a Friendly AI that's only Friendly to one person (or very few people). If we're going to be talking about this concept then we need a better term for it. My inner nerd prefers Suzumiya AI.

Conceptually there is very little difference between AGI which understands the values of one human and AGI which understands a hypothetical aggregate of human values. Therefore I use FAI to refer to both concepts.

Conceptually there is very little difference between AGI which understands the values of one human and AGI which understands a hypothetical aggregate of human values.

The big difference is in how the AI builds that aggregate from individual values. There are surely many ways to do so, and people will disagree about them (and some ways will not be wanted by anyone at all but we could get them by mistake for a non-Friendly AI). On the other hand, existing suggestions like CEV are greatly underspecified and haven't even been strictly proven to have real (or unique) solutions.

That apparently conflicts with how it is defined in Creating Friendly AI.

I disagree. Why couldn't the outlined procedure for creating Friendly AI (section 3.4.4 of the link) be based on a single individual, or a superposition of a small group of individuals, instead of a superposition of all of humanity?

The preference of particular individual humans may involve harming other humans.

Defining such preferences as "friendly" violates the concept as it was originally intended, IMO.

What? What are your definitions of FAI and unfriendly AI?

Artificial general intelligence (AGI) = any intelligence with near-human capabilities of learning and innovation

Friendly AI (FAI) = an AGI which understands human values

unfriendly AI = an AGI which is not FAI

FAI will be harder to develop than unfriendly AI, but the question is how much harder?

Reasons why FAI is harder than AGI have been extensively discussed by Yudkowsky and others. These include:

  • Human values are hard to understand

  • The growth mechanism of FAI must have low variability to ensure that the FAI remains friendly. But high-variability growth mechanisms (such as mutation and selection) could be the easiest path to AGI.

FAI will be harder to develop than unfriendly AI

Doesn't this conflict with your other statement?

the easier it is to develop FAI in comparison to unfriendly AI

I see your point. I have modified my comment to

the less difficult it is to develop FAI in comparison to unfriendly AI

One concern for selfish AGI researchers is how they could control how AGI is used after it is invented. Control of AGI is necessary (though by no means sufficient) for ensuring that the technology yields a net benefit to an AGI researcher rather than potentially catastrophic harm. It seems most plausible that developing AGI will require a team effort. In that case, the three most plausible outcomes are: i) the larger organization supporting the team immediately gains control of the AGI technology, ii) the AGI technology spreads worldwide, iii) the team maintains exclusive control of the AGI for a short period of time. Under the first outcome, individual AGI researchers are unlikely to exert much influence on how AGI is used. The second outcome gives the researchers even less control than the first. The third outcome could be extremely rewarding for the researchers, but seems implausible unless the team can agree on some well-designed mechanism for enforcing coordination.

In comparison to unfriendly AI, FAI offers several possible advantages in terms of securing control: FAI may be more resistant to tampering; FAI could garner more public support, which could lead to political efforts to help secure FAI; FAI could grow quickly enough to defend itself and yet still not pose a threat to the AGI researchers (while a fast-growing unfriendly AI would be unacceptably dangerous).

Therefore, with regard to the control issue, FAI becomes increasingly preferable to unfriendly AI for selfish AGI researchers:

1) the quicker that AGI can self-improve

2) the less difficult it is to develop FAI in comparison to unfriendly AI

3) the more difficult it is to control unfriendly AI

4) the more difficult it is for a team of researchers to ensure coordination

5) the more difficult it is to secure AGI technology from being stolen

6) the more difficult it is for the AGI development team to influence a larger organization which takes control of AGI technology

7) the more difficult it is to modify FAI for selfish aims

8) the easier it is to pass laws serving to protect FAI

9) the less the AGI researchers can benefit from maintaining exclusive control of AGI