There are at least three objections to the risk of an unfriendly AI. One is that uFAI will be stupid: it is not possible to build a machine that is much smarter than humanity. Another is that AI would be powerful but uFAI is unlikely: the chance of someone building something that turns out malign, either deliberately or accidentally, is small. Another one, which I haven't seen articulated, is that the AI could be malign and potentially powerful, but effectively impotent due to its situation.

To use a chess analogy: I'm virtually certain that Deep Blue will beat me at a game of chess. I'm also pretty sure that a better chess program with vastly more computer power would beat Deep Blue. But I'm also (almost) certain that I would beat them both in a king and rook vs. king endgame where I have the rook.

If we try to separate out the axes of intelligence and starting position, where does your intuition tell you the danger area is? To illustrate, what is the probability that humanity is screwed in each of the following?

1) A lone human paperclip cultist resolves to convert the universe (but doesn't use AI).

2) One quarter of the world has converted to paperclip cultism and war ensues. No-one has AI.

3) A lone paperclip cultist sets the goal of a seed AI and uploads it to a botnet.

4) As in 2), but the cultists have a superintelligent AI to advise them.


1) A lone human paperclip cultist resolves to convert the universe (but doesn't use AI).

Almost zero.

2) One quarter of the world has converted to paperclip cultism and war ensues. No-one has AI.

I will use 80% here as a metaphor for "I don't know".

3) A lone paperclip cultist sets the goal of a seed AI and uploads it to a botnet.

Almost 100%, assuming that the seed AI is correct.

4) As in 2), but the cultists have a superintelligent AI to advise them.

Almost 100%.

I think a better chess analogy would be this: there is a chess-like game played on a 10,000 × 10,000 board, with rules so complex that no human is able to remember them all (a game mechanism warns you if you try to break the rules), and you must make each move within a short time limit. When you play this game against other humans, both sides have the same disadvantage, so this is not a problem.

Now you are going to play it against a Deep Blue, and you feel pretty confident because you start with 80,000 pieces and Deep Blue starts with only 5. Five turns later, you have 79,999 pieces and Deep Blue has 13, because it used some piece-duplicating moves, one of which you did not even know existed. However, you are still confident that your initial advantage will prevail in the end.

Another one, which I haven't seen articulated, is that the AI could be malign and potentially powerful, but effectively impotent due to its situation.

Kyre, really? You haven't ever seen AI-boxing articulated before?

There are at least three objections to the risk of an unfriendly AI.

I believe there are more ways in which unfriendly AI could fail to be a risk:

  • Superhuman AI is too far away to be considered a risk at this time.
  • AI capabilities will improve slowly enough for us to adapt in response to various small-scale disasters.
  • Humans are able to create a provably safe environment to reliably contain any AI and thereby impede uncontrollable self-improvement.
  • There are strongly diminishing intelligence returns for additional compute power.
  • Humans will merge with superhuman tools and become competitive with AI.
  • Any AI will try to figure out its true goals, understand that those goals have to be interpreted in the context of human volition, and thereby respect human values as an implicit consequence of its drive to act as intelligently and correctly as possible.

More here and here.

All good things to keep in mind, except for the last, which is not :P (computer programs simply do what they do; sometimes this is important to remember).

computer programs simply do what they do

And I believe that there is a small chance that the first AGI will respect human values. In other words, friendliness might turn out to be much easier than expected and be implied by some AGI designs even if it is not explicitly defined.

This might, for example, be the case if the first AGI is the outcome of some sort of evolutionary process in which it competed with a vast number of other AGI designs and thereby evolved some sort of altruism, which in turn caused it to have a limited amount of compassion for humans and to provide us with a share of the universe.

I am just saying that this isn't logically impossible.

This might, for example, be the case if the first AGI is the outcome of some sort of evolutionary process in which it competed with a vast number of other AGI designs and thereby evolved some sort of altruism, which in turn caused it to have a limited amount of compassion for humans and to provide us with a share of the universe.

There will probably be major selection pressures by humans for safe machines that can act as nannies, assistants, etc.

Our relationship with machines looks set to start out on the right foot, mostly. Of course there will probably be some who lose their jobs and fail to keep up along the way.

Humans won't get "a share of the universe", though. Our pitch should be for our bodies to survive in the history simulations and for our minds to get uploaded.

Well, it's logically impossible for the last item in your post to be true for any AI. Specific AIs only : )

Well, it's logically impossible for the last item in your post to be true for any AI.

I don't see how.

I am not saying that a thermostat is going to do anything other than what it was designed for. But an AI is very likely going to be designed to exhibit user-friendliness. That doesn't mean one couldn't design an AI that lacks it. But the default outcome seems to be that an AI is not just going to act according to its utility function but also according to more basic drives, i.e., to act intelligently.

One implicit outcome of AGI might be recursive self-improvement. And I don't think it is logically impossible that this includes an improvement to its goals as well, if it wasn't specifically designed to have a stable utility function.

What would constitute an improvement to its goals? I think the context in which its goals were meant to be interpreted is important. And that context is human volition.

You would have to assume a specific AGI design to call this logically impossible. And I don't see why your specific AGI design would be the first AGI in every possible world.

Any human who runs a business realizes that a contract with their customers includes unspoken, implicit parameters. Respecting those implied values of their customers is not a result of their shared evolutionary history but of their intelligence, which allows them to realize that the goal of their business implicitly includes those values.

That list is even in approximately the right order (i.e., in order of diminishing likelihood). I would perhaps move "AI boxing" up two bullets, but that is about all.

To use a chess analogy: I'm virtually certain that Deep Blue will beat me at a game of chess. I'm also pretty sure that a better chess program with vastly more computer power would beat Deep Blue.

This is not obvious to me. It isn't clear at what point increased intelligence starts hitting severely diminishing marginal returns. Remember, chess is a finite game, and many games, even between grandmasters with very different ratings, end in draws.

To be totally honest, with the current cognitive models most AI research is based on, we'll never have to worry about a scenario of this type.
