The sheer length of GiveWell co-founder and co-executive director Holden Karnofsky's excellent critique of the Singularity Institute means that it's hard to keep track of the resulting discussion.  I propose to break out each of his objections into a separate Discussion post so that each receives the attention it deserves.

Objection 3: SI's envisioned scenario is far more specific and conjunctive than it appears at first glance, and I believe this scenario to be highly unlikely.

SI's scenario concerns the development of artificial general intelligence (AGI): a computer that is vastly more intelligent than humans in every relevant way. But we already have many computers that are vastly more intelligent than humans in some relevant ways, and the domains in which specialized AIs outdo humans seem to be constantly and continuously expanding. I feel that the relevance of "Friendliness theory" depends heavily on the idea of a "discrete jump" that seems unlikely and whose likelihood does not seem to have been publicly argued for.

One possible scenario is that at some point, we develop powerful enough non-AGI tools (particularly specialized AIs) that we vastly improve our abilities to consider and prepare for the eventuality of AGI - to the point where any previous theory developed on the subject becomes useless. Or (to put this more generally) non-AGI tools simply change the world so much that it becomes essentially unrecognizable from the perspective of today - again rendering any previous "Friendliness theory" moot. As I said in Karnofsky/Tallinn 2011, some of SI's work "seems a bit like trying to design Facebook before the Internet was in use, or even before the computer existed."

Perhaps there will be a discrete jump to AGI, but it will be a sort of AGI that renders "Friendliness theory" moot for a different reason. For example, in the practice of software development, there often does not seem to be an operational distinction between "intelligent" and "Friendly." (For example, my impression is that the only method programmers had for evaluating Watson's "intelligence" was to see whether it was coming up with the same answers that a well-informed human would; the only way to evaluate Siri's "intelligence" was to evaluate its helpfulness to humans.) "Intelligent" often ends up getting defined as "prone to take actions that seem all-around 'good' to the programmer." So the concept of "Friendliness" may end up being naturally and subtly baked in to a successful AGI effort.

The bottom line is that we know very little about the course of future artificial intelligence. I believe that the probability that SI's concept of "Friendly" vs. "Unfriendly" goals ends up seeming essentially nonsensical, irrelevant and/or unimportant from the standpoint of the relevant future is over 90%.

Comments

For example, in the practice of software development, there often does not seem to be an operational distinction between "intelligent" and "Friendly." (For example, my impression is that the only method programmers had for evaluating Watson's "intelligence" was to see whether it was coming up with the same answers that a well-informed human would; the only way to evaluate Siri's "intelligence" was to evaluate its helpfulness to humans.) "Intelligent" often ends up getting defined as "prone to take actions that seem all-around 'good' to the programmer." So the concept of "Friendliness" may end up being naturally and subtly baked in to a successful AGI effort.

Well, yes, this is the definition of Friendliness in the most tautological sense. Siri and Watson are both very domain-specific AIs, so evaluating their "intelligence" or "Friendliness" is relatively trivial - you just have to see if their outputs match the small subset of the programmer's utility function that corresponds to what the programmer designed them to do. With AGI, you have to get the Friendliness right across all dimensions of human value (see: Value is Fragile), precisely because it's as cross-domain as an AI can possibly be.

Siri and Watson are both very domain-specific AIs, so evaluating their "intelligence" or "Friendliness" is relatively trivial - you just have to see if their outputs match the small subset of the programmer's utility function that corresponds to what the programmer designed them to do.

Assume you were to gradually transform Google Maps into a seed AI: at what point would it become an existential risk, and how? And why wouldn't you just skip that step?

More here.

Assume you were to gradually transform Google Maps into a seed AI: at what point would it become an existential risk, and how?

If it tries to self-improve, and as a side effect turns the universe to computronium.

If it gains general intelligence and, as part of trying to provide better search results, realizes that self-modification could bring much faster results.

This whole idea of a harmless general intelligence is just imagining a general intelligence which is not general enough to be dangerous: one which will be able to think generally, and yet somehow this ability will always reliably stop before thinking something that might end badly.

Assume you were to gradually transform Google Maps into a seed AI: at what point would it become an existential risk, and how?

If it tries to self-improve, and as a side effect turns the universe to computronium.

Thanks, I completely missed that. Explains a lot.

Assume you were to gradually transform Google Maps into a seed AI: at what point would it become an existential risk, and how? And why wouldn't you just skip that step?

A very important part of Google Maps is Street View, which is created by cars driving around and taking pictures of everything. These could be viewed as 'arms' of the seed AI, along with its surveillance satellites, WiFi sniffing for more accurate geolocation, 3D modelling of buildings, and the recently introduced building-interior maps.

Which is to say, Super Google Maps could become a gigantic surveillance network, pervasively examining every corner of reality so that it could stay as up to date as possible.

That reminds me of Project Pigeon, only with a weapon capable of destroying the planet, and we're the pigeon.

How does one do a gradual transformation on a discontinuous space such as the space of computer programs that are somehow related to navigation or general intelligence?

Even assuming tool-AI is the only form available, it can still be used for nefarious purposes, akin to nuclear weapons, but by much smaller groups of people and without any of the easy warning signs.