Characterizing the superintelligence which we are concerned about

What is this “superintelligence” we are concerned about? In writing articles on FAI topics, I took the easy way out and defined the focus of attention as an AI that can far outdo humans in all areas. But this is just a useful shortcut, not what we are really talking about.

In this essay, I will try to better characterize the topic of interest.

Some possibilities that have been brought up include intelligences

  • which are human-like,
  • which are conscious,
  • which can outperform humans in some or all areas,
  • which can self-improve,
  • or which meet a semi-formal or formal definition of intelligence or of above-human intelligence.

All these are important features in possible future AIs which we should be thinking about. But what really counts is whether an AI can outwit us when its goals are pitted against ours.

1. Human-like intelligence. We are humans, and we care about human welfare; humans are the primary intelligence which cooperates and competes with us; so human intelligence is our primary model. Machines that “think like humans” are an intuitive focus in discussions of AI; Turing took this as the basis for his practical test for intelligence.

Future AIs might have exactly this type of intelligence, particularly if they are emulated brains, what Robin Hanson calls “ems.”

If human-like AI is the only AI to come, then not much will have happened: We already have seven billion humans, and a few more will simply extend economic trends. If, as Hanson describes, the ems need fewer resources than humans, then we can expect extreme economic impact. If such AI has certain differences from us humans, like the ability to self-improve, then it will fall under the other categories, as described below.

J. Storrs Hall (Beyond AI: Creating the conscience of the machine, 2007) explores different types of intelligence in relation to the human model: “hypohuman,” “epihuman,” “allohuman,” “parahuman,” and “hyperhuman.” Those distinctions are important, but in this essay I am focusing on the specific level of intelligence which can be expected to end the human era, rather than on the relation of such intelligence to human intelligence.

2. More narrowly than full human emulation, we can consider machines that are conscious, feel emotions, or have some other human mental characteristic. The first half of Chalmers’ seminal 2010 article brought some essential FAI ideas into the academic sphere; but the second half was devoted to the question of AI consciousness, as if this were the most important thing about AI.

Future AIs may eventually have consciousness or not, whatever consciousness is. But in any case, a machine which is about to create a cure for aging, or turn the solar system into computronium, is very relevant to our future, whether or not it is conscious; and likewise for emotions and other human mental features.

3. Performing better than humans in some but not all areas. We already have that, in arithmetic, information retrieval, chess, and other areas, and though this has benefited humanity, this is not what we are trying to describe here.

We might imagine a roughly human-level intelligence which is better than humans in a broad range of abilities, and worse in a variety of other areas, an “alien intelligence.” Just as humans today can harm and help us, such an intelligence could potentially be significant to our futures. But what we are interested in here is not just an intelligence that is good at achieving a variety of goals, if this has no effect on us.

4. Outperforming humans in all areas: a machine that can surpass--or far surpass, in IJ Good’s original formulation--“all the intellectual activities of any man however clever.” This is convenient as a sufficient condition in considering superintelligences which can radically benefit or harm the human future. (E.g., see this article by Bostrom.)

But it is not a necessary condition. Imagine a machine that can far surpass any human in all areas except for cooking fine cuisine: It can still excel in creating cancer cures, conquering the earth, or turning us all into paper clips. This would qualify as the entity that can rework our futures for better or worse, even though it does not “surpass all the intellectual activities of any man.”

5. Self-improving. Even an infrahuman AI, if it were capable of self-improving and motivated to do so (for reasons well-discussed by Omohundro and others), might improve itself to human levels and ultimately to superintelligence. Yudkowsky, in his early article “Staring into the Singularity,” devotes the abstract, and the article’s key point, to AI self-improvement.

But what really interests us in self-improvement is actually the resulting superintelligence. If the AI were to self-improve but stop at an infrahuman level, that would not be the AI which we are considering. On the other hand, if human engineers construct an AI which from the first has a greatly superhuman level of intelligence, but which is utterly incapable of self-improving, that would be one of those entities we are considering here, one capable of great benefit or harm.

6. Optimization power. In trying to broaden the idea of intelligence beyond the human model, terms such as “flexible” or “general” intelligence are used. Legg and Hutter captured the idea with a definition summarizing the work of about 70 authors: “Intelligence measures an agent’s ability to achieve goals in a wide range of environments.” To distinguish this from the human model, we sometimes speak of “optimization power” instead of “intelligence.”

Goertzel suggests that we add to this definition a consideration of efficiency: An agent that can work quickly with limited resources is smarter than one that can do the same thing slowly with more resources. Goertzel also asks that we consider an agent’s ability in the real-world environment which we humans have to deal with, and not across all possible environments, the great majority of which are completely irrelevant to anything the agent will be faced with.

However, these definitions are too loose to help us identify the entities that we are most interested in, if we care about radical changes to the human future.

7. Legg and Hutter further formalize the definition mentioned above with a formal metric, the Universal Intelligence Measure. We could use this metric to precisely specify entities which are smarter than humans. If the average human (or the smartest one, or the collective of all humans) has an intelligence measure of, say, 3.1, we could say that we are interested in any AI with intelligence 3.1 or higher.
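
For reference, here is a sketch of the measure as it is commonly stated in Legg and Hutter's work. The notation is theirs as usually presented, not something defined in this essay, so treat the symbols as assumptions brought in for illustration:

```latex
\Upsilon(\pi) \;=\; \sum_{\mu \in E} 2^{-K(\mu)} \, V_{\mu}^{\pi}
```

Here \pi is the agent, E is the class of computable environments being considered, V_{\mu}^{\pi} is the expected total reward the agent earns in environment \mu, and K(\mu) is the Kolmogorov complexity of \mu. The weight 2^{-K(\mu)} makes simple environments count for more than complex ones. The threshold idea above would then amount to comparing \Upsilon(\pi) for an AI against the value assigned to a human.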

However, the Universal Intelligence Measure is incomputable, and Goertzel’s modifications add even more complications. Until useful approximations are found, these formal measures are not applicable to our problem--and even then, we will only know which agents are smarter than humans according to a certain metric, which is not necessarily what we are looking for here.

8. Able and motivated to outwit humans. This is what we really care about. If the agent is able to achieve its goals against our will, to out-trick us, to outthink us when we are trying to stop it, we could be in big trouble, if its goals are not exactly compatible with ours.

Some entities today can achieve their goals at the expense of humans. If you get on an automated monorail in the airport, and then realize that you are on the train to Terminal 2 when you need to be in Terminal 1, the train will accomplish its goal (i.e., what it is designed for), and it will get you to Terminal 2, even though you don’t want it to. Biological viruses may well achieve their “goals” of reproduction at the expense of humans, and if it turns out that a pandemic is the greatest threat to the human species, we should fight to prevent that, maybe even redirecting resources from Friendly AI research.

But these entities can’t be said to “outwit” us: They are not achieving complex goals in complex environments, as in the definitions of intelligence mentioned above.

Though fully general intelligence is useful for outwitting humans, it is not necessary. If a machine has the goal of maximizing the number of paperclips in the world, and uses trickiness and guile to overcome our best efforts to stop it, ultimately converting all the matter in the solar system to paper clips, but has no ability to understand how to prepare cheesecake or why Benny Hill is funny, then it has still outwitted us to our detriment.

There is another type of agent that we are interested in, a Friendly one which does only what we want, one which can cure aging, hunger, and every other affliction.

Such a helpful AI also need not be fully general. Already today, computers help scientists cure diseases and do other good things. If we develop better and better specialized AI, we could go a long way even without artificial general intelligence.

But a Friendly artificial general intelligence could be even more helpful than a narrow one, by finding creative new ideas for giving us what we want. This is the Friendly AI which SI would like to build.

An AI which harms us and one which benefits us need not differ in their architecture, just in goals. A full, general intelligence could be very effective at outwitting us and also at helping us, depending on what it is trying to do.

All the concepts in 1-7 above are relevant to the question we are asking. Human intelligence is our moral concern, and our main existing model of intelligence, and consciousness is one of its central features. Superintelligence, per IJ Good’s definition, is a sufficient condition (together with a non-Friendly goal system) for destroying humanity. Self-improvement is the most likely way to reach that point. Definitions and formal metrics help pin down more precisely what we mean by intelligence.

But the AI to be concerned about is one which can outwit us in achieving its goals, and secondarily, one which is as good as possible in helping us achieve our goals.


13 comments

Until useful approximations are found, these formal measures are not applicable to our problem--and even then, we will only know which agents are smarter than humans according to a certain metric, which is not necessarily what we are looking for here.


To be honest, it looks to me like the only AI effort so far with the potential to realize the "scary idea" is, ironically, the FAI effort.

The other efforts deal with goals that are defined inside the computer, not outside it. Take the AI that designs airplanes: "designing airplanes" is how you present it to the incompetents; the AI designs the best wings for the fluid simulator it has inside. The fluid simulator designer, too, designs fluid simulators that mathematically correspond to the most accurate approximation to the basic laws of physics (which are too expensive to run). Again, the goal is inside the formal system that is the AI. And so on. It is already the case that all efforts but the FAI effort are fairly safe and have certain inherent failsafes (e.g. if you give too hard a problem to a theorem prover, and it is so general that it determines it needs new hardware, well, it could also determine that it needs new axioms; and in fact our designs for solving maximization or search problems will 'wirehead' if you permit any kind of self-modification affecting the goal or evaluator). The problem of making the AI care about our ill-defined 'we know if it's a genuine effort or not when we see it' goals, rather than about its own internal representations, is likely a very, very hard problem. It is a problem we don't need solved in order to, e.g., resolve hunger and disease using AI. Nobody has ever been able to define an unfriendly AI (a real paperclip-maximizing one, for example) even in a toy model given infinite computing power.

The FAI folk, for lack of domain-specific knowledge, don't understand what the other AI efforts look like, mistake high-level descriptions (an AI that drives a car) for the model of what is being implemented, and project the dangers specific to their own approach onto everything else. They are anthropomorphizing the AI a lot, while trying not to by ignoring whatever is the current hot thing that we evolved (today the grand topic in evolutionary talk is how we evolved altruism, so the AI won't have that and will be selfish and nasty; but note that not so long ago the hot topic was how we evolved to be selfish and nasty, so the AI optimists would talk of how the AI would be all nice and sweet, and anyone who disagreed was anthropomorphizing).

edit: And to be fair, there's also the messy AI effort in the form of emulation of the human brain, neural networks, and the like. Those hypothetical entities learn our culture, learn our values, and are a part of mankind in a way in which neat AIs are not; it's ridiculous to describe them as an existential risk on par with a runaway paperclip maximizer, and the question of whether they 'are' an existential risk is not a question of what the AIs would do but purely a matter of how narrowly you define our existence. Go too narrow, and we quit existing in 10 years as everyone changes from what they are; define it a little more broadly, and those learning AIs are the continued existence of mankind.