Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Financial status: This is independent research, now supported by a grant.

Epistemic status: Reflections on a debate about AI.


Earlier this year, The Munk Debates podcast organized a debate between Stuart Russell and Melanie Mitchell on the question of whether AI poses an existential risk. The motion was:

Be it resolved, the quest for true AI is one of the great existential risks of our time.

Stuart was for, Melanie was against.

A significant portion of the debate was about the nature of the term "intelligence". Melanie contended that a truly intelligent machine would understand what we really mean when we give it incomplete instructions, or else not deserve the mantle of "truly intelligent". She said that if we built a general-purpose AI and asked it to do some simple task, and the machine burned up all the energy in the universe in service of that simple task, then the AI would have to lack some common sense that even humans have, so it would be an error to say that such an AI had "superhuman intelligence".
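Melanie's thought experiment can be caricatured in a few lines of code. This is a hypothetical toy sketch of my own, not anything presented in the debate: an optimizer handed a literal, incomplete objective ("make paperclips") has no term telling it when to stop, so the policy that is optimal for the stated objective also exhausts every resource it can reach.

```python
# Hypothetical toy sketch: a "maximize paperclips" objective with no
# stopping condition. The greedy policy below is optimal for the stated
# objective, yet it exhausts the entire shared resource pool as a side effect.

def make_paperclips(resource_units: int) -> tuple[int, int]:
    """Convert every available resource unit into a paperclip."""
    paperclips = 0
    while resource_units > 0:
        resource_units -= 1
        paperclips += 1
    return paperclips, resource_units

clips, leftover = make_paperclips(resource_units=1_000_000)
print(clips, leftover)  # all resources consumed; nothing held in reserve
```

The point of the sketch is not that such a loop is intelligent, but that nothing in the objective itself encodes the common sense Melanie expects; that common sense has to come from somewhere other than the objective.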

Stuart responded by noting that powerful AI systems could plausibly pursue goals that are either beneficial or harmful to humans (the orthogonality thesis), and that it seems very difficult to specify what we mean by "good" in a single pre-specified objective function (the argument from fragility of human values).

It seemed to me that Stuart and Melanie were using the word "intelligent" to refer to different concepts. Stuart, I believe, was using the word to refer to systems that have the capacity to flexibly influence the future in service of a wide variety of goals, while Melanie, I think, was using the word to refer to systems that have a kind of friendly aliveness that I think is close to what we mean by "friendly AI" or "aligned superintelligence".

I am personally most used to using the word "intelligence" as Stuart used it, but I think Melanie was also pointing to a real phenomenon in the world, and perhaps even one that accords more closely with common English-language use of the word. For example, I might not intuitively think of "intelligent" as appropriate to describe a corporation that is highly effective at achieving a myopic and rigidly held goal, even if I did view that corporation as very powerful. In contrast, there is a certain sense of the word "intelligent" that is highly appropriate to describe the natural resonance that a dog or horse forms with me, even though such an animal does not steer the future very much. This is not the way that I most often use the word "intelligent" but it does, I think, point to something real.

In the same way, a paperclip maximizer might possess overwhelming power to convert matter and energy into paperclips, yet it would not be a completely unreasonable use of the English word "intelligent" to say that a paperclip maximizer is not "intelligent". This is a purely terminological point concerning only the use of the word "intelligent" in contemporary English.

If we taboo the word "intelligent" then there are two real-world phenomena that were being pointed at in the debate, one by Stuart, and one by Melanie. Stuart was, I think, pointing at the phenomenon of machines that exert flexible influence over the future. Melanie was, I think, pointing at the phenomenon of being friendly or benevolent or helpful. Both are extremely important concepts, and it seems to me that the whole problem of AI alignment is about bringing that which is powerful together with that which is friendly or benevolent or helpful.

I don’t think Stuart and Melanie were hitting on a real disagreement about the nature of powerful autonomous systems or whether they might present dangers to life on Earth, but rather were using the word "intelligent" to refer to different things. I think it would have been a more satisfying debate if Stuart and Melanie had noticed this terminological difference in their language and then debated the actual underlying issue, which is, to my mind, whether there are existential risks posed by the development of systems that possess intelligence in the way that Stuart was using the term but do not possess intelligence in the way that Melanie was using the term.

Two questions arising from the debate

The most intriguing thing to me, and the reason that I'm writing this post at all, is that the debate seemed to illuminate two really deep questions:

  1. What the heck does intelligence in the sense that Stuart used the term really consist of?

  2. What the heck does intelligence in the sense that Melanie used the term really consist of?

We actually do not have a formal understanding of how to construct physical entities that possess intelligence in the sense that Stuart used the term. We see that the phenomenon clearly exists in the world, and it even seems that we might soon construct entities with this kind of intelligence by searching over huge spaces of possible algorithms, yet we do not understand the nature of this phenomenon.

Yet even more incredible to me is that I am helped by people and systems in the world every day, and I sometimes help others out, and I form some kind of understanding of how to be helpful in different circumstances. I gradually learn what is a helpful way to be present with a friend who is going through emotional turmoil, or what is a helpful way to address a group of people about to embark on a project together, and there are these breakthrough discoveries in the domain of how to be helpful that seem applicable across many circumstances, yet seem very difficult to convey. We do not have a formal understanding of this phenomenon, either, yet it seems to me that we absolutely must discover one in order to build powerful autonomous systems that are beneficial to all life on Earth.


Comments

Melanie contended that a truly intelligent machine would understand what we really mean when we give it incomplete instructions, or else not deserve the mantle of "truly intelligent".

This sounds pretty reasonable in itself: a generally capable AI has a good chance of being able to distinguish between what we say and what we mean, within the AI's post-training instructions.  But I get the impression that she then implicitly takes it a step further, thinking that the AI would necessarily also reflect on its core programming/trained model, to check for and patch up similar differences there.  An AI could possibly work that way, but it's not at all guaranteed--just like how a person may discover that they want something different from what their parents wanted them to want, and yet stick with their own desire rather than conforming to their parents' wishes.

Melanie, I think, was using the word to refer to systems that have a kind of friendly aliveness that I think is close to what we mean by "friendly AI" or "aligned superintelligence".

Less charitably, "intelligence" is an applause light, and of course the true intelligence must be associated with all that is nice and good.

What the heck does intelligence in the sense that Melanie used the term really consist of?

Not having listened to the debate, I unfoundedly imagine intelligence might mean for her something related to intelligibility. The idea being: Something is intelligent if it "intelliges" whatever is intelligible; if it does what we do when we play our part in the intelligibility of something. (Both "intelligence" and "logic" come from a root meaning "to gather; to speak", also "to lay out, arrange".) Whatever we might want to communicate by speaking is intelligible, so if something is intelligent, it should be able to relate to what we lay out (λέγειν) the way we do. If it relates to what we say the way we do, it's friendly, though not necessarily subservient.