Apr 22, 2012
Part of the Muehlhauser interview series on AGI.
[Apr. 7, 2012]
Pei, I'm glad you agreed to discuss artificial general intelligence (AGI) with me. I hope our dialogue will be informative to many readers, and to us!
On what do we agree? Ben Goertzel and I agreed on the statements below (well, I cleaned up the wording a bit for our conversation):
You stated in private communication that you agree with these statements, depending on what is meant by "AGI." So, I'll ask: What do you mean by "AGI"?
I'd also be curious to learn what you think about AGI safety. If you agree that AGI is an existential risk that will arrive this century, and if you value humanity, one might expect you to think it's very important that we accelerate AI safety research and decelerate AI capabilities research so that we develop safe superhuman AGI first, rather than arbitrary superhuman AGI. (This is what Anna Salamon and I recommend in Intelligence Explosion: Evidence and Import.) What are your thoughts on the matter?
[Apr. 8, 2012]
By “AGI” I mean computer systems that follow roughly the same principles as the human mind. Concretely, to me “intelligence” is the ability to adapt to the environment under insufficient knowledge and resources, or to follow the “Laws of Thought” that realize a relative rationality that allows the system to apply its available knowledge and resources as much as possible. See [1, 2] for detailed descriptions and comparisons to other definitions of intelligence.
Such a computer system will share many properties with the human mind; however, it will not have exactly the same behaviors or problem-solving capabilities as a typical human being, since an AGI is an adaptive system whose behaviors and capabilities depend not only on its built-in principles and mechanisms, but also on its body, initial motivation, and individual experience, which are not necessarily human-like.
Like all major breakthroughs in science and technology, the creation of AGI will be both a challenge and an opportunity for humankind. Like scientists and engineers in all fields, we AGI researchers should use our best judgment to ensure that AGI results in good things rather than bad things for humanity.
Even so, the suggestion to “accelerate AI safety research and decelerate AI capabilities research so that we develop safe superhuman AGI first, rather than arbitrary superhuman AGI” is wrong, for the following major reasons:
In summary, though the safety of AGI is indeed an important issue, currently we don’t know enough about the subject to make any sure conclusion. Higher safety can only be achieved by more research on all related topics, rather than by pursuing approaches that have no solid scientific foundation. I hope your Institute will make a constructive contribution to the field by studying a wider range of AGI projects, rather than generalizing from a few or committing to a conclusion without considering counterarguments.
[Apr. 8, 2012]
I appreciate the clarity of your writing, Pei. “The Assumptions of Knowledge and Resources in Models of Rationality” belongs to a set of papers that make up half of my argument for why the only people allowed to do philosophy should be those with primary training in cognitive science, computer science, or mathematics. (The other half of that argument is made by examining most of the philosophy papers written by those without primary training in cognitive science, computer science, or mathematics.)
You write that my recommendation to “accelerate AI safety research and decelerate AI capabilities research so that we develop safe superhuman AGI first, rather than arbitrary superhuman AGI” is wrong for four reasons, which I will respond to in turn:
I agree. Friendly AI may be incoherent and impossible. In fact, it looks impossible right now. But that’s often how problems look right before we make a few key insights that make things clearer, and show us (e.g.) how we were asking a wrong question in the first place. The reason I advocate Friendly AI research (among other things) is that it may be the only way to secure a desirable future for humanity (see “Complex Value Systems are Required to Realize Valuable Futures”), even if it looks impossible. That is why Yudkowsky once proclaimed: “Shut Up and Do the Impossible!” When we don’t know how to make progress on a difficult problem, sometimes we need to hack away at the edges.
I certainly agree that “currently we don’t know enough about [AGI safety] to make any sure conclusion.” That is why more research is needed.
As for your suggestion that “Higher safety can only be achieved by more research on all related topics,” I wonder if you think that is true of all subjects, or only in AGI. For example, should mankind vigorously pursue research on how to make Ron Fouchier's alteration of the H5N1 bird flu virus even more dangerous and deadly to humans, because “higher safety can only be achieved by more research on all related topics”? (I’m not trying to broadly compare AGI capabilities research to supervirus research; I’m just trying to understand the nature of your rejection of my recommendation for mankind to decelerate AGI capabilities research and accelerate AGI safety research.)
Hopefully I have clarified my own positions and my reasons for them. I look forward to your reply!
[Apr. 10, 2012]
Luke: I’m glad to see the agreements, and will only comment on the disagreements.
For these reasons, under AIKR we cannot have AI with guaranteed safety or friendliness, though we can and should always do our best to make them safer, based on our best judgment (which can still be wrong, due to AIKR). Applying logic or probability theory to the design won’t change the big picture, because what we are after are empirical conclusions, not theorems within those theories. Only the latter can be proved correct; the former cannot (though they can have strong evidential support).
“I’m just trying to understand the nature of your rejection of my recommendation for mankind to decelerate AGI capabilities research and accelerate AGI safety research”
Frankly, I don’t think anyone currently has the evidence or argument to ask others to decelerate their research for safety reasons, though it is perfectly fine to promote your own research direction and try to attract more people to it. However, unless you have the right idea about what AGI is and how it can be built, you are very unlikely to know how to make it safe.
[Apr. 10, 2012]
I didn’t mean to imply that my notion of AGI was “better” because it is broader. I was merely responding to your claim that my argument for differential technological development (in this case, decelerating AI capabilities research while accelerating AI safety research) depends on a narrow notion of AGI that you believe “will never be built.” But this isn’t true, because my notion of AGI is very broad and includes your notion of AGI as a special case. My notion of AGI includes both AIXI-like “intelligent” systems and also “intelligent” systems which obey AIKR, because both kinds of systems (if implemented/approximated successfully) could efficiently use resources to achieve goals, and that is the definition Anna and I stipulated for “intelligence.”
Let me back up. In our paper, Anna and I stipulate that we use “intelligence” to mean an agent’s capacity to efficiently use resources (such as money or computing power) to optimize the world according to its preferences. You could call this “instrumental rationality” or “ability to achieve one’s goals” or something else if you prefer; I don’t wish to encourage a “merely verbal” dispute between us. We also specify that by “AI” (in our discussion, “AGI”) we mean “systems which match or exceed the intelligence [as we just defined it] of humans in virtually all domains of interest.” That is: by “AGI” we mean “systems which match or exceed the human capacity for efficiently using resources to achieve goals in virtually all domains of interest.” So I’m not sure I understood you correctly: Did you really mean to say that this “kind of AGI will never be built”? If so, why do you think that? Are humans very close to a natural ceiling on an agent’s ability to achieve goals?
What we argue in “Intelligence Explosion: Evidence and Import,” then, is that a very broad range of AGIs pose a threat to humanity, and therefore we should be sure we have the safety part figured out as much as we can before we figure out how to build AGIs. But this is the opposite of what is happening now. Right now, almost all AGI-directed R&D resources are being devoted to AGI capabilities research rather than AGI safety research. This is the case even though there is AGI safety research that will plausibly be useful given almost any final AGI architecture, for example the problem of extracting coherent preferences from humans (so that we can figure out which rules / constraints / goals we might want to use to bound an AGI’s behavior).
But perhaps I have been driving the direction of our conversation too much. Don’t hesitate to steer it towards topics you would prefer to address!
[Apr. 12, 2012]
I don’t expect to resolve all the related issues in such a dialogue. In the following, I’ll return to what I see as the major issues and summarize my position.
Such a short position statement may not convince you, but I hope you can consider it at least as a possibility. I guess the final consensus can only come from further research.
[Apr. 19, 2012]
I agree that an AGI will be adaptive in the sense that its instrumental goals will adapt as a function of its experience. But I do think advanced AGIs will have convergently instrumental reasons to preserve their final (or “terminal”) goals. As Bostrom explains in “The Superintelligent Will”:
An agent is more likely to act in the future to maximize the realization of its present final goals if it still has those goals in the future. This gives the agent a present instrumental reason to prevent alterations of its final goals.
I also agree that even if an AGI’s final goals are fixed, the AGI’s behavior will also depend on its knowledge and resources, and therefore we can’t exactly predict its behavior. But if a system has lots of knowledge and resources, and we know its final goals, then we can predict with some confidence that whatever it does next, it will be something aimed at achieving those final goals. And the more knowledge and resources it has, the more confident we can be that its actions will successfully aim at achieving its final goals. So if a superintelligent machine’s only final goal is to play through Super Mario Bros within 30 minutes, we can be pretty confident it will do so. The problem is that we don’t know how to tell a superintelligent machine to do things we want, so we’re going to get many unintended consequences for humanity (as argued in “The Singularity and Machine Ethics”).
You also said that you can’t see what safety work there is to be done without having intelligent systems (e.g. “baby AGIs”) to work with. I provided a list of open problems in AI safety here, and most of them don’t require that we know how to build an AGI first. For example, one reason we can’t tell an AGI to do what humans want is that we don’t know what humans want, and there is work to be done in philosophy and in preference acquisition in AI in order to get clearer about what humans want.
[Apr. 20, 2012]
I think we have made our different beliefs clear, so this dialogue has achieved its goal. It won’t be an efficient use of our time to attempt to convince each other at this moment, and each side can analyze these beliefs in proper forms of publication at a future time.
Now we can let the readers consider these arguments and conclusions.