I'm Henry Lieberman, Research Scientist at the MIT Computer Science and AI Lab. I'm interested in long-term thinking about the future of humanity, technology and society, and have developed some new ideas, together with my colleague Christopher Fry, about how AI and other technologies can ensure a positive future for humanity. For the full story, see my Web site, https://www.whycantwe.org, where you'll find a 12-minute TED talk, other videos, and writing based on our book, "Why Can't We All Just Get Along?".

For specific thoughts on the Alignment question, here's an abstract (maybe a future paper and/or talk at an appropriate venue):

AI Alignment Depends on Human Alignment

Henry Lieberman – MIT CSAIL

The problem of whether the goals and values of an artificially intelligent agent will align with human goals and values, can be reduced to this problem: Will the goals and values of different human agents ever align with each other?

Regardless of what you believe about the problem for humans, we're likely to get the same answer when we think about intelligent machines. We can only program AI "in our own image", so both the features and bugs of humanity will reappear in AI. Thus, whether AI turns out to be a good thing or a bad thing in the future depends critically on this question: Will humans cooperate with each other, or will they compete with one another?

Right now, our society is schizophrenic -- some of our institutions are oriented towards cooperation (like science), others (like business and politics) seem to be primarily oriented towards competing. Many of the social problems caused by AI are a result of this schizophrenia. If AI becomes a tool of warring human factions, we're doomed. But it doesn't have to be like that. 

With all the evident conflict and disagreement in the world, some despair of the prospect of ever getting people to align their values, substantially, if not perfectly. Yet the technology itself will provide unprecedented opportunities to eliminate the barriers to widespread social cooperation.   A positive future for AI depends on changing our competitive mindset (and institutions) towards more cooperative alternatives. I will present some concrete proposals for doing so. Then, we'll get benevolent AI "for free". 



The problem of whether the goals and values of an artificially intelligent agent will align with human goals and values, can be reduced to this problem: Will the goals and values of different human agents ever align with each other?

I would be ecstatic if AI turned out to be perfectly aligned with any particular human, rather than aligned with no human at all. The problem of "oh no what if DeepMind disagrees with the Chinese about which human values to put in the AI" is rather small compared with the problem of actually figuring out how to put any values at all in the AI.

If the context was instead the problem of sending a rocket to the moon, you would have assumed away the actual engineering of the rocket and would now be concerned with the human squabbling about the particular destination crater.

Yes, if all humans agreed on everything, there would still be significant technical problems to get an AI to align with all the humans. Most of the existing arguments for the difficulty of AI alignment would still hold even if all humans agreed. If you (Henry) think these existing arguments are wrong, could you say something about why you think that, i.e. offer counterarguments?

The problem of whether the goals and values of an artificially intelligent agent will align with human goals and values, can be reduced to this problem: Will the goals and values of different human agents ever align with each other?

We can only program AI “in our own image”, so both the features and bugs of humanity will reappear in AI.

Why do you think these statements are true?

Yes, great question. Looking at programming in general, there seem to be many obvious counterexamples, where computers have certain capabilities ('features') that humans don't (e.g. doing millions of arithmetic operations extremely fast with zero clumsy errors) and likewise where they have certain problems ('bugs') that we don't (e.g. adversarial examples for image classifiers, which don't trip humans up at all but entirely ruin the neural net's classification).

The problem of whether the goals and values of an artificially intelligent agent will align with human goals and values, can be reduced to this problem: Will the goals and values of different human agents ever align with each other?

Aligning human agents is a subproblem, but if you align human agents you don't automatically align agents in a world where the most powerful agents aren't human agents.

Schizophrenia is the wrong metaphor here -- it's not the same disease as split personalities (i.e. dissociative identity disorder). I think it would be clearer and more accurate to rewrite that paragraph without it. I don't intend this as an attack or harsh criticism; it's just that I have decided to be a pedant about this point whenever I encounter it, as I think it would be good for the general public to develop a more accurate and realistic understanding of schizophrenia.

Good point. In addition to that, using human diseases as a metaphor for AI misalignment is misleading, because it kinda implies that the default option is health; we only need to find and eliminate the potential causes of imbalance, and the health will happen naturally. But the very problem with AI is that there is no such thing as a natural good outcome. A perfectly healthy paperclip maximizer is still a disaster for humanity.

Then, we’ll get benevolent AI “for free”.

I am heavily skeptical of this claim to the extent that I genuinely think I misunderstood you. I do not see why, for example, being good at recognizing pictures of cats should make me good at programming a machine to recognize pictures of cats, so I think the general form of the argument I understood is wrong.

Thanks for sharing your ideas. I'm a bit confused about your core claim and would love it if you could clarify (or refer to the specific part of your writing that addresses these questions). I get the general gist of your claim, that AI alignment depends on whether humans can all have the same values, but I don't know how much 'the same' you mean. You say 'substantially' align; could you give some examples of how aligned you mean? For example, do you mean all humans sharing the same political ideology (libertarian, communist, etc.)? Do you mean that for all non-trivial ethical questions (When is abortion permissible? How much duty do you have to your family vs yourself? How many resources should we devote to making things better on earth vs exploring space?), you would need to be able to ask any human on earth and 99% would give you the same answer?

Likewise with the idea of humans needing to compete less and cooperate more. How much less and more? For example, competition between firms is a core part of capitalism. Do you think we need to completely eliminate capitalism? Or do you only mean eliminating zero/negative sum competition like war?
