Questions for a Friendly AI FAQ

by lukeprog, 19th Sep 2011 (1 min read, 29 comments)


Personal Blog

I've begun work (with a few others) on a somewhat comprehensive Friendly AI F.A.Q. The answers will be much longer and more detailed than in the Singularity FAQ. I'd appreciate feedback on which questions should be added.

1.  Friendly AI: History and Concepts

    1.  What is Friendly AI?

    2.  What is the Singularity? [w/ explanation of all three types]

    3.  What is the history of the Friendly AI Concept?

    4.  What is nanotechnology?

    5.  What is biological cognitive enhancement?

    6.  What are brain-computer interfaces?

    7.  What is whole brain emulation?

    8.  What is general intelligence? [w/ explanation of why 'optimization power' may be less confusing than 'intelligence', which tempts anthropomorphic bias]

    9.  What is greater-than-human intelligence?

  10.  What is superintelligence, and what powers might it have?

2.  The Need for Friendly AI

    1.  What are the paths to an intelligence explosion?

    2.  When might an intelligence explosion occur?

    3.  What are AI takeoff scenarios?

    4.  What are the likely consequences of an intelligence explosion? [survey of possible effects, good and bad]

    5.  Can we just keep the machine superintelligence in a box, with no access to the internet?

    6.  Can we just create an Oracle AI that informs us but doesn't do anything?

    7.  Can we just program machines not to harm us?

    8.  Can we program a machine superintelligence to maximize human pleasure or desire satisfaction?

    9.  Can we teach a machine superintelligence a moral code with machine learning?

   10. Won’t some other sophisticated system constrain AGI behavior?

3.  Coherent Extrapolated Volition

    1.  What is Coherent Extrapolated Volition (CEV)?

    2.  ...

4.  Alternatives to CEV

    1.  ...

5.  Open Problems in Friendly AI Research

    1.  What is reflective decision theory?

    2.  What is timeless decision theory?

    3.  How can an AI preserve its utility function throughout ontological shifts?

    4.  How can an AI have preferences over the external world?

    5.  How can an AI choose an ideal prior given infinite computing power?

    6.  How can an AI deal with logical uncertainty?

    7.  How can we elicit a utility function from human behavior and function?

    8.  How can we develop microeconomic models for self-improving systems?

    9.  How can temporal, bounded agents approximate ideal Bayesianism?


You may also wish to consider including some of the questions in my and Tom McCabe's Singularity FAQ. I was never very happy with it and don't generally endorse it in its current state. We managed to gather a large variety of AI/FAI/Singularity-related questions from various people, but never really managed to write very good answers. But you might find some of the questions useful.

Questions from some FAI-relevant sections. IIRC, these are all questions that somebody has actually asked, at one point or another.

Alternatives to Friendly AI

Q1). Couldn't AIs be built as pure advisors, so they wouldn't do anything themselves?
Q2). Wouldn't a human upload naturally be more Friendly than any AI?
Q3). Trying to create a theory which absolutely guarantees Friendly AI is an unrealistic, extremely difficult goal, so isn't it a better idea to attempt to create a theory of "probably Friendly AI"?
Q4). Shouldn't we work on building a transparent society, where no illicit AI development can be carried out?

Implementation of Friendly AI

Q1). Wouldn't an AI that's forced to be Friendly be prevented from evolving and growing?
Q2). Didn't Shane Legg prove that we can't predict the behavior of intelligences smarter than us?
Q3). Since a superintelligence could rewrite itself to remove human tampering, isn't Friendly AI impossible?
Q4). Why would a super-intelligent AI have any reason to care about humans, who would be stupid by comparison?
Q5). What if the AI misinterprets its goals?
Q6). Isn't it impossible to simulate a person's development without creating, essentially, a copy of that person?
Q7). Isn't it impossible to know a person's subjective desires and feelings from outside?
Q8). Couldn't a machine never understand human morality, or human emotions?
Q9). What if AIs take advantage of their power, and create a dictatorship of the machines?
Q10). If we don't build a self-preservation instinct into the AI, wouldn't it just find no reason to continue existing, and commit suicide?
Q11). What if superintelligent AIs reason that it's best for humanity to destroy itself?
Q12). The main defining characteristic of complex systems, such as minds, is that no mathematical verification of properties such as "Friendliness" is possible; hence, even if Friendliness is possible in theory, isn't it impossible to implement?
Q13). Any future AI would undergo natural selection, so wouldn't it eventually become hostile to humanity to better pursue reproductive fitness?
Q14). Shouldn't FAI be done as an open-source effort, so other people can see that the project isn't being hijacked to make some guy Supreme Emperor of the Universe?
Q15). If an FAI does what we would want if we were less selfish, won't it kill us all in the process of extracting resources to colonize space as quickly as possible to prevent astronomical waste?
Q16). What if ethics are subjective, not objective? Then, no truly Friendly AI could be built.
Q18). Isn't the idea of a hostile AI anthropomorphic?
Q19). Isn't the idea of "Friendliness", as we understand it now, too vaguely defined?
Q20). Why don't mainstream researchers consider Friendliness an issue?
Q21). How could an AI build a computer model of human morality, when human morals contradict each other, even within individuals?
Q22). Aren't most humans rotten bastards? Basing an FAI morality off of human morality is a bad idea anyway.
Q23). If an AI is programmed to make us happy, the best way to make us happy would be to constantly stimulate our pleasure centers, so wouldn't it turn us into nothing but experiencers of constant orgasms?
Q24). What if an AI decides to force us to do what it thinks is best for us, or what will make us the happiest, even if we don't like it?

General Questions

Q1). If AI is a serious threat, then wouldn't the American government or some other official agency step in and take action, for fear of endangering national security?
Q2). Won't the US government, Google, or some other large organization with billions of dollars and thousands of employees, be the first ones to develop strong AI?
Q3). What if the Singularity Institute and its supporters are just another "doomsday cult", like the religious extremists who talk about how "the end is nigh"?
Q4). Shouldn't all of humanity have input on how a Friendly AI should be designed, instead of just a few programmers or scientists?
Q5). Has the Singularity Institute done research and published papers, like other research groups and academic institutions?

Societal issues

Q1). What if humans don't accept being ruled by machines?
Q2). How do we make sure that an AI doesn't just end up being a tool of whichever group built it, or controls it?
Q3). Aren't power-hungry organizations going to race to AI technology, and use it to dominate the world, before there's time to create truly Friendly AI?
Q4). What if an FAI only helps the rich, the First World, uploaded humans, or some other privileged class of elites?
Q5). Since hundreds of thousands of people are dying every day, don't we need AI too urgently to let our research efforts be delayed by having to guarantee Friendliness?

What are the main objections to the likelihood of the Singularity occurring? What actions might people take to stop a Singularity from occurring? How will competition among businesses and governments impact the amount of care taken with respect to friendly AI? What's the difference between AI and AGI? What organizations are working on the friendly AI problem? How will expectations of friendly/unfriendly AI impact the amount of resources devoted to AI? (For example, if financial markets expect utopia to arrive in the next decade, savings rates will fall, which will lower tech research.) What are AGI researchers' estimates for when/if AGI will happen, and for the likelihood of it being "good" if it does occur? Why do most computer engineers dismiss the possibility of AGI? What is the track record of AGI predictions? Who has donated significant sums to friendly AI research? How much money is being spent on AGI/friendly AGI research?

As an academicish person, I suggest a few questions that bothered me at first: Why aren't more artificial intelligence research groups at universities working on FAI? Why doesn't the Singularity Institute publish all of its literature reviews and other work?

"Are there refereed publications on FAI in mainstream academic AI research venues? Why not?"

Singularity/AI is a reasonable concept (if you agree with a number of assumptions and extrapolations SI/EY are fond of) but ultimately an untestable one (until it is too late, anyway), so govt funding would be hard to come by. Add to this the expected time frame of at least a few decades, and good luck getting a grant application approved for this research.

Follow-up/variation on Q5.7: Is it possible for unenhanced human brains to figure out how to properly formulate a human utility function? If not, could a WBE be reliably improved enough that it could do such a thing without significantly changing its values?

Also: Most AI researchers don't seem too concerned about friendliness. If I don't know even close to as much about AI as they do, why should I be convinced by any argument that I know failed to convince them?


Person 1: I'm doing a thing.

Person 2: Hmm. Have you considered issue? It seems like issue might be a problem with thing.

Person 1: It's not a problem.

Person 2: Why do you think that?

Person 1: Because I have no idea how to deal with issue. It looks impossible to solve.

Person 2: Oh, OK, you don't think issue is a problem because you have no idea how to solve issue, that makes sense...wait, what!?

You shouldn't be too convinced until you have heard from them why they rejected it.

If their argument that it is unlikely is technical, you may not be able to understand or judge it.

If their argument that it is unlikely repeatedly emphasizes that there is no theory of Friendly AI as one of its main points, one should consider whether the AI expert is refusing to seriously consider the problem because he or she emotionally can't bear the absence of an easy solution.

Problems don't logically have to have solutions, resolutions that are pleasing to you. If you get stricken by multiple horrible diseases, there is no solution to the problem that's afflicting you. You die. That doesn't violate the laws of the universe, as unfair as it is. Comments like this are not rare:

I’m also quite unconvinced that “provably safe” AGI is even feasible

It's amazing that not only does someone find the lack of a ready, palatable solution a good reason to ignore the issue, but that argument is actually being used to justify ignoring it in communications with other intelligent people. This is exactly analogous to a politician arguing to continue the Vietnam War because of the costs sunk into it, rather than privately deciding to continue a failed policy for political reasons and then lying about his reasons publicly.

That argument is palpably unreasonable. It's a verbalization of the emotional impetus informing and potentially undermining thinking: a selective blindness that ends not with fooling oneself, but with failing to fool others, because one cannot see that such an argument is not logically compelling and appeals only to those emotionally invested in conducting GAI research. The difficulty of solving the problem does not make it cease to be a problem.

There is some relevance in mentioning the plausibility of SIAI's general approach to solving it, but the emphasis I have seen on this point is out of all proportion with its legitimate role in the conversation. It appears to me as if it is being used as an excuse not to think about the problem, motivated by the problem's difficulty.


"Friendly AI theory" as construed by the SIAI community, IMO, is pretty likely an intellectual dead end.

There are many fundamental problems in alchemy that also remain unsolved. They weren't solved; the world moved on.

I'm pretty sure "FAI Theory" as discussed in the SIAI community is formulating the problem in the wrong way, using the wrong conceptual framework.


"When you're designing something where human lives are at stake, you need to determine the worst possible conditions, and then to design it in such a fashion that it won't catastrophically fail during them. In the case of AI, that's Friendly AI. In the case of a bridge, that's giving it enough reinforcement that it won't fall down when packed full of cars, and then some."

We have a nice theory of bridge-building, due to having theories about the strength of materials, Newtonian physics, earth science, etc. etc.

OTOH, there is no theory of "Friendly AI" and no currently promising theoretical path toward finding one. If you believe that SIAI has a top-secret, almost-finished rigorous theory of "Friendly AI" [and note that they are certainly NOT publicly claiming this, even though I have heard some of their stronger enthusiasts claim it], then, well, I have a bridge to sell you in Brooklyn ;-) ... A very well put together bridge!!!!

The importance of a problem is not proportional to the ease of solving it. You don't need any technical understanding to see through things like this. Although it is subjective, my firm opinion is that the amount of attention critics pay to emphasizing the difficulty they see with Eliezer's solution to the problem Eliezer has raised is out of proportion to what an unmotivated skeptic would spend.

The idea of provably safe AGI is typically presented as something that would exist within mathematical computation theory or some variant thereof. So that's one obvious limitation of the idea: mathematical computers don't exist in the real world, and real-world physical computers must be interpreted in terms of the laws of physics, and humans' best understanding of the "laws" of physics seems to radically change from time to time. So even if there were a design for provably safe real-world AGI, based on current physics, the relevance of the proof might go out the window when physics next gets revised.


Another issue is that the goal of "Friendliness to humans" or "safety" or whatever you want to call it, is rather nebulous and difficult to pin down. Science fiction has explored this theme extensively. So even if we could prove something about "smart AGI systems with a certain architecture that are guaranteed to achieve goal G," it might be infeasible to apply this to make AGI systems that are safe in the real-world -- simply because we don't know how to boil down the everyday intuitive notions of "safety" or "Friendliness" into a mathematically precise goal G like the proof refers to.

Eliezer has suggested a speculative way of getting human values into AGI systems called Coherent Extrapolated Volition, but I think this is a very science-fictional and incredibly infeasible idea (though a great SF notion).

But setting those worries aside, is the computation-theoretic version of provably safe AI even possible? Could one design an AGI system and prove in advance that, given certain reasonable assumptions about physics and its environment, it would never veer too far from its initial goal (e.g. a formalized version of the goal of treating humans safely, or whatever)?

I very much doubt one can do so, except via designing a fictitious AGI that can't really be implemented because it uses infeasibly much computational resources.


I suppose that the "build a provably Friendly AI" approach falls in line with the "AI Nanny" idea. However, given the extreme difficulty and likely impossibility of making "provably Friendly AI", it's hard for me to see working on this as a rational way of mitigating existential risk.


Further, it's possible that any system achieving high intelligence with finite resources, in our physical universe, will tend to manifest certain sorts of goal systems rather than others. There could be a kind of "universal morality" implicit in physics, of which human morality is one manifestation. In this case, the AGIs we create are drawn from a special distribution (implied by their human origin), which itself is drawn from a special distribution (implied by physics).

Universal instrumental values do not militate towards believing friendliness is not an important issue; quite the contrary. Systems whose behaviors imply utility functions want to put resources towards their implicit goals, whatever they are, unless they are specific perverse goals such as not expending resources.

Most AI researchers don't seem too concerned about friendliness. If I don't know even close to as much about AI as they do, why should I be convinced by any argument that I know failed to convince them?

First, they might not be very unconvinced by the arguments. Video text

Hugo de Garis was one of the speakers at the conference, and he polled the audience, asking: “If it were determined that the development of an artificial general intelligence would have a high likelihood of causing the extinction of the human race, how many of you feel that we should still proceed full speed ahead?” I looked around, expecting no one to raise their hand, and was shocked that half of the audience raised their hands. This says to me that we need a much greater awareness of morality among AI researchers.

Second, abstractly: it is much easier to see how things fail than how they succeed.

The argument that Friendliness is an important concern is an argument that GAIs systematically fail in certain ways.

For each GAI proposal, taboo "Friendly". Think about what the Friendliness argument implies, and where it predicts the GAI would fail. Consider the designer's response to the specific concern rather than to the whole Friendliness argument. If their response is that a patch would work, one can challenge that assertion as well if one understands a reason why the patch would fail. One doesn't have to pit his or her (absent) technical understanding of Friendliness against a critic's.

Ultimately, my somewhat high belief that no present or foreseeable GAI design that ignores Friendliness would be safe for humanity is mostly a function of a few things: my trust in Omohundro/Eliezer, plus my non-technical understanding, plus my knowledge of several GAI designs that supposedly avoid the problem and that I know don't, plus having heard bad arguments accepted as a refutation of Friendliness generally. It's not based solely on trusting authority.

More questions to perhaps add:

What is self-modification? (In particular, does having one AI build another bigger and more wonderful AI while leaving "itself" intact count as self-modification? The naive answer is "no", but I gather the informed answer is "yes", so you'll want to clarify this before using the term.)

What is wrong with the simplest decision theory? (That is, enumerate the possible actions and pick the one for which the expected utility of the outcome is best. I'm not sure what the standard name for that is.) It's important to answer this so at some point you state the problem that timeless decision theory etc. are meant to solve.

I gather one of the problems with the simplest decision theory is that it gives the AI an incentive to self-modify under certain circumstances, and there's a perceived need for the AI to avoid routine self-modification. The FAQ question might be "How can we avoid giving the AI an incentive to self-modify?" and perhaps "What are the risks of allowing the AI to self-modify?"
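
For concreteness, the "simplest decision theory" described above (enumerate the possible actions, then pick the one whose outcome has the best expected utility) might be sketched roughly as follows. This is only an illustrative sketch, often described as plain expected utility maximization; the action names, outcome probabilities, and utilities are hypothetical toy values, and the function names `expected_utility` and `choose_action` are made up for this example.

```python
from typing import Callable, Dict, Iterable

# Hypothetical toy model: each action maps to a probability
# distribution over outcomes (probabilities sum to 1).
OUTCOME_PROBS: Dict[str, Dict[str, float]] = {
    "safe": {"small_win": 1.0},
    "gamble": {"big_win": 0.3, "nothing": 0.7},
}

# Hypothetical utilities assigned to each outcome.
UTILITY: Dict[str, float] = {"small_win": 1.0, "big_win": 3.0, "nothing": 0.0}


def expected_utility(
    action: str,
    outcome_probs: Callable[[str], Dict[str, float]],
    utility: Callable[[str], float],
) -> float:
    """Probability-weighted sum of the utilities of an action's outcomes."""
    return sum(p * utility(outcome) for outcome, p in outcome_probs(action).items())


def choose_action(
    actions: Iterable[str],
    outcome_probs: Callable[[str], Dict[str, float]],
    utility: Callable[[str], float],
) -> str:
    """Enumerate the actions and return one with the highest expected utility."""
    return max(actions, key=lambda a: expected_utility(a, outcome_probs, utility))


# Run the procedure on the toy model above.
best = choose_action(OUTCOME_PROBS.keys(), OUTCOME_PROBS.__getitem__, UTILITY.__getitem__)
```

In this toy model "safe" has expected utility 1.0 and "gamble" roughly 0.9, so the procedure picks "safe". The sketch only pins down the baseline procedure; it says nothing by itself about self-modification or about what the alternative decision theories change.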

What problem is solved by extrapolation? (This goes in the CEV section.)

What are the advantages and disadvantages of having a bounded utility function?

Can we just upload a moral person? (In the "Need for FAI" section. IMO the answer is a clear "no".)

I suggest rephrasing "What powers might it have?" in 1.10 to "What could we reasonably expect it to be able to do?". The common phrase "magical powers" gives the word "powers" undesired connotations in this context, makes us sound like loonies.

Who is the target audience?

Meta-question: what is the smallest subset of these questions that someone interested in the issue should read first?

(IOW: organizing these questions by category makes sense from an author's point of view, but different organization keys might make a lot more sense from a reader's point of view.)

what is the smallest subset of these questions that someone interested in the issue should read first?

The Singularity FAQ. That will be announced at the top of the document.

What is recursive self-improvement?

Why should we believe any prediction that an AGI is likely to be created soon, given the history of these predictions in the past?

What progress has been made at solving the very hard problems of AGI, such as representing general knowledge, or understanding natural language?

Is it possible that humans are incapable of constructing an AGI by reason of our own limited intelligence?

Is it possible for an AGI to be created and yet an intelligence explosion to not happen? [Norvig's talk at the Singularity Summit posits that this is possible]

(Note that I don't fully endorse the skepticism of these questions, but they're questions that reasonable people might ask).

What security measures are needed for the development of a Friendly AI? (i.e. if leaked source code got into the hands of a less scrupulous team, would it increase the risk of an unfriendly intelligence explosion?)

How much would it cost to develop a Friendly AI?

If a Friendly AI were developed, how would it affect my life?

How likely is it that systems intended as domain-specific would accidentally cross the threshold into AGI?

I'll include this under the 'paths' question.

I hate to comment before reading the body of your post, but the title of the post quite literally says "Friendly Artificial Intelligence Frequently Asked Questions Questions."

I'm just pointing it out to get it out of the way, though... It doesn't really bug me that much.

It is commonplace to use the term "FAQ" to mean "list of frequently asked questions", in which "FAQ questions" means "questions for putting in a list of frequently asked questions". Which would be rather cumbersome and repetitious-sounding ... if it weren't for that convenient abbreviation "FAQ".

Alright. That makes sense.

Sort of like changing your name through common usage.

I used to be more annoyed at that sort of thing, until I caught myself saying "CSS Stylesheet" to distinguish it from other uses of CSS (inline, etc.).

"FAQ" doesn't really mean "Frequently Asked Questions" even though it's technically an acronym for that phrase. The actual meaning is something like "list of questions anticipated to be common among visitors, complete with answers".

Questions for Section 3: What if CEV outputs something horrible? What if it misses out on something awesome? Why would values converge at all?

The non-convergence of a large set of utility functions when using the Solomonoff prior is an important open problem (or are you including that in Q5.5?).


"How can an AI choose an ideal prior given infinite computing power?" is just begging for a follow-up question like "What if you don't have infinite computing power?"

EDIT: Oh, I didn't notice Q5.9 there which I guess is that.


Consider addressing these questions:

  • "Can we just upload humans instead of trying to create de novo AI?"

  • "Solving Friendly AI now seems too difficult. Can we just build an AI Nanny, which would temporarily hold off a Singularity, for now?"

  • and some other passageways mentioned here;

Also, I would like these questions to be answered in detail:

  • What is reflective decision theory, and why do we need it?
  • What are the applications of reflective decision theory in FAI/How will the FAI use reflective decision theory?

  • "What is updateless decision theory?"


"Can we just build an AI Nanny for now?"

[This comment is no longer endorsed by its author]