Note for readers: the content below was last substantially updated in 2014 and is severely outdated. The term "friendly AI" is no longer used in current research; it was replaced by "AI alignment" from around 2015. This new term is also the subject of much debate.
A Friendly Artificial Intelligence (Friendly AI, or FAI) is a superintelligence (i.e., a really powerful optimization process) that produces good, beneficial outcomes rather than harmful ones. The term was coined by Eliezer Yudkowsky, so it is frequently associated with Yudkowsky's proposals for how an artificial general intelligence (AGI) of this sort would behave.
"Friendly AI" can also be used as a shorthand for Friendly AI theory, the field of knowledge concerned with building such an AI. Note that "Friendly" (with a capital "F") is being used as a term of art, referring specifically to AIs that promote humane values. An FAI need not be "friendly" in the conventional sense of being personable, compassionate, or fun to hang out with. Indeed, an FAI need not even be sentient.
An AI that underwent an intelligence explosion could exert unprecedented optimization power over its future. Therefore a Friendly AI could very well create an unimaginably good future, of the sort described in fun theory.
However, the fact that an AI has the ability to do something doesn't mean that it will make use of this ability. Yudkowsky's Five Theses suggest that a recursively self-improving AGI could quickly become a superintelligence, and that most such superintelligences will have convergent instrumental reasons to endanger humanity and its interests. So while building a Friendly superintelligence seems possible, building a superintelligence will generally result instead in an Unfriendly AI, a powerful optimization process that optimizes for extremely harmful outcomes. An Unfriendly AI could represent an existential risk even if it destroys humans not out of hostility but as a side effect of trying to do something entirely different.
Not all AGIs are Friendly or Unfriendly:
However, the orthogonality and convergent instrumental goals theses give reason to think that the vast majority of possible superintelligences will be Unfriendly.
Requiring Friendliness makes the AGI problem significantly harder, because 'Friendly AI' is a much narrower class than 'AI'. Most approaches to AGI aren't amenable to implementing precise goals, and so don't even constitute subprojects for FAI, leaving Unfriendly AI as their only possible 'successful' outcome. Specifying Friendliness also presents unique technical challenges: humane values are very complex; many simple-sounding normative concepts conceal considerable complexity; and locating encodings of human values in the physical world seems impossible to do in any direct way. It will likely be technologically impossible to specify humane values by explicitly programming them in; if so, then FAI calls for a technique for generating such values automatically.
An open problem in Friendly AI (OPFAI) is a problem in mathematics, computer science, or philosophy of AI that needs to be solved in order to build a Friendly AI, and plausibly doesn't need to be solved in order to build a superintelligence with unspecified, 'random' values. Open problems include: