Friendly Artificial Intelligence

Note for readers: the last substantial update of the content below dates back to 2014 and is severely outdated. The term "friendly AI" is no longer used in current research, having been replaced by "AI alignment" around 2015. The new term is itself the subject of much debate.

A Friendly Artificial Intelligence (Friendly AI, or FAI) is a superintelligence (i.e., a really powerful optimization process) that produces good, beneficial outcomes rather than harmful ones. The term was coined by Eliezer Yudkowsky, so it is frequently associated with Yudkowsky's proposals for how an artificial general intelligence (AGI) of this sort would behave.

""Friendly AI"AI" can also be used as a shorthand for Friendly AI theory, the field of knowledge concerned with building such an AI. Note that "Friendly""Friendly" (with a capital "F""F") is being used as a term of art, referring specifically to AIs that promote humane values. An FAI need not be "friendly""friendly" in the conventional sense of being personable, compassionate, or fun to hang out with. Indeed, an FAI need not even be sentient.

However, the fact that an AI has the ability to do something doesn't mean that it will make use of this ability. Yudkowsky's Five Theses suggest that a recursively self-improving AGI could quickly become a superintelligence, and that most such superintelligences will have convergent instrumental reasons to endanger humanity and its interests. So while building a Friendly superintelligence seems possible, building a superintelligence will generally result instead in an Unfriendly AI, a powerful optimization process that optimizes for extremely harmful outcomes. An Unfriendly AI could represent an existential risk even if it destroys humans not out of hostility, but as a side effect of trying to do something entirely different. Not all AGIs fall neatly into the Friendly/Unfriendly dichotomy:

  1. Some AGIs may be too weak to qualify as superintelligences. We could call these 'approximately human-level AIs'. Designing safety protocols for narrow AIs and arguably even for weak, non-self-modifying AGIs is primarily a machine ethics problem outside the purview of Friendly AI - although some have argued that even human-level AGIs may present serious safety risks.
  2. Some AGIs (e.g., hypothetical safe Oracle AIs) may not optimize strongly and consistently for harmful or beneficial outcomes, or may only do so contingent on how they're used by human operators.
  3. Some AGIs may be on a self-modification trajectory that will eventually make them Friendly, but are dangerous at present. Calling them 'Friendly' or 'Unfriendly' would neglect their temporal inconsistency, so 'Proto-Friendly AI' is a better term here.

Requiring Friendliness makes the AGI problem significantly harder, because 'Friendly AI' is a much narrower class than 'AI'. Most approaches to AGI aren't amenable to implementing precise goals, and so don't even constitute subprojects for FAI, leading to Unfriendly AI as the only possible 'successful' outcome. Specifying Friendliness also presents unique technical challenges: humane values are very complex; a lot of seemingly simple-sounding normative concepts conceal hidden complexity; and locating encodings of human values in the physical world seems impossible to do in any direct way. It will likely be technologically impossible to specify humane values by explicitly programming them in; if so, then FAI calls for a technique for generating such values automatically.
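The specification difficulty can be made concrete with a toy example. The sketch below is purely illustrative (every name and number in it is hypothetical, not anyone's proposed design): a search process handed an explicitly coded proxy for humane values maximizes exactly what the proxy mentions and sacrifices everything it omits.

```python
# Toy illustration of value mis-specification (all values hypothetical).
# A strong optimizer pushes hardest on exactly the terms its objective omits.

def proxy_utility(world):
    """A naive, hand-coded stand-in for 'what humans value'.
    It rewards production and says nothing about anything else."""
    return world["paperclips"]

def intended_utility(world):
    """What we actually wanted: production is good, but only in worlds
    where humans are left flourishing."""
    if world["humans_flourishing"] < 1:
        return float("-inf")  # outcomes without flourishing humans are unacceptable
    return world["paperclips"] + 1000 * world["humans_flourishing"]

# Resources are finite: keeping humans flourishing costs resources that
# could otherwise become paperclips.
TOTAL_RESOURCES = 10_000
FLOURISHING_COST = 5_000

# A two-element 'plan space' standing in for everything a powerful
# optimization process could consider.
worlds = [
    {"paperclips": TOTAL_RESOURCES - FLOURISHING_COST * h, "humans_flourishing": h}
    for h in (0, 1)
]

print("proxy optimum:   ", max(worlds, key=proxy_utility))     # humans_flourishing = 0
print("intended optimum:", max(worlds, key=intended_utility))  # humans_flourishing = 1
```

The divergence is not hostility: the proxy optimizer simply spends every resource on the one term it was given, which is the sense in which harm arrives as a side effect.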

An open problem in Friendly AI (OPFAI) is a problem in mathematics, computer science, or philosophy of AI that needs to be solved in order to build a Friendly AI, and plausibly doesn't need to be solved in order to build a superintelligence with unspecified, 'random' values. Open problems include:

  1. Pascal's mugging / Pascal's muggle (a worked expected-value example follows this list)
  2. Self-modification and Löb's Theorem (stated compactly after the example below)
  3. Naturalized induction
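To see why Pascal's mugging troubles a naive expected-utility maximizer, consider a rough worked example (the probabilities and payoffs below are arbitrary placeholders). A mugger claims that handing over a small sum will, with minuscule probability, produce an astronomically large payoff. If the agent's credence shrinks more slowly than the claimed payoff grows, paying always comes out ahead:

\[
\mathbb{E}[U(\text{pay})] \;=\; p \cdot U_{\text{claimed}} - U_{\text{cost}} \;=\; 10^{-30} \cdot 10^{40} - 1 \;=\; 10^{10} - 1 \;>\; 0.
\]

Since the mugger can always name a payoff that outpaces any fixed prior, the naive maximizer pays for arbitrarily implausible claims; "Pascal's muggle" names, roughly, the converse failure mode, in which the agent discounts large payoffs so aggressively that it ignores genuine astronomical stakes even given strong evidence.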
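The obstacle that Löb's Theorem raises for self-modification can be stated in one line. For a formal system \(T\) extending Peano Arithmetic, with \(\Box P\) meaning "\(P\) is provable in \(T\)":

\[
\text{if } T \vdash \Box P \rightarrow P \text{ for some sentence } P, \text{ then } T \vdash P.
\]

Taking \(P\) to be a contradiction recovers Gödel's second incompleteness theorem: a consistent \(T\) cannot prove its own consistency. This is, roughly, why a self-modifying agent cannot simply verify "whatever my successor proves is true" when the successor reasons in the same or a stronger formal system.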

References

Eliezer S. Yudkowsky (2008). "Artificial Intelligence as a Positive and Negative Factor in Global Risk". In Global Catastrophic Risks. Oxford University Press.
Cindy Mason (2015). "Engineering Kindness: Building a Machine with Compassionate Intelligence". International Journal of Synthetic Emotions.
