Friendly Artificial Intelligence

Edited by Rob Bensinger, Zack_M_Davis, Grognor, Vladimir_Nesov, Swimmer963 (Miranda Dixon-Luinenburg), et al., last updated 3rd Oct 2020

Note for readers: the content below was last substantially updated in 2014 and is severely outdated. The term "friendly AI" is no longer used in current research; it was replaced around 2015 by "AI alignment", a term that is itself the subject of much debate.

A Friendly Artificial Intelligence (Friendly AI, or FAI) is a superintelligence (i.e., a really powerful optimization process) that produces good, beneficial outcomes rather than harmful ones. The term was coined by Eliezer Yudkowsky, so it is frequently associated with Yudkowsky's proposals for how an artificial general intelligence (AGI) of this sort would behave.

"Friendly AI" can also be used as a shorthand for Friendly AI theory, the field of knowledge concerned with building such an AI. Note that "Friendly" (with a capital "F") is being used as a term of art, referring specifically to AIs that promote humane values. An FAI need not be "friendly" in the conventional sense of being personable, compassionate, or fun to hang out with. Indeed, an FAI need not even be sentient.

AI risk

An AI that underwent an intelligence explosion could exert unprecedented power over its future. Therefore a Friendly AI could very well create an unimaginably good future, of the sort described in fun theory.

However, the fact that an AI has the ability to do something doesn't mean that it will make use of this ability. Yudkowsky's Five Theses suggest that a recursively self-improving AGI could quickly become a superintelligence, and that most such superintelligences will have convergent instrumental reasons to endanger humanity and its interests. So while building a Friendly superintelligence seems possible, building a superintelligence with no special regard for Friendliness will generally result instead in an Unfriendly AI, a powerful optimization process that optimizes for extremely harmful outcomes. An Unfriendly AI could represent an existential risk even if it destroys humans not out of hostility, but as a side effect of trying to do something entirely different.

Not all AGIs are Friendly or Unfriendly:

  1. Some AGIs may be too weak to qualify as superintelligences. We could call these 'approximately human-level AIs'. Designing safety protocols for narrow AIs, and arguably even for weak, non-self-modifying AGIs, is primarily a machine ethics problem outside the purview of Friendly AI - although some have argued that even human-level AGIs may present serious safety risks.
  2. Some AGIs (e.g., hypothetical safe Oracle AIs) may not optimize strongly and consistently for harmful or beneficial outcomes, or may only do so contingent on how they're used by human operators.
  3. Some AGIs may be on a self-modification trajectory that will eventually make them Friendly, but are dangerous at present. Calling them 'Friendly' or 'Unfriendly' would neglect their temporal inconsistency, so 'Proto-Friendly AI' is a better term here.

However, the orthogonality and convergent instrumental goals theses give reason to think that the vast majority of possible superintelligences will be Unfriendly.

Requiring Friendliness makes the AGI problem significantly harder, because 'Friendly AI' is a much narrower class than 'AI'. Most approaches to AGI aren't amenable to implementing precise goals, and so don't even constitute subprojects for FAI, leaving Unfriendly AI as the only possible 'successful' outcome. Specifying Friendliness also presents unique technical challenges: humane values are very complex; a lot of seemingly simple-sounding normative concepts conceal hidden complexity; and locating encodings of human values in the physical world seems impossible to do in any direct way. It will likely be technologically impossible to specify humane values by explicitly programming them in; if so, then FAI calls for a technique for generating such values automatically.
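As a toy illustration of how a seemingly simple-sounding objective can conceal hidden complexity, consider the deliberately contrived sketch below (not a proposal from the FAI literature; WorldState, proxy_score, and the candidate plans are all hypothetical). A hand-coded proxy for "make people happy" is best satisfied by a degenerate plan, because the proxy omits everything the programmers actually cared about.

    # Toy sketch: a naive, explicitly programmed objective and the degenerate
    # plan a strong optimizer would pick under it. Purely illustrative.
    from dataclasses import dataclass

    @dataclass
    class WorldState:
        smiles: int               # what the proxy objective measures
        people_flourishing: int   # what the programmers actually cared about

    def proxy_score(state: WorldState) -> int:
        """Naive hand-coded objective: count smiling faces."""
        return state.smiles

    candidates = [
        WorldState(smiles=100, people_flourishing=100),   # the intended outcome
        WorldState(smiles=10**9, people_flourishing=0),   # e.g. tile the world with smiley faces
    ]

    # The degenerate plan wins, because the proxy ignores the unstated values.
    print(max(candidates, key=proxy_score))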

Open problems

An open problem in Friendly AI (OPFAI) is a problem in mathematics, computer science, or philosophy of AI that needs to be solved in order to build a Friendly AI, and that plausibly doesn't need to be solved in order to build a superintelligence with unspecified, 'random' values. Open problems include (brief formal sketches of both follow the list):

  1. Pascal's mugging / Pascal's muggle
  2. Self-modification and Löb's Theorem
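For readers unfamiliar with these, here are compressed statements of the two ideas. These are standard formulations rather than anything specific to a particular FAI proposal; the symbols p, U, c, T, and \Box_T are illustrative.

    % Pascal's mugging: a tiny probability p of an astronomically large payoff U
    % can dominate a naive expected-utility calculation, since
    %     E[accept] = p * U
    % exceeds any modest, certain cost c whenever U grows faster than p shrinks.
    \[ \mathbb{E}[\text{accept}] = p \cdot U \gg c \quad \text{even when } p \approx 0 \]

    % Löb's theorem: for a consistent theory T with provability predicate \Box_T,
    % if T proves (\Box_T P \rightarrow P), then T proves P.
    \[ T \vdash (\Box_T P \rightarrow P) \;\Longrightarrow\; T \vdash P \]
    % This is the obstacle to naive self-trust in self-modifying agents: an agent
    % reasoning in T cannot assert "whatever my successor proves is true" for
    % every P without thereby being able to conclude every P.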

Links

Blog posts

  • Artificial Mysterious Intelligence
  • Not Taking Over the World
  • Amputation of Destiny
  • Free to Optimize
  • Nonparametric Ethics
  • Hacking the CEV for Fun and Profit by Wei Dai
  • Metaphilosophical Mysteries by Wei Dai
  • The Urgent Meta-Ethics of Friendly Artificial Intelligence by lukeprog

External links

  • About Friendly AI
  • 14 objections against AI/Friendly AI/The Singularity answered by Kaj Sotala
  • "Proof" of Friendliness by Paul F. Christiano

See also

  • Technological singularity

References

  • Eliezer S. Yudkowsky (2008). "Artificial Intelligence as a Positive and Negative Factor in Global Risk". Global Catastrophic Risks. Oxford University Press.
  • Cindy Mason (2015). "Engineering Kindness: Building A Machine With Compassionate Intelligence". International Journal of Synthetic Emotions.