The "alignment problem for advanced agents" or "AI alignment" is the overarching research topic of how to develop sufficiently advanced machine intelligences such that running them produces good real-world outcomes. outcomes in the real world.
Both 'advanced agent' and 'good' should be understood as metasyntactic placeholders for much larger, ongoing debates. The term 'alignment' is intended to convey the idea of pointing an AI in a direction--just like, once you build a rocket, it has to be pointed in a particular direction.
Some alternative terms for this general field of study, such as 'control problem', can sound adversarial--as if the rocket were already pointed in a bad direction and you had to wrestle with it. Other terms, like 'AI safety', understate the degree to which alignment ought to be an intrinsic part of building advanced agents; e.g., there isn't a separate theory of "bridge safety" for how to build bridges that don't fall down. Pointing the agent in a particular direction ought to be seen as part of the standard problem of building an advanced machine agent. The problem does not divide into "building an advanced AI" and then separately "somehow causing that AI to produce good outcomes"; the problem is "getting good outcomes via building a cognitive agent that brings about those good outcomes".
The "alignment problem for advanced agents"Value alignment theory" or "AI alignment theory"alignment" is the overarching research topic of how to develop a highly sufficiently advanced machine intelligences Artificial Intelligence such that running this AIthem produces good real-world outcomes.
Other terms that have been used to describe this research subject areproblem include "robust and beneficial AGI"AI" and "Friendly AI theory"AI".
Where The term "value alignment problem" was coined by Stuart Russell to go first if you're just coming in and want to poke around: List of Value Alignment Topics, or start browsing from children of this page.
primary subproblem of aligning AI preferences with (potentially idealized) human preferences.
Introductory articles don't exist yet. Meanwhile, ifA good introductory article or survey paper for this field does not presently exist. If you have no idea what this problem is all about, tryconsider reading Nick Bostrom's popular book Superintelligence.
You can explore this Arbital domain by following this link. See also the List of Value Alignment Topics on Arbital although this is not up-to-date.
"AI alignment theory" is meant as an overarching term to cover the whole research field associated with this problem, including, e.g., the much-debated attempt to estimate how rapidly an AI might gain in capability once it goes over various particular thresholds.
If you're willing to spend time on learning this field and are not previously familiar with the basics of decision theory and probability theory, it's worth reading the Arbital introductions to those first. In particular, it may be useful to become familiar with the notion of priors and belief revision, and with the coherence arguments for expected utility.
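As a quick, illustrative sketch of what "priors and belief revision" refers to (the numbers below are hypothetical, chosen only for the example): Bayes' rule says that a prior degree of belief $P(H)$ in a hypothesis $H$ is revised into a posterior $P(H \mid E)$ after observing evidence $E$:

$$P(H \mid E) \;=\; \frac{P(E \mid H)\,P(H)}{P(E \mid H)\,P(H) + P(E \mid \neg H)\,P(\neg H)}.$$

For instance, a prior of $P(H) = 0.2$ combined with likelihoods $P(E \mid H) = 0.9$ and $P(E \mid \neg H) = 0.3$ yields a posterior of $0.18 / (0.18 + 0.24) \approx 0.43$. The coherence arguments for expected utility likewise concern an agent that ranks actions $a$ by the expectation $\sum_o P(o \mid a)\,U(o)$ over outcomes $o$.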
If you have time to read a textbook to gain general familiarity with AI, "Artificial Intelligence: A Modern Approach" is highly recommended.
The "alignment problem for advanced agents" or "AI alignment" is the overarching research topic of how to develop sufficiently advanced machine intelligences such that running them produces good real-world outcomes. Both 'advanced agent' and 'good' should be understood as metasyntactic placeholders for much larger, ongoing debates.
Other terms that have been used to describe this research problem include "robust and beneficial AI" and "Friendly AI". The term "value"value alignment problem"problem" was coined by Stuart Russell to refer to the primary subproblem of aligning AI preferences with (potentially idealized) human preferences.
For the definition of 'value alignment' as contrasted to 'value identification' or 'value achievement', see the page on value alignment problem. For the definition of 'value' as a metasyntactic placeholder for "the still-debated thing we want our AIs to accomplish", see the page on value.
"Value alignment theory" or "AI alignment theory" is the overarching research topic of how to develop a highly advanced Artificial Intelligence such that running this AI produces good real-world outcomes. Other terms that have been used to describe this research subject are "robust"robust and beneficial SuperintelligenceAGI" and "Friendly AI theory".
Where to go first if you're just coming in and want to poke around: List of Value Alignment Topics., or start browsing from children of this page.
Introductory articles don't exist yet. Meanwhile, if you have no idea what this is all about, try reading Nick Bostrom's book Superintelligence.Superintelligence.
If you're willing to spend time on learning this field and are not previously familiar with the basics of decision theory and probability theory, it's worth reading the Arbital introductions to those first. In particular, it may be useful to become familiar with the notion of priors and belief revision, and with the coherence arguments for expected utility.
If you have time to read a textbook to gain general familiarity with AI, "Artificial Intelligence: A Modern Approach" is highly recommended.
Treat human monitoring as expensive, unreliable, and fragile.