The "alignment problem for advanced agents" or "AI alignment" is the overarching research topic of how to develop sufficiently advanced machine intelligences such that running them produces good real-world outcomes. Both 'advanced agent' and 'good' should be understood as metasyntactic placeholders for much larger, ongoing debates.
Other terms that have been used to describe this research problem include "robust and beneficial AI" and "Friendly AI". The term "value alignment problem" was coined by Stuart Russell to refer to the primary subproblem of aligning AI preferences with (potentially idealized) human preferences.
A good introductory article or survey paper for this field does not presently exist. If you have no idea what this problem is about, consider reading Nick Bostrom's popular book Superintelligence.
You can explore this Arbital domain by following this link. See also the List of Value Alignment Topics on Arbital, although that list is not up to date.