Edit: 

  1. User davidad wrote a comprehensive overview of other actors in the field who have begun using AI alignment instead of AI safety as the standard terminology for the control problem. This appears to be a general or growing preference in the field, though not a complete consensus. That may only be because of inertia from several years ago, before AI alignment and the control problem were as well distinguished in the field. Sometimes the term control problem is simply used instead of either of the other terms.
  2. I originally characterized the control/alignment problem as synonymous with any x-risks from AI. User antimonyanthony clarified the control problem is not the only way AI may pose an existential risk. I've edited this post accordingly.

During conversations about x-risks from AI among a community broader than the rationality or x-risk communities, such as in effective altruism or on social media, I've seen Eliezer Yudkowsky and Ben Pace clarify that the preferred term for the control problem is "AI alignment." I understand this is to distinguish specifically existential risks from AI from other ethical and security concerns about AI, which is what "AI safety" has come to mean. Yet I've only seen those involved in x-risk work coming from the rationality community say this is the preferred term. The main reason for that might be that the majority of people I know working on anything that could be called either AI alignment or AI safety are also in the rationality community.

Is there any social cluster in the professional/academic/whatever AI communities, other than the x-risk reduction cluster around the rationality community, that prefers this terminology?


2 Answers

davidad


The term "AI alignment" can be traced to the longer phrase "the value alignment problem" found in a Nov 2014 essay by Stuart Russell, whence it was picked up by Rob Bensinger, then adopted by Eliezer in 2015, and used by Paul in 2016. Although Paul still preferred the name "AI control" in 2016 for the medium-scope problem of ensuring that AI systems "don't competently pursue the wrong thing", he renamed his blog from AI Control to AI Alignment at some point between 2016 and 2018. "AI Alignment" really took off when it was adopted by Rohin for his first Newsletter in April 2018 and incorporated in the name of the Alignment Forum in July 2018. Wikipedians renamed "motivation control" to "alignment" in April 2020, and Brian Christian's The Alignment Problem came out in October 2020.

Digging deeper, "value alignment" was also the subject of a 2002(!) AAAI paper by Shapiro and Shachter (which also anticipates Everitt's use of causal influence diagrams in alignment research); it seems plausible, though not certain, that this was a cause of Russell's 2014 use of the phrase.

Anyway, the 2002 paper never really caught on (18 citations to date), and Russell has never consistently used the word "alignment", later calling the problem "robustly beneficial AI", then "provably beneficial AI", and finally settling on "the problem of control" (as in the subtitle of his 2019 book) or "the control problem". So the result is that pretty much every contemporary use of "AI alignment" is memetically downstream of MIRI, at least partially. However, that watershed includes OpenAI (where Jan Leike's official title is "alignment team lead", and there are job postings for the Alignment team), DeepMind (which has published papers about "Alignment"), a cluster at UC Berkeley, and scattered researchers in Europe (Netherlands, Finland, Cambridge, Moscow,...).

Strongly upvoted. Thanks for your comprehensive review. This might be the best answer I've ever received for any question I've asked on LW.

In my opinion, given that these other actors who've adopted the term are arguably leaders in the field even more than MIRI is, it's valid for someone in the rationality community to claim it's in fact the preferred term. A more accurate statement would be:

  1. There is a general or growing preference for the term AI alignment to be used instead of AI safety to refer to the control problem.
  2. There isn't a complete consensus on this, but that may only be because of inertia from before AI alignment and the control problem were as well distinguished in the field.

James_Miller


As someone who teaches undergraduates a bit about AI safety/alignment in my economics of future technology course at Smith College, I much prefer "AI safety", as the term is far clearer to people unfamiliar with the issues.

AI alignment is the term MIRI (among other actors in the field) ostensibly prefers over AI safety for the control problem, to distinguish it from other AI-related ethics or security issues that don't constitute x-risks. Of course, the extra jargon could be confusing for a large audience being exposed to AI safety and alignment concerns for the first time. When introducing the field to prospective entrants or students, keeping it simpler, as you do, may well be the better way to go.

4 comments

Not a direct answer to your question, but I want to flag that using "AI alignment" to mean "AI [x-risk] safety" seems like a mistake. Alignment means getting the AI to do what its principal/designer wants, which is not identical to averting AI x-risks (much less s-risks). There are plausible arguments that this is sufficient to avert such risks, but it's an open question, so I think equating the two is confusing.

I agree: one can conceive of AGI safety without alignment (e.g., if boxing worked), and one can conceive of alignment without safety (e.g., if the AI is "trying to do the right thing" but is careless or incompetent or whatever). I usually use the term "AGI Safety" when describing my job, but the major part of it is thinking about the alignment problem.

Thanks for flagging this. 

  1. I presumed that "AI alignment" was being used as a shorthand for x-risks from AI, but I hadn't considered that distinction. I'm also not aware that anyone from the rationality community whom I've seen express this kind of statement really meant for AI alignment to mean all x-risks from AI. That's my mistake. I'll presume they're referring only to the control problem and edit my post to clarify that.
     
  2. As I understand it, s-risks are a sub-class of x-risks, as an existential risk is not only an extinction risk but any risk of the future trajectory of Earth-originating intelligence being permanently and irreversibly altered for the worse. 

I notice this post has only received downvotes other than the strong upvote it received by default from me as the original poster. My guess would be this post has been downvoted because it's perceived as:

  1. An unnecessary and nitpicking question.
  2. Maybe implying that MIRI and the rationality community are not authoritative sources in the field of AI alignment.

That was not my intention. I'd like to know what other reasons there may be for why this post was downvoted, so please reply if you can think of any or if you are one of the users who downvoted it.