The Multiple Names of Beneficial AI


When signing up for my AI Safety Meetup, applicants are asked to state what their interest in AI Safety is:

  • serious existential threat (Benjamin)
  • possible threat for human beings (AR)
  • AI research and ethics (Jeromes)
  • value learning - intelligence dynamics - preventing suboptimal future scenarios - sentient simulation (Guillaume)
  • self driving car (Kindi)
  • military arms races (Aidan)
  • making a personal assistant (Gunavadh)
  • adversarial examples in Deep Learning (Grégory)
  • GDPR Laws for Chatbots (Guillaume)
  • existential risks in general (Tim)
  • Effective Altruism (Caroline)
  • technical AI safety research (Dima)
  • drone targeting (Bobby)

When I talk about AI Safety, the problem (P) I have in mind is the following:

If humans keep improving their technology, they will end up building an AI smarter than them. From this point onwards, humans will not be in charge. Can humans ensure a positive outcome for humanity?

Different Names, Different Problems

I use the term "AI Safety" because it is the one I am most familiar with, and because people outside the field seem to grasp quickly what I mean.

They know I'm talking about AI because, well, there's "AI" in it. And when they hear "safety" they know some kind of "security" is involved.

That's why, for some people (half of the members of my Meetup), AI Safety relates to data protection, self-driving cars, or automation in general. In other words, they are thinking about AI Ethics. AI Ethics will gain more and more importance in the coming years. Yet people tend to underestimate the existential risks associated with AI progress, or invent their own terminology to talk about them.


On March 29th, at the AIForHumanity summit about France's AI strategy, Sam Altman (co-chairman of OpenAI) discussed the risks associated with AI with Antoine Petit (director of CNRS, the main scientific research institution in France).

Altman said that "the rate of change is so fast, and the things that are going to be possible are happening so quickly, that [he] worr[ies] about the ability for our governments to keep up". In other words, he is uncertain about the ability of legal systems to keep pace with progress in AI.

The fears of Petit, director of research in a country that is not a leader in AI, were utterly different. He grumbled: "We have to be careful, France and Europe should not specialize in ethical questions while at the same time US and China do business and create jobs".

Just before, at the same summit, Fei-Fei Li, a major contributor to the AI4all non-profit, claimed that AI must be more inclusive and democratic.

In short, everyone had their own cause and tried to introduce their own wording.

Same Problem with Different Wording


Fei-Fei Li also tried to coin the term "Human-centered AI". This is the name a research group at MIT chose for its research on building AI "that learn from and collaborate with humans in a deep, meaningful way".

In a TED Talk, Stuart Russell presents what he calls Human-Compatible AI. According to him, it is a new form of AI that would rely heavily on learning from observing humans, while remaining uncertain about the values it infers.

This approach tries to solve the problem (P) by learning values from observing humans. Value-learning, or more generally value alignment, seems to be the preferred terminology among AI Safety researchers.

What Researchers Say

Here is what Paul Christiano has to say about his use of "Alignment":

'I adopted the new terminology after some people expressed concern with “the control problem.” There is also a slight difference in meaning: the control problem is about coping with the possibility that an AI would have different preferences from its operator. Alignment is a particular approach to that problem, namely avoiding the preference divergence altogether (so excluding techniques like “put the AI in a really secure box so it can’t cause any trouble”). There currently seems to be a tentative consensus in favor of this approach to the control problem.' (Medium)

Although "AI Safety" seems to be the consensus name for the field, specialized researchers prefer to discuss "the Alignment problem".

Simplifying and Metaphors

Last night, at an EA Meetup about AI risks, some participants did not expect any existential risk from AI before the year 2500. It made me realize how little people know about the control problem, and how difficult it is to communicate about it correctly.

I then looked up how non-profits express their missions on their websites:

  • Most benefits of civilization stem from intelligence, so how can we enhance these benefits with artificial intelligence without being replaced on the job market and perhaps altogether? (Future of Life Institute)

  • OpenAI is a non-profit AI research company, discovering and enacting the path to safe artificial general intelligence. (OpenAI on twitter)

  • We do foundational mathematical research to ensure smarter-than-human artificial intelligence has a positive impact. (MIRI on

Such mission statements need to be clear and concise, but they should also hint at what the real problems are.

I found that in TED Talks, in particular Stuart Russell's, the way to be both precise and concise was to use metaphors:

  • The Gorilla problem: building an intelligence smarter than us.

  • The King Midas problem: stating an objective which is not in fact truly aligned with what we want.

  • The coffee (off-switch) problem: you can't fetch the coffee if you're dead.


  • Most people are not aware of the existential risks of AI, and only seem to be concerned with AI Ethics

  • Different phrasings are being introduced to tackle different social issues

  • There appears to be a consensus around the "value-learning"/"alignment" terminology among AI Safety researchers

  • Simpler formulations and metaphors are key to communicating with the general public, while still hinting at the technical problems