This is an article in the featured articles series from AISafety.info, which writes introductory content on AI safety. We'd appreciate any feedback.
The most up-to-date version of this article is on our website, along with 300+ other articles on AI existential safety.
Instrumental convergence is the idea that sufficiently advanced intelligent systems with a wide variety of terminal goals would pursue very similar instrumental goals.
A terminal goal (also referred to as an "intrinsic goal" or "intrinsic value") is something that an agent values for its own sake (an "end in itself"), while an instrumental goal is something that an agent pursues to make it more likely that it will achieve its terminal goals (a "means to an end").
For instance, you might donate to an organization that helps the poor in order to improve people’s well-being. Here, “improve well-being” is a terminal goal that you value for its own sake, whereas “donate” is an instrumental goal that you value because it helps you achieve your terminal goal: if you found out that your money wasn’t making people better off, you’d stop donating.
While certain instrumental goals are particular to specific ends (e.g., filling a cup of water to quench your thirst), other instrumental goals are broadly useful. For example, if we imagine an AI with a very specific (and weird) terminal goal — to create as many paperclips as possible — we can see why this goal might lead to the AI pursuing a number of instrumental goals:[1]

Self-preservation. The AI can't make paperclips if it's shut down or destroyed, so it has a reason to keep itself running.

Goal preservation. If the AI's goal were modified, fewer paperclips would get made, so it has a reason to resist changes to its goal.

Cognitive enhancement. Improvements in rationality and intelligence will improve the AI's decision-making, helping it make paperclips faster and more reliably.

Resource acquisition. The more matter and energy the AI controls, the more paperclips it can make.
We can see some degree of instrumental convergence among humans: people want many different things, but often converge on the same broadly useful instrumental goals, like “making money” or “going to college”.
[1] Nick Bostrom, "The Superintelligent Will: Motivation and Instrumental Rationality in Advanced Artificial Agents" (2012).