This is an article in the featured articles series from AISafety.info, which writes introductory content on AI safety. We'd appreciate any feedback.
The most up-to-date version of this article is on our website, along with 300+ other articles on AI existential safety.
Instrumental convergence is the idea that sufficiently advanced intelligent systems with a wide variety of terminal goals would pursue very similar instrumental goals.
A terminal goal (also referred to as an "intrinsic goal" or "intrinsic value") is something that an agent values for its own sake (an "end in itself"), while an instrumental goal is something that an agent pursues to make it more likely that it will achieve its terminal goals (a "means to an end").
For instance, you might donate to an organization that helps the poor in order to improve people’s well-being. Here, “improve well-being” is a terminal goal that you value for its own sake, whereas “donate” is an instrumental goal that you value because it helps you achieve your terminal goal: if you found out that your money wasn’t making people better off, you’d stop donating.
While certain instrumental goals are particular to specific ends (e.g., filling a cup of water to quench your thirst), other instrumental goals are broadly useful. For example, if we imagine an AI with a very specific (and weird) terminal goal — to create as many paperclips as possible — we can see why this goal might lead to the AI pursuing a number of instrumental goals:[1]

Self-preservation. The AI can't make paperclips if it's shut down or destroyed, so it has a reason to keep itself running.

Goal preservation. If the AI's goal were modified, fewer paperclips would get made, so it has a reason to resist changes to its goal.

Cognitive enhancement. Improvements in rationality and intelligence will improve the AI's decision-making, helping it make paperclips faster and more reliably.

Resource acquisition. The more matter and energy the AI controls, the more paperclips it can make.
We can see some degree of instrumental convergence among humans: people want many different things, but often converge on the same broadly useful instrumental goals, like “making money” or “going to college”.
[1] Nick Bostrom, "The Superintelligent Will: Motivation and Instrumental Rationality in Advanced Artificial Agents" (2012).