When you select policy πk because you expect it to achieve a later state Yk (the "goal"), we say that πk is your instrumental strategy for achieving Yk. The observation of "instrumental convergence" is that a widely different range of Y-goals can lead to highly similar π-strategies. (This becomes truer as the Y-seeking agent becomes more instrumentally efficient; two very powerful chess engines are more likely to solve a humanly solvable chess problem the same way, compared to two weak chess engines whose individual quirks might result in idiosyncratic solutions.)
If you'd probably want to do some kind of X whether you were a superintelligent paperclip maximizer, a superintelligent diamond maximizer, or a superintelligence that just wanted to keep a single button pressed for as long as possible, then X may be a "convergent instrumental strategy".
The appearance of an "instrumental strategy" can be seen as a kind of implicit phenomenon in repeatedly choosing actions xk that lead into a final state Yik. There doesn't have to be a special X-strategy-module which repeatedly does X-actions regardless of whether or not they lead to Yik. There can just be a process that repeatedly checks for the action leading to the most Yik.
(This doesn't rule out that some special cases of AI development pathways might eventually tend to result in agents with a value function Ue which does decompose into some special variant Xe of X plus other terms Ve. For example, natural selection on organisms that spend a long period of time as non-consequentialist policy-reinforcement-learners, before they later evolve into consequentialists, has had results along these lines, in the case of humans. We now have an independent, separate "curiosity" drive, instead of just looking into information that might be valuable for pursuing our other goals.)
The reasoning for an instrumental convergence claim says, "Supposing a consequentialist with a goal Yk, that is reasoning about which policies lead to which outcomes, then for most such goals Yk, policies which acquire more matter, energy, and computation will achieve more Yk than policies which don't."
That is: If we don't know whether a superintelligent agent is a paperclip maximizer or a diamond maximizer, we can still guess with some confidence that it will pursue a strategy in the general class "get more resources of matter, energy, and computation" rather than "don't get more resources". This is true even though Vinge's Law says that we won't be able to predict exactly how the superintelligence will go about gathering matter and energy (since we don't know optimal play in the domain "gather matter and energy").
Another way of viewing it: Imagine the real world as an extremely complicated game, and imagine that at the very start of this game, a highly capable player must make a single binary choice between the abstract moves "Gather more resources later" and "Never gather any more resources later". We want to predict which move is the optimal play on that first turn. Vingean uncertainty or not, we seem justified in putting a pretty large amount of probability mass on the first move being preferred - a binary choice is simple enough that we need not be at a loss to guess the optimal play. This would be a good guess even if the player's utility function U over outcomes had been randomly drawn from some computational language for specifying utility functions. (Providing that the agent is otherwise good at figuring out which policies will achieve its goals.)
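As a toy sketch of why a randomly drawn utility function would still favor the "gather" move (the numbers and the outcome-set framing below are my own illustrative assumptions, not from the text): if gathering resources just means the player can later steer the game into more final outcomes, then almost any utility function over outcomes prefers it.

```python
import random

# Toy model (illustrative assumptions): the game ends in one of N outcomes.
# Playing "Gather more resources later" lets the player later steer the game
# into any of 100 outcomes; "Never gather any more resources later" only lets
# it reach 10 of those same outcomes.  The utility function is drawn at
# random and contains no term that mentions resources.

random.seed(0)
N_OUTCOMES = 1000
REACHABLE_IF_GATHER = random.sample(range(N_OUTCOMES), 100)
REACHABLE_IF_NOT = random.sample(REACHABLE_IF_GATHER, 10)

def best_achievable(utility, reachable):
    """Value of the best outcome the player can still steer toward."""
    return max(utility[o] for o in reachable)

TRIALS = 10_000
gather_strictly_better = 0
for _ in range(TRIALS):
    utility = [random.random() for _ in range(N_OUTCOMES)]  # randomly drawn utility function
    if best_achievable(utility, REACHABLE_IF_GATHER) > best_achievable(utility, REACHABLE_IF_NOT):
        gather_strictly_better += 1

print(f"Gathering strictly preferred for {gather_strictly_better / TRIALS:.0%} of random utility functions")
# Prints roughly 90%: gathering wins whenever the best of the 100 reachable
# outcomes isn't also among the 10 reachable without gathering, and it never loses.
```

The preference comes from the structure of the choice (a superset of reachable outcomes) rather than from any resource term inside the utility function.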
Distinguishing the advanced agent properties that seem likely to be required for an AI program to start exhibiting the sort of reasoning filed under "instrumental convergence", the most obvious candidates are consequentialism (selecting actions or policies according to their expected consequences) and enough big-picture awareness for the agent to model itself as a program whose goal achievement depends on the hardware and resources it runs on.
That is: You don't automatically see "acquire more computing power" as a useful strategy unless you understand "I am a cognitive program and I tend to achieve more of my goals when I run on more resources." Alternatively, a similar end could be reached via the pseudoconsequentialist idiom of policy reinforcement: e.g., the programmers add more computing power, the system's goals start to be achieved better, and the related policies are positively reinforced and repeated.
The advanced agent properties that would naturally or automatically lead to instrumental convergence seem well above the range of modern AI programs. As of 2016, current machine learning algorithms don't seem to be within the range where this predicted phenomenon should start to be visible.
A claim of instrumental convergence on X can be ceteris paribus refuted by presenting some alternate narrow strategy Wj⊂¬X which seems to be more useful than any obvious strategy in X. We are then not positively confident of convergence on Wj, but we should assign very low probability to the alleged convergence on X, at least until somebody presents an X-exemplar with higher expected utility than Wj. If the proposed convergent strategy is "obey existing systems of property rights," and we see no way for Clippy to obtain 10^{55} paperclips under those rules, but we do think Clippy could get 10^{55} paperclips by expanding as fast as possible without regard for human welfare or existing legal systems, then we can ceteris paribus reject "obey property rights" as convergent. Even if trading with humans to make paperclips produces more paperclips than doing nothing, it may not produce the most paperclips compared to converting the material composing the humans into more efficient paperclip-making machinery.
In particular: instrumental strategies are not terminal values. In fact, they have a type distinction from terminal values. So a widely useful instrumental strategy is not like a universally convincing utility function (which you probably shouldn't look for in the first place). For example, "If you're going to spend resources on thinking about technology, try to do it earlier rather than later" is a widely useful strategy, not a utility function that every agent must share.
Any claim about instrumental convergence says at most, "The vast majority of possible goals Y would convergently imply a strategy in X, by default and unless otherwise averted by some special case Yi for which strategies in ¬X (not-X) are better."
Suppose you landed on a distant planet and found a structure of giant metal pipes, crossed by occasional cables. Further investigation shows that the cables are electrical superconductors carrying high-voltage currents.
You might not know what the huge structure did. But you would nonetheless guess that this huge structure had been built by some intelligence, rather than being a naturally-occurring mineral formation - that there were aliens who built the structure for some purpose.
That is: We can take an enormous variety of compactly specifiable goals, like "travel to the other side of the universe" or "support biological life" or "make paperclips", and find very similar optimal strategies along the way. Today we don't know whether electrical superconductors are actually the most useful way to transport energy in the limit of technology. But whatever the most efficient way of transporting energy turns out to be, whether that's electrical superconductors or something else, it would probably not vary much depending on whether you were trying to make diamonds or make paperclips.
From a Bayesian standpoint this is how we can identify a huge machine strung with superconducting cables as having been produced by high-technology aliens, even before we have any idea of what the machine does. We're saying, "This looks like the product of optimization, a strategy X that the aliens chose to best achieve some unknown goal Y; we can infer this even without knowing Y because many possible Y-goals would concentrate probability into this X-strategy being used."
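A minimal Bayes calculation of that inference, with made-up numbers (the priors and likelihoods below are illustrative assumptions, not from the text): instrumental convergence is what makes the likelihood of "superconducting cables, given built by some optimizer" high even though we don't know the optimizer's goal Y.

```python
# Made-up numbers for the inference "this structure was built by a
# goal-directed optimizer", given the observation "it contains
# high-voltage superconducting cables".

prior_optimizer = 1e-6          # assumed prior: aliens built this particular formation
prior_natural = 1 - prior_optimizer

# Instrumental convergence does the work here: across many possible alien
# goals Y, most imply efficient energy transport, so the likelihood of
# seeing cables given *some* optimizer is high even with Y unknown.
p_cables_given_optimizer = 0.5
p_cables_given_natural = 1e-12  # natural mineral formations ~never produce these

posterior = (p_cables_given_optimizer * prior_optimizer) / (
    p_cables_given_optimizer * prior_optimizer
    + p_cables_given_natural * prior_natural
)
print(f"P(optimizer | cables) = {posterior:.6f}")   # ~0.999998 with these numbers
```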
X being "instrumentally convergent" doesn't mean that any mind needs an extra, independent drive to do X.
Consider the following line of reasoning: "It's impossible to get on an airplane without buying plane tickets. So anyone on an airplane must be a sort of person who enjoys buying plane tickets. If I offer them a plane ticket they'll probably buy it, because this is almost certainly somebody who has an independent motivational drive to buy plane tickets. There's just no way you can design an organism that ends up on an airplane unless it has a buying-tickets drive."
The appearance of an "instrumental strategy" can be seen as a kind of implicit phenomenon in repeatedly choosing actions xk that lead into a final state Yi. There doesn't have to be a special X-strategy-module which repeatedly does X-actions regardless of whether or not they lead to Yi. There can just be a process that repeatedly checks for the action leading to the most Yi.
The flaw in the argument about plane tickets is that human beings are consequentialists who buy plane tickets just because they wanted to go somewhere and they expected the action "buy the plane ticket" to have the consequence, in that particular case, of going to the particular place and time they wanted to go. No extra "buy the plane ticket" module is required, and especially not a plane-ticket-buyer that doesn't check whether there's any travel goal and whether buying the plane ticket leads into the desired later state.
More semiformally, suppose that Uk is the utility function of an agent and let πk be the policy it selects. If the agent is instrumentally efficient relative to us at achieving Uk, then from our perspective we can mostly reason about whatever kind of optimization it does as if it were expected utility maximization, i.e.:
$$\pi_k = \underset{\pi_i \in \Pi}{\operatorname{argmax}} \; \mathbb{E}[U_k \mid \pi_i]$$
When we say that X is instrumentally convergent, we are stating that it probably so happens that:
$$\pi_k \in X$$
We are not making any claims along the lines that for an agent to thrive, its Uk must decompose into X plus a residual term Vk denoting the rest of the utility function. The claim is not that every agent must have a term in its utility function for X. The claim is that merely selecting strategies whose expected consequence is high in Uk tends to produce strategies somewhere in X. The fact that πk∈X is an epiphenomenon of merely optimizing for a Uk that makes no mention of X.
(This doesn't rule out that some special cases of AI development pathways might eventually tend to result in agents with a value function Ue which does decompose into some special variant Xe of X plus other terms Ve. For example, natural selection on organisms that spend a long period of time as non-consequentialist policy-reinforcement-learners, before they later evolve into consequentialists, has had results along these lines, in the case of humans. We now have an independent, separate "curiosity" drive, instead of just looking into information that might be valuable for pursuing our other goals.)
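Here is a minimal sketch of the semiformal claim above (the toy policy space, payoffs, and goal names are my own illustrative assumptions): several utility functions Uk that never mention resources are each maximized over the same small policy space, and every argmax policy lands in the partition X of policies that first acquire resources.

```python
from itertools import product

# Policies are two-step plans: first "acquire" or "skip" resources, then
# put all effort into one of three end products.  Acquiring resources
# multiplies how much of the chosen product gets made.
FIRST_MOVES = ("acquire_resources", "skip_resources")
PRODUCTS = ("paperclips", "diamonds", "button_presses")
POLICIES = list(product(FIRST_MOVES, PRODUCTS))

def outcome(policy):
    """Map a policy to how much of each product exists at the end."""
    first_move, chosen = policy
    amount = 100.0 if first_move == "acquire_resources" else 1.0
    return {p: (amount if p == chosen else 0.0) for p in PRODUCTS}

# Three different utility functions U_k; none of them mentions resources.
utility_functions = {
    "paperclip_maximizer": lambda o: o["paperclips"],
    "diamond_maximizer": lambda o: o["diamonds"],
    "button_presser": lambda o: o["button_presses"],
}

for name, U in utility_functions.items():
    pi_k = max(POLICIES, key=lambda pi: U(outcome(pi)))   # pi_k = argmax_pi U_k(outcome(pi))
    print(name, "->", pi_k)
# Every selected policy starts with "acquire_resources": pi_k lands in X
# even though no U_k contains a term for resources.
```

Changing the payoff numbers or adding more goals doesn't change the pattern, so long as resources multiply whatever the agent is trying to produce.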
If there's a simple way of classifying possible strategies Π into partitions X⊂Π and ¬X⊂Π (¬X = Π−X aka "not X"), and you think that for most compactly describable goals Yk the corresponding best policies πk are likely to be inside X, then you think X is a "convergent instrumental strategy".
In other words, if you think that a superintelligent paperclip maximizer, a superintelligent diamond maximizer, a superintelligence that just wanted to keep a single button pressed for as long as possible, and a superintelligence optimizing for a flourishing intergalactic civilization filled with happy sapient beings, would all want to "transport matter and energy efficiently", in order to achieve their other goals, then you think "transport matter and energy efficiently" is a convergent instrumental strategy.
In this case "paperclips", "diamonds", "keeping a button pressed as long as possible", and "sapient beings having fun", would be the goals Y1,Y2,Y3,Y4. The corresponding best strategies π1,π2,π3,π4 for achieving these goals Ywould not be identical - the best policies for making paperclips and diamonds are not exactly the same. But all of these policies (we think) would lie within the partition X⊂Π where the superintelligence tries to "transport matter and energy efficiently", for example (perhaps by using superconducting cables,cables), rather than the complementary partition ¬X where the superintelligence does not try to transport matter and energy efficiently.
If, given our beliefs P about our universe and which policies lead to which real outcomes, we think that in an intuitive sense it sure looks like at least 90% of the utility functions Uk∈UK ought to imply best findable policies πk that lie inside X, then we think X is a convergent instrumental strategy.
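The threshold reading of that definition can be written down directly; a small sketch, continuing the toy example above (the 90% cutoff comes from the paragraph, everything else is an illustrative assumption):

```python
def is_instrumentally_convergent(best_policies, in_partition_X, threshold=0.90):
    """True if at least `threshold` of the goals' best findable policies lie in X."""
    inside = sum(1 for pi in best_policies.values() if in_partition_X(pi))
    return inside / len(best_policies) >= threshold

# Best findable policies for each toy goal, e.g. as computed in the earlier sketch.
best_policies = {
    "paperclips": ("acquire_resources", "paperclips"),
    "diamonds": ("acquire_resources", "diamonds"),
    "button_presses": ("acquire_resources", "button_presses"),
}
in_X = lambda policy: policy[0] == "acquire_resources"
print(is_instrumentally_convergent(best_policies, in_X))  # True: 3/3 lie inside X
```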