Consequentialist reasoning selects policies on the basis of their predicted consequences - it does action X because X is forecasted to lead to preferred outcome Y. Whenever we reason that an agent which prefers outcome Y over Y′ will therefore do X instead of X′, we're implicitly assuming that the agent has the cognitive ability to do consequentialism at least about Xs and Ys. It does means-end reasoning; it selects means on the basis of their predicted ends plus a preference over ends.
E.g., when we infer that a paperclip maximizer would try to improve its own cognitive abilities given means to do so, the background assumptions include:

- That the agent can forecast the respective outcomes of "self-improve" and "don't self-improve";
- That its forecast says self-improvement leads to more paperclips than not self-improving;
- That the agent prefers outcomes with more paperclips, and outputs the action whose forecasted outcome it prefers.
(Technically, since the forecasts of our actions' consequences will usually be uncertain, a coherent agent needs a utility function over outcomes and not just a preference ordering over outcomes.)
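To make that concrete, here is a minimal formalization (the symbols A, O, P, and U are notation for this sketch, not terms from the article):

```latex
a^{*} \;=\; \operatorname*{arg\,max}_{a \in A} \; \sum_{o \in O} P(o \mid a)\, U(o)
```

Here A is the agent's set of available actions, O the set of outcomes, P(o | a) its forecast of outcome o given action a, and U its utility function over outcomes. This is where the parenthetical bites: when each action yields a probability distribution over outcomes rather than a single guaranteed outcome, a bare preference ordering over O can't rank the actions; cardinal utilities can.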
The related idea of "backward chaining" is one particular way of solving the cognitive problems of consequentialism: start from a desired outcome/event/future, figure out what actions, strategies, and intermediate events are likely to have the consequence of bringing about that event/outcome, and repeat the question until arriving back at a particular plan/policy/action.
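As an illustrative sketch (the rule table and goal names are invented for this example, not taken from anywhere else in the article), a minimal backward chainer is just a recursion on "what would bring this about?":

```python
# Minimal backward-chaining planner: start from a desired outcome and
# repeatedly ask "what action would bring this about, and what does
# that action itself require?" until bottoming out in conditions that
# need no further action.

# Each rule reads: `action` brings about `outcome`, provided the
# listed preconditions hold first.
RULES = {
    "have_milk":      ("buy_milk",       ["at_supermarket"]),
    "at_supermarket": ("drive_car",      ["have_car"]),
    "have_car":       ("rent_car",       ["have_money"]),
    "have_money":     ("win_tournament", []),
}

def backward_chain(goal, plan=None):
    """Return a sequence of actions that (per RULES) brings about `goal`."""
    if plan is None:
        plan = []
    if goal not in RULES:               # no rule needed: condition comes for free
        return plan
    action, preconditions = RULES[goal]
    for precondition in preconditions:  # secure prerequisites first
        backward_chain(precondition, plan)
    plan.append(action)
    return plan

print(backward_chain("have_milk"))
# ['win_tournament', 'rent_car', 'drive_car', 'buy_milk']
```

Note that the plan comes out forward-ordered even though the reasoning ran backward from the goal.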
Many narrow AI algorithms are consequentialists over narrow domains. A chess program that searches far ahead in the game tree is a consequentialist; it outputs chess moves based on the expected result of those chess moves and your replies to them, into the distant future of the board.
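A generic minimax search makes that structure explicit (a sketch of narrow consequentialism in general, not of any particular chess engine): the move is selected purely by its forecasted downstream consequences.

```python
# Minimax over an abstract game tree: forecast `depth` plies ahead,
# assume the opponent also picks their best reply, and output the move
# whose predicted end-of-search position scores best.

def minimax(state, depth, maximizing, moves, apply_move, evaluate):
    """Best score achievable from `state`, looking `depth` plies ahead."""
    legal = moves(state)
    if depth == 0 or not legal:
        return evaluate(state)          # static judgment of the forecasted board
    scores = (minimax(apply_move(state, m), depth - 1, not maximizing,
                      moves, apply_move, evaluate)
              for m in legal)
    return max(scores) if maximizing else min(scores)

def best_move(state, depth, moves, apply_move, evaluate):
    """The move with the best forecasted consequence."""
    return max(moves(state),
               key=lambda m: minimax(apply_move(state, m), depth - 1, False,
                                     moves, apply_move, evaluate))
```

All the chess-specific knowledge lives in the `moves`, `apply_move`, and `evaluate` callbacks; the consequentialist part is only the forecast-then-select loop, which is exactly why it never reaches outside the board.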
We can see one of the critical aspects of human intelligence as cross-domain consequentialism. Rather than only forecasting consequences within the boundaries of a narrow domain, we can trace chains of events that leap from one domain to another. Making a chess move wins a chess game that wins a chess tournament that wins prize money that can be used to rent a car that can drive to the supermarket to get milk. An Artificial General Intelligence that could learn many domains, and engage in consequentialist reasoning that leaped across those domains, would be a sufficiently advanced agent to be interesting from most perspectives on interestingness. It would start to be a consequentialist about the real world.
Some systems are pseudoconsequentialist - in some ways they behave as if selecting actions on the basis of those actions leading to particular futures, without using an explicit cognitive model and explicit forecasts.
For example, natural selection has no cognitive model of the world and makes no forecasts, but because it retains whichever genes actually led to reproduction in the past, it behaves as if it were selecting genes for their consequences.
Natural selection is also the only other optimization process, besides human intelligence, that behaves something like a cross-domain consequentialist.
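As a cartoon of the pseudoconsequentialist side (toy code; the numbers and the one-dimensional "policy" are made up for illustration), selection on historical consequences moves a population toward an outcome even though nothing anywhere models or forecasts anything:

```python
import random

# No candidate contains a world-model or a forecast - each "policy" is
# a bare number. Retaining whichever variants *historically* produced
# good outcomes still drags the population toward the target, as if
# consequences were being deliberately chosen.

TARGET = 42.0

def realized_outcome(policy):
    """Score of what this policy actually did - not a prediction."""
    return -abs(policy - TARGET)

population = [random.uniform(0.0, 100.0) for _ in range(50)]
for generation in range(200):
    survivors = sorted(population, key=realized_outcome)[25:]  # keep the best half
    population = [p + random.gauss(0.0, 1.0)                   # mutate...
                  for p in survivors for _ in range(2)]        # ...two offspring each

print(round(sum(population) / len(population), 1))  # ends up hovering near 42.0
```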
Consequentialism yields the instrumentally convergent strategies, and produces Nearest Unblocked Neighbor behavior on attempts to block particular strategies.
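To illustrate (a toy sketch; the strategy names and scores are invented), blocking a consequentialist's best strategy doesn't change its preference ordering, so the same argmax just promotes the most similar strategy that isn't yet forbidden:

```python
# Nearest Unblocked Neighbor in miniature: patching out the observed
# misbehavior leaves a near-duplicate strategy as the new optimum.

strategies = {
    "copy_self_to_server_A": 100.0,
    "copy_self_to_server_B":  99.9,  # near-duplicate of the plan above it
    "persuade_operator":      90.0,
    "do_assigned_task_only":  60.0,
}

def choose(strategies, blocked):
    candidates = {s: v for s, v in strategies.items() if s not in blocked}
    return max(candidates, key=candidates.get)

blocked = set()
print(choose(strategies, blocked))     # copy_self_to_server_A
blocked.add("copy_self_to_server_A")   # operators patch the observed behavior
print(choose(strategies, blocked))     # copy_self_to_server_B
```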
Note the distinction between consequentialist reasoning and expected utility maximization, which is a special case that isn't necessary to produce those properties that are consequences just of consequentialism.
Since 'consequentialism' or 'linking up actions to consequences' or 'figuring out how to get to a consequence' is so close to what would make advanced AIs useful in the first place, it shouldn't be surprising if some attempts to subvert consequentialism in the name of safety run squarely into an unresolvable safety-usefulness tradeoff.
Another concern is that consequentialism may to some extent be a convergent or default outcome of optimizing anything hard enough. E.g., although natural selection is a pseudoconsequentialist process, it optimized for reproductive capacity so hard that it eventually spit out some powerful organisms that were explicit cognitive consequentialists (aka humans).
We might similarly worry that optimizing any internal aspect of a powerful machine intelligence hard enough would start to embed internal consequentialism somewhere - policies/designs/answers selected from a space sufficiently general that "do consequentialist reasoning" is embedded in some of the most effective answers.
Or perhaps a machine intelligence might need to be consequentialist in some internal aspects in order to be smart enough to do sufficiently useful things - maybe you just can't get a sufficiently advanced machine intelligence, sufficiently early, unless it is, e.g., choosing on a consequential basis what thoughts to think about, or engaging in consequentialist engineering of its internal elements.
In the same way that expected utility is the only qualitatively coherent way of making certain choices, or in the same way that natural selection optimizing hard enough on reproduction started spitting out explicit cognitive consequentialists, we might worry that consequentialism is in some sense central enough or convergent enough that it will be hard to subvert - hard enough that we can't easily get rid of instrumental convergence on problematic strategies just by getting rid of the consequentialism while preserving the AI's usefulness.
Consequentialism is an extremely basic idiom of optimization:
Anything that Aristotle would have considered as having a "final cause", or teleological explanation, without being entirely wrong about that, is something we can see through the lens of cognitive consequentialism or pseudoconsequentialism. A plan, a design, a reinforced behavior, or selected genes: Most of the complex order on Earth derives from one or more of these.
Consequentialism, or sometimes pseudoconsequentialism, over various domains, is an advanced agent property that is a key requisite or key threshold in several issues of AI alignment and advanced safety.