Consequentialist reasoning selects policies on the basis of their predicted consequences - it does action X because X is forecasted to lead to preferred outcome Y. Whenever we reason that an agent which prefers outcome Y over Y′ will therefore do X instead of X′, we're implicitly assuming that the agent has the cognitive ability to do consequentialism at least about Xs and Ys. It does means-end reasoning; it selects means on the basis of their predicted ends plus a preference over ends.
E.g., when we infer that a paperclip maximizer would try to improve its own cognitive abilities given means to do so, the background assumptions include:

- That the agent can forecast the respective outcomes of "self-improve" and "don't self-improve";
- That its forecast says self-improvement leads to more paperclips than not self-improving;
- That the agent prefers outcomes with more paperclips, and outputs the action whose forecasted outcome it prefers.
(Technically, since the forecasts of our actions' consequences will usually be uncertain, a coherent agent needs a utility function over outcomes and not just a preference ordering over outcomes.)
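To make that concrete, here is a minimal formalization (the symbols A, O, P, and U are notation for this sketch, not terms from the article):

```latex
a^{*} \;=\; \operatorname*{arg\,max}_{a \in A} \; \sum_{o \in O} P(o \mid a)\, U(o)
```

Here A is the agent's set of available actions, O the set of outcomes, P(o | a) its forecast of outcome o given action a, and U its utility function over outcomes. This is where the parenthetical bites: when each action yields a probability distribution over outcomes rather than a single guaranteed outcome, a bare preference ordering over O can't rank the actions; cardinal utilities can.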
The related idea of "backward chaining" is one particular way of solving the cognitive problems of consequentialism: start from a desired outcome/event/future, figure out what actions, strategies, and intermediate events are likely to have the consequence of bringing about that event/outcome, and repeat the question until arriving back at a particular plan/policy/action.
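As an illustrative sketch (the rule table and goal names are invented for this example, not taken from anywhere else in the article), a minimal backward chainer is just a recursion on "what would bring this about?":

```python
# Minimal backward-chaining planner: start from a desired outcome and
# repeatedly ask "what action would bring this about, and what does
# that action itself require?" until bottoming out in conditions that
# need no further action.

# Each rule reads: `action` brings about `outcome`, provided the
# listed preconditions hold first.
RULES = {
    "have_milk":      ("buy_milk",       ["at_supermarket"]),
    "at_supermarket": ("drive_car",      ["have_car"]),
    "have_car":       ("rent_car",       ["have_money"]),
    "have_money":     ("win_tournament", []),
}

def backward_chain(goal, plan=None):
    """Return a sequence of actions that (per RULES) brings about `goal`."""
    if plan is None:
        plan = []
    if goal not in RULES:               # no rule needed: condition comes for free
        return plan
    action, preconditions = RULES[goal]
    for precondition in preconditions:  # secure prerequisites first
        backward_chain(precondition, plan)
    plan.append(action)
    return plan

print(backward_chain("have_milk"))
# ['win_tournament', 'rent_car', 'drive_car', 'buy_milk']
```

Note that the plan comes out forward-ordered even though the reasoning ran backward from the goal.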
Many narrow AI algorithms are consequentialists over narrow domains. A chess program that searches far ahead in the game tree is a consequentialist; it outputs chess moves based on the expected result of those chess moves and your replies to them, into the distant future of the board.
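A generic minimax search makes that structure explicit (a sketch of narrow consequentialism in general, not of any particular chess engine): the move is selected purely by its forecasted downstream consequences.

```python
# Minimax over an abstract game tree: forecast `depth` plies ahead,
# assume the opponent also picks their best reply, and output the move
# whose predicted end-of-search position scores best.

def minimax(state, depth, maximizing, moves, apply_move, evaluate):
    """Best score achievable from `state`, looking `depth` plies ahead."""
    legal = moves(state)
    if depth == 0 or not legal:
        return evaluate(state)          # static judgment of the forecasted board
    scores = (minimax(apply_move(state, m), depth - 1, not maximizing,
                      moves, apply_move, evaluate)
              for m in legal)
    return max(scores) if maximizing else min(scores)

def best_move(state, depth, moves, apply_move, evaluate):
    """The move with the best forecasted consequence."""
    return max(moves(state),
               key=lambda m: minimax(apply_move(state, m), depth - 1, False,
                                     moves, apply_move, evaluate))
```

All the chess-specific knowledge lives in the `moves`, `apply_move`, and `evaluate` callbacks; the consequentialist part is only the forecast-then-select loop, which is exactly why it never reaches outside the board.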
We can see one of the critical aspects of human intelligence as cross-domain consequentialism. Rather than only forecasting consequences within the boundaries of a narrow domain, we can trace chains of events that leap from one domain to another. Making a chess move wins a chess game that wins a chess tournament that wins prize money that can be used to rent a car that can drive to the supermarket to get milk. An Artificial General Intelligence that could learn many domains, and engage in consequentialist reasoning that leaped across those domains, would be a sufficiently advanced agent to be interesting from most perspectives on interestingness. It would start to be a consequentialist about the real world.
Some systems are pseudoconsequentialist - in some ways they behave as if selecting actions on the basis of those actions leading to particular futures, without using an explicit cognitive model and explicit forecasts.
For example, natural selection has no cognitive model of the world and makes no forecasts, but because it retains whichever genes actually led to reproduction in the past, it behaves as if it were selecting genes for their consequences.
Natural selection is also the only other optimization process, besides human intelligence, that behaves something like a cross-domain consequentialist.
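As a cartoon of the pseudoconsequentialist side (toy code; the numbers and the one-dimensional "policy" are made up for illustration), selection on historical consequences moves a population toward an outcome even though nothing anywhere models or forecasts anything:

```python
import random

# No candidate contains a world-model or a forecast - each "policy" is
# a bare number. Retaining whichever variants *historically* produced
# good outcomes still drags the population toward the target, as if
# consequences were being deliberately chosen.

TARGET = 42.0

def realized_outcome(policy):
    """Score of what this policy actually did - not a prediction."""
    return -abs(policy - TARGET)

population = [random.uniform(0.0, 100.0) for _ in range(50)]
for generation in range(200):
    survivors = sorted(population, key=realized_outcome)[25:]  # keep the best half
    population = [p + random.gauss(0.0, 1.0)                   # mutate...
                  for p in survivors for _ in range(2)]        # ...two offspring each

print(round(sum(population) / len(population), 1))  # ends up hovering near 42.0
```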
Consequentialism yields the instrumentally convergent strategies, and produces Nearest Unblocked Neighbor behavior on attempts to block particular strategies.
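To illustrate (a toy sketch; the strategy names and scores are invented), blocking a consequentialist's best strategy doesn't change its preference ordering, so the same argmax just promotes the most similar strategy that isn't yet forbidden:

```python
# Nearest Unblocked Neighbor in miniature: patching out the observed
# misbehavior leaves a near-duplicate strategy as the new optimum.

strategies = {
    "copy_self_to_server_A": 100.0,
    "copy_self_to_server_B":  99.9,  # near-duplicate of the plan above it
    "persuade_operator":      90.0,
    "do_assigned_task_only":  60.0,
}

def choose(strategies, blocked):
    candidates = {s: v for s, v in strategies.items() if s not in blocked}
    return max(candidates, key=candidates.get)

blocked = set()
print(choose(strategies, blocked))     # copy_self_to_server_A
blocked.add("copy_self_to_server_A")   # operators patch the observed behavior
print(choose(strategies, blocked))     # copy_self_to_server_B
```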
Note the distinction between consequentialist reasoning and expected utility maximization, which is a special case that isn't necessary to produce those properties that are consequences just of consequentialism.
Since 'consequentialism' or 'linking up actions to consequences' or 'figuring out how to get to a consequence' is so close to what would make advanced AIs useful in the first place, it shouldn't be surprising if some attempts to subvert consequentialism in the name of safety run squarely into an unresolvable safety-usefulness tradeoff.
Another concern is that consequentialism may to some extent be a convergent or default outcome of optimizing anything hard enough. E.g., although natural selection is a pseudoconsequentialist process, it optimized for reproductive capacity so hard that it eventually spit out some powerful organisms that were explicit cognitive consequentialists (aka humans).
We might similarly worry that optimizing any internal aspect of a powerful machine intelligence hard enough would start to embed internal consequentialism somewhere - policies/designs/answers selected from a space sufficiently general that "do consequentialist reasoning" is embedded in some of the most effective answers.
Or perhaps a machine intelligence might need to be consequentialist in some internal aspects in order to be smart enough to do sufficiently useful things - maybe you just can't get a sufficiently advanced machine intelligence, sufficiently early, unless it is, e.g., choosing on a consequential basis what thoughts to think about, or engaging in consequentialist engineering of its internal elements.
In the same way that expected utility is the only qualitatively coherent way of making certain choices, or in the same way that natural selection optimizing hard enough on reproduction started spitting out explicit cognitive consequentialists, we might worry that consequentialism is in some sense central enough or convergent enough that it will be hard to subvert - hard enough that we can't easily get rid of instrumental convergence on problematic strategies just by getting rid of the consequentialism while preserving the AI's usefulness.
Consequentialism is an extremely basic idiom of optimization:
Anything that Aristotle would have considered as having a "final cause", or teleological explanation, without being entirely wrong about that, is something we can see through the lens of cognitive consequentialism or pseudoconsequentialism. A plan, a design, a reinforced behavior, or selected genes: Most of the complex order on Earth derives from one or more of these.
Consequentialism, or sometimes pseudoconsequentialism, over various domains, is an advanced agent property that is a key requisite or key threshold in several issues of AI alignment and advanced safety.