
Note: I appreciate that at this point CEV is just a sketch. However, it’s an interesting topic and I don’t see that there’s any harm in discussing certain details of the concept as it stands.

1. Summary of CEV

Eliezer Yudkowsky describes CEV – Coherent Extrapolated Volition – here. A superintelligent AI is a powerful genie, and genies can’t be trusted; Friendly AI requires the AI to take as input the entire value computation of at least one human brain, because failing to take into account even a relatively small element of the human value set, while optimising in several other respects, is likely to be a disaster. CEV is Yudkowsky’s attempt at outlining a Friendly AI volition-extrapolating dynamic: a process in which the AI takes human brain states, combines them with its own vast knowledge, and outputs suitable actions to benefit humans.

Note that extrapolating volition is not some esoteric invention of Eliezer’s; it is a normal human behaviour. To use his example: suppose there are two boxes, A and B, only one of which contains a diamond that Fred desires. Fred asks us for box A because he incorrectly believes the diamond is in box A, but we know that in fact it is in box B. If we give him box B instead, we are extrapolating Fred’s volition (albeit over a short distance).

Yudkowsky roughly defines certain quantities that are likely to be relevant to the functioning of the CEV dynamic:

Spread describes the case in which the extrapolated volition is unpredictable. Quantum randomness or other computational problems may make it difficult to say with strong confidence (for example) whether person A would like to be given object X tomorrow – if the probability computed is 30%, rather than 0.001%, there is significant spread in this case.

Muddle is a measure of inconsistency. For example, person A might resent being given object Y tomorrow, but also resent not being given it.

Distance measures the degree of separation between one’s current self and the extrapolated self, i.e. how easy it would be to explain a given instance of extrapolated volition to someone. In the case of Fred and the diamond the distance is very short, but superintelligent AI could potentially compute Fred’s extrapolated volition to such a distance that it seems incomprehensible to Fred.
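The three quantities above can be made concrete with a toy numerical sketch. This is purely my illustration, not part of Yudkowsky's proposal; the functions and figures below are invented for the example, treating an extrapolated preference as a probability and muddle as joint resentment of an action and its omission.

```python
def spread(p_yes: float) -> float:
    """Spread is high when the extrapolated preference is unpredictable.
    Modelled here as the variance of a Bernoulli outcome, scaled to [0, 1]:
    maximal at p = 0.5, zero at p = 0 or p = 1."""
    return 4 * p_yes * (1 - p_yes)

def muddle(pref_if_given: float, pref_if_withheld: float) -> float:
    """Muddle is high when the person resents both an action and its
    omission: measured here as the product of the two resentments
    (the negative parts of the two preferences)."""
    return max(0.0, -pref_if_given) * max(0.0, -pref_if_withheld)

# Person A from the text: a 30% computed chance of wanting object X
# tomorrow gives large spread (~0.84); 0.001 gives negligible spread.
print(spread(0.30))
print(spread(0.001))

# Person A resents being given Y (-0.6) and also resents not being
# given it (-0.5): a muddled, inconsistent volition (~0.3).
print(muddle(-0.6, -0.5))
```

Distance is harder to caricature numerically, since it concerns how far the extrapolated self has travelled from the current self; in the Fred example above the distance is near zero.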

To quote Yudkowsky (I assume that the following remains approximately true today):

As of May 2004, my take on Friendliness is that the initial dynamic should implement the coherent extrapolated volition of humankind.

In poetic terms, our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.

Yudkowsky adds that “it should be easier to counter coherence than to create coherence” where coherence refers to strong, un-muddled and un-spread agreement between multiple individual volitions with no strong disagreement from any others; and that “the initial dynamic for CEV should be conservative about saying ‘yes’ and listen carefully for ‘no’” – the superintelligent optimisation process should seek more consensus before steering humanity into narrow slices of the future, relative to the degree of consensus it needs before steering humanity away from some particular narrow slice of the future (about which it has been warned by elements of the CEV).

CEV is an initial dynamic; it doesn’t necessarily have to be the perfect dynamic of human volition for Friendly AI, but the dynamic should be good enough that it allows the AI to extrapolate an optimal dynamic of volition to which we can then switch over if desirous. “The purpose of CEV as an initial dynamic is not to be the solution, but to ask what solution we want”.

Also, “If our extrapolated volitions say we don’t want our extrapolated volitions manifested, the system replaces itself with something else we want, or else...undergoes an orderly shutdown”.

Finally, Yudkowsky suggests that as a safeguard, a last judge of impeccable judgement could be trusted with putting the seal of approval on the output of the CEV dynamic; if something seems to have gone horribly wrong, beyond mere future shock, he can stop the output from being enacted.

2. CEV of all humankind vs. CEV of a subset of humankind

Let us accept that coherent extrapolated volition, in general, is the best (only?) solution that anyone has provided to the problem of AI friendliness. I can see four ways of implementing a CEV initial dynamic:

• Implement a single CEV dynamic incorporating all humans, the output of which affects everyone.
• Implement an individual CEV dynamic for each individual human.
• Implement a single CEV dynamic incorporating one human only, the output of which affects everyone.
• Implement a single CEV dynamic incorporating a limited subset of humans, the output of which affects everyone.

As Yudkowsky discusses in his document, whilst the second option might perhaps be a reasonable final dynamic (who knows?), it isn’t a suitable initial dynamic. This is because, if more than one CEV is running, the general workings of the CEV dynamic cannot be rewritten without violating someone’s individual CEV; and the whole idea behind the initial dynamic is that a superior dynamic may develop from it.

The third option is obviously sub-optimal, because of the danger that any individual person might be a psychopath – a person whose values are in general markedly hostile to other humans. Knowing more and thinking smarter might lead a given psychopath’s more humane values to win out, but we can’t count on that. In a larger group of people, the law of large numbers applies and the risk diminishes.
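The law-of-large-numbers point can be checked with a rough calculation. This is my own illustration, and the hostility probability of 1% is an invented figure: if each independently chosen participant is "markedly hostile" with some small probability, the chance that a majority of the group is hostile falls off extremely fast with group size.

```python
from math import comb

def p_hostile_majority(n: int, p: float) -> float:
    """Probability that more than half of n independently chosen
    people are hostile, with per-person hostility probability p
    (a binomial tail sum)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

# A single-person CEV bears the full per-person risk (here 1%).
print(p_hostile_majority(1, 0.01))

# With 101 participants the risk of a hostile majority is
# astronomically small.
print(p_hostile_majority(101, 0.01))
```

The sketch assumes independence, which is exactly what the worry about a group falling under a single hostile ideology (discussed under Safety below) would undermine.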

Yudkowsky favours the first option, a CEV of all humankind; I am more in favour of the fourth option, an initial CEV dynamic incorporating the minds of only a certain subset of humans. I would like to compare these two options on six relevant criteria:

I Schelling points [edit: apologies for the questionable use of a game theory term for the sake of concision]

Clearly, incorporating the minds of all humankind into the initial dynamic is a Schelling point – a solution that people would naturally generate for themselves in the absence of any communication. So full marks to a universal CEV on this criterion.

Answer quickly: what specific group of people – be that a group of people who meet each other regularly, or a group who are distinguished in some other way – would you nominate, if you had to choose a certain subset of minds to participate in the initial dynamic?

What springs to my mind is Nobel Prize winners, and I suspect that this too is a Schelling point. This seems like a politically neutral selection of distinguished human beings (particularly if we exclude the Peace Prize) of superlative character and intellect. Whether some people would object strongly to this selection is one question, but certainly I expect that many humans, supposing they were persuaded for other reasons that the most promising initial dynamic is one incorporating a small group of worthy humans only, would consider Nobel Prize winners to be an excellent choice to rally around.

Many other groups of minds, for example the FAI programming team themselves, would of course seem too arbitrary to gather sufficient support for the idea.

II Practicality of implementation

One problem with a universal CEV that I have never seen discussed is how feasible it would actually be to take extremely detailed recordings of the brain states of all of the humans on Earth. All of the challenges involved in creating Friendly AI are of course extreme. But ceteris paribus, one additional extremely challenging problem is one too many.

A prerequisite for the creation of superintelligent AI must surely be the acquisition of detailed knowledge of the workings of the human brain. However, our having the ability to scan one human brain in extreme detail does not imply that it is economically feasible to scan 7 billion or more human brains in the same way. It might well come to pass that the work on FAI is complete, but we still lack the means to actually collect detailed knowledge of all existing human minds. A superintelligent AI would develop its own satisfactory means of gathering information about human brains with minimal disruption, but as I understand the problem we need to input all human minds into the AI before switching it on and using it to do anything for us.

Even if the economic means do exist, consider the social, political and ideological obstacles. How do we deal with people who don’t wish to comply with the procedure?

Furthermore, let us suppose that we manage to incorporate all or almost all human minds into the CEV dynamic. Yudkowsky admits the possibility that the thing might just shut itself down when we run it – and he suggests that we shouldn’t alter the dynamic too many times in an attempt to get it to produce a reasonable-looking output, for fear of prejudicing the dynamic in favour of the programmers’ preferences and away from humanity’s CEV.

It would be one thing if this merely represented the (impeccably well-intentioned) waste of a vast amount of money, and the time of some Nobel Prize winners. But if it also meant that the economic, political and social order of the entire world had been trampled over in the process of incorporating all humans into the CEV, the consequences could be far worse. Enthusiasm for a second round with a new framework at some point in the future might be rather lower in the second scenario than in the first.

III Safety

In his document on CEV, Yudkowsky states that there is “a real possibility” that (in a universal CEV scenario) the majority of the planetary population might not fall into a niceness attractor when their volition is extrapolated.

The small group size of living scientific Nobel Prize winners (or any other likely subset of humans) poses certain problems for a selective CEV that the universal CEV lacks. For example, they might all come under the influence of a single person or ideology that is not conducive to the needs of wider humanity.

On the other hand, given their high level of civilisation and the quality of character necessary for a person to dedicate his life to science, ceteris paribus I’d be more confident of Nobel Prize winners falling into a niceness attractor in comparison to a universal CEV. How much trust are we willing to place in the basic decency of humankind – to what extent is civilisation necessary to create a human who would not be essentially willing to torture innocent beings for his own gratification? Perhaps by the time humanity is technologically advanced enough to implement AGI we’ll know more about that, but at our current state of knowledge I see little reason to give humans in general the benefit of the doubt.

Yudkowsky asks, “Wouldn’t you be terribly ashamed to go down in history as having meddled...because you didn’t trust your fellows?” Personally, I think that shutting up and multiplying requires us to make our best estimate of what is likely to benefit humankind (including future humans) the most, and run with that. I’d not be ashamed if in hindsight my estimate was wrong, since no-one can be blamed for having imperfect knowledge.

IV Aesthetic standards

In his document, Yudkowsky discusses the likelihood of certain volitions cancelling one another out whilst others add together; metaphorically speaking, “love obeys Bose-Einstein statistics while hatred obeys Fermi-Dirac statistics”. This supports the idea that extrapolating volition is likely to produce at least some useful output – i.e. having minimal spread and muddle, ideally at not too far a distance.

In a universal CEV this leads us to believe that Pakistani-Indian mutual hatred, for example, cancels out (particularly since coherence is easier to counter than to create) whereas their mutual preferences form a strong signal.

The problem of aesthetic standards concerns the quality of the signal that might cohere within the CEV. Love seems to be a strong human universal, and so we would expect love to play a strong role in the output of the initial dynamic. On the other hand, consider the difference in intelligence and civilisation between the bulk of humanity and a select group such as Nobel Prize winners. Certain values shared by such a select group, for example the ability to take joy in the merely real, might be lost amidst the noise of the relatively primitive values common to humanity as a whole.

Admittedly, we can expect “knowing more” and “growing up farther together” to improve the quality of human values in general. Once an IQ-80 tribesman gains more knowledge and thinks faster, and is exposed to rational memes, he might well end up in exactly the same place as the Nobel Prize winners. But the question is whether it’s a good idea to rely on a superb implementation of these specifications in an initial dynamic, rather than taking out the insurance policy of starting with substantially refined values in the first place – bearing in mind what is at stake.

A worst case scenario, assuming that other aspects of the FAI implementation work as planned, is that the CEV recommends an ignoble future for humanity – for example orgasmium – which is not evil, but is severely lacking in aesthetic qualities that might have come out of a more selective CEV. Of course, the programmers or the last judge should be able to veto an undesirable output. But if (as Yudkowsky recommends) they only trust themselves to tweak the dynamic a maximum of three times in an effort to improve the output before shutting it off for good if the results are still deemed unsatisfactory, this does not eliminate the problem.

V Obtaining a signal

It seems to me that the more muddle and spread there is within the CEV, the greater the challenge that exists in designing an initial dynamic that outputs anything whatsoever. Using a select group of humans would ensure that these quantities are minimised as far as possible. This is simply because they are likely to be (or can be chosen to be) a relatively homogeneous group of people, who have relatively few directly conflicting goals and possess relatively similar memes.

Again, why make the challenge of FAI even more difficult than it needs to be? Bear in mind that failure to implement Friendly AI increases the likelihood of uFAI being created at some point.

VI Fairness

In his document on CEV, Yudkowsky does go some way to addressing the objections that I have raised. However, I do not find him persuasive on this subject:

Suppose that our coherent extrapolated volition does decide to weight volitions by wisdom and kindness – a suggestion I strongly dislike, for it smacks of disenfranchisement. I don’t think it wise to tell the initial dynamic to look to whichever humans judge themselves as wiser and kinder. And if the programmers define their own criteria of “wisdom” and “kindness” into a dynamic’s search for leaders, that is taking over the world by proxy. You wouldn’t want the al-Qaeda programmers doing that, right?

Firstly, the question of disenfranchisement. As I suggested earlier, this constitutes a refusal to shut up and multiply when dealing with a moral question. “Disenfranchisement” is a drop in the ocean of human joy and human suffering that is at stake when we discuss FAI. As such, it is almost completely irrelevant as an item of importance in itself (of course there are other consequences involved in the choice between universal CEV and a degree of disenfranchisement, but they have been discussed already, and are beside the point of the strictly moral question). This is especially the case since we are only talking about the initial dynamic here, which may well ultimately develop into a universal CEV.

Secondly, there is the mention of al-Qaeda. In the context of earlier mentions of al-Qaeda programmers in the document on CEV, Yudkowsky appears to be positing a “veil of ignorance” – we should behave in creating the FAI as we would want al-Qaeda programmers to behave. This is strange, because in a similar veil of ignorance problem – the modesty argument – Robin Hanson argued that we should act as though there is a veil of ignorance surrounding whether it is ourselves or someone else who is wrong in some question of fact, whereas Eliezer argued against the idea.

Personally I have little regard for veil of ignorance arguments, on the basis that there is no such thing as a veil of ignorance. No, I would not want the al-Qaeda programmers to nominate a group of humans (presumably Islamic fanatics) and extrapolate their volition – I would rather they used all of humanity. But so what? I am quite happy using my own powers of judgement to decide that al-Qaeda’s group is inferior to humanity as a whole, but Nobel Prize winners (for example) are a better choice than humanity as a whole.

As for “taking over the world by proxy”, again shut up and multiply (SUAM) applies.

3. Conclusion

I argue that a selective CEV incorporating a fairly small number of distinguished human beings may be preferable to a CEV incorporating all of humanity. I argue that the practical difficulty of incorporating all humans into the CEV in the first place is unduly great, and that the programming challenge is also made more difficult by this choice. I consider any increase in the difficulty of bringing FAI into existence to be positively dangerous, because it lengthens the window of time available for unscrupulous programmers to create uFAI.

Setting aside the problem of getting the initial dynamic to work at all, I also consider it to be possible for the output of a selective CEV to be more desirable to the average human than the output of a universal CEV. The initial dynamic is the creation of human programmers, who are fallible in comparison to a superintelligent AI; their best attempt at creating a universal CEV dynamic may lead to the positive values of many humans being discarded, lost in the noise.

In other words, the CEV initial dynamic shouldn't be regarded as discovering what a group of people most desire collectively "by definition"; it is imperfect. If a universal CEV implementation is more difficult for human programmers to do well than a selective CEV, then a selective CEV might not only extrapolate the desires of the group in question more accurately, but also do a better job of reflecting the most effectively extrapolated desires of humanity as a whole.

Furthermore, desirability of the CEV output to the average human in existence today should be weighed against the desires of (for example) sentient human uploads created in a post-singularity scenario. Shutting up and multiplying demands that FAI programmers and other people of influence set aside concerns about being “jerks” when estimating the probability that extrapolating the volition of humanity en masse is the best way of meeting their own moral standards.