I know Wei Dai has criticized CEV as a construct, I believe offering the alternative of rigorously specifying volition *before* making an AI. I couldn't find these posts/comments via a search, can anyone link me? Thanks.

There may be related top-level posts, but there is a good chance that what I am specifically thinking of was a comment-level conversation between Wei Dai and Vladimir Nesov.

Also feel free to use this thread to criticize CEV and to talk about other possible systems of volition.


Probably a good idea to unpack non-ubiquitous abbreviations at least once per post, maybe even provide links, like to this page about Coherent Extrapolated Volition as a method for choosing an AI's morals.

But yeah, sorry, 2 minutes of google-fu didn't find it, and I don't particularly care to invest more, though I probably found enough of Wei Dai's isolated thoughts to approximate his criticisms. Good luck!


I think the post is this: Hacking the CEV for Fun and Profit.

Ben's cut-down version;

Roko's related ideas;

I've criticised CEV from several directions. It seems technically and politically infeasible to me. The CEV documents mostly read like wishlists for those seeking funding from some kind of benevolent communist government, whose leader is into science fiction.

However, I haven't collected my thoughts on the topic together in one place. I figure I should probably save my energy in this area, until there is something less out-of-date, and more concrete to consider.

The link to 'Roko's related ideas' goes to a document which has the same content as Nick Tarleton's [CEV paper](http://intelligence.org/files/CEV-MachineEthics.pdf), with a different title and no author attribution.



Here's the post where I suggested considering the possibility of specifying volition "manually": http://lesswrong.com/lw/1oj/complexity_of_value_complexity_of_outcome/

See also Marcello's criticism of CEV (where I also give additional comments).

I have always been rather nervous about the concept of CEV. Particularly frightening to me are the modifiers "coherent" and "extrapolated". The explanations of these terms in this document strike me as quite incoherent and hence I am forced to extrapolate to get any meaning at all from the phrase. (The fact that the document is more than six years old and proclaims its own obsolescence in its first paragraph does not instill confidence.) Of course, this posting gives me further cause for concern. It seems I may also be confused about "volition".

First, let me say why "coherent" frightens me. I wish the word were "collective" instead. It is my understanding that the point of specifying that the volition be "coherent" is that we wish to filter out the incoherent bits of mankind's volition. For example, if mankind's volition were that we not build a monolithic, super-powerful AI in the first place, then that would be an incoherent wish which should be ignored. Or, if mankind's volition did not think that the conquest of death was a high priority, that too would be incoherent and ought to be ignored. The incoherent 'philosophy' of folks below the waterline cannot be allowed to trump the volition of the more rational folk above.

The above is a caricature of 'coherence' as presented in the May 2004 document. If someone else can provide a better interpretation, that would be welcome.

Next, let me say why "extrapolated" frightens me. Extrapolation ought to frighten everyone. An AI has no business looking farther into the future than its human creators. An AI has no need to extrapolate. It has no need to look far ahead into the future. Mankind and its volition are traveling into the future along with the AI. If the AI needs to know what mankind wants 1000 years from now, it should just wait for 1000 years and then ask. It will receive a much better and well informed answer than can be achieved by extrapolating.

Once again, I may be objecting here to a caricature, straw-man interpretation of 'extrapolated'. I wish the word were 'expressed'. Can anyone provide me with a better explanation than Eliezer's (2004) as to why the word 'extrapolated' is the appropriate one?

Also, does anyone have a link to the Wei Dai comments regarding "volition"? The OP's hints make this word just as mysterious and frightening to me as the other two.

The above is a caricature of 'coherence' as presented in the May 2004 document. If someone else can provide a better interpretation, that would be welcome.

That doesn't sound like how I interpreted 'coherent'. I assumed it meant a volition the vast majority of humanity agrees with / a measure of how much humanity's volition agrees. If humanity really didn't care about death, then that would be a coherent volition. So something like 'collective' indeed.

As for extrapolation, it's not intended to literally look into the future. I thought the example of the diamond in the box was fairly enlightening. The human says 'I want box 1', thinking box 1 contains a diamond. The AI knows the diamond is in box 2, and can extrapolate (as humans do) that the human actually wants the diamond and would ask for box 2 if they knew where the diamond was. The smart AI therefore opens box 2, and the human is happy because they have a diamond. A dumb AI would just give the human box 1 "because they asked for it", even if that's what they didn't really want.

When a lot of humans then say "the conquest of death is not a high priority" the AI extrapolates that if we knew more or had basic rationality training we would say conquest of death is a high priority. And therefore goes about solving death.

At least that's how I understood it.

That is pretty much how I understood it too. It scares me. I would strongly prefer that it ask "Why not conquer death? I don't understand." rather than just going ahead and ignoring my stated preference. I dislike that it would substitute its judgment for mine simply because it believes it is wiser. You don't discover the volition of mankind by ignoring what mankind tells you.

It doesn't seem that scary to me. I don't see it as substituting "its own judgement" for ours. It doesn't have a judgement of its own. Rather, it believes (trivially correctly) that if we were wiser, we would be wiser than we are now. And if it can reliably figure out what a wiser version of us would say, it substitutes that person's judgement for ours.

I suppose I imagine that if told I shouldn't try to solve death, I would direct the person to LessWrong, try to explain to them the techniques of rationality, refer them to a rationalist dojo, etc. until they're a good enough rationalist they can avoid reproducing memes they don't really believe in -- then ask them again.

The AI with massively greater resources can of course simulate all this instead, saving a lot of time. And the benefit of the AI's method is that when the "simulation" says "I wish the AI had started preventing death right away instead of waiting for me to become a rationalist", the AI can grant this wish!

The AI doesn't inherently know what's good or bad. It doesn't even know what it should be surprised by (only transhumanists seem to realise that "let's not prevent death" shouldn't make sense). It can only find out by asking us, and of course the right answer is more likely to be given by a "wise" person. So the best way for the AI to find out what is right or wrong is to make everyone as wise as possible, then ask them (or predict what would happen if it did).

"What would I do if I were wiser?" may not be a meaningful question. Your current idea of wisdom is shaped by your current limitations.

At least the usual idea of wisdom is that it's acquired through experience, and how can you know how more experience will affect you? Even your idea of wisdom formed by observing people who seem wiser than yourself is necessarily incomplete. All you can see is effects of a process you haven't incorporated into yourself.

It [FAI] doesn't have a judgement of its own.


And if it [FAI] can reliably figure out what a wiser version of us would say, it substitutes that person's judgement for ours.


I would direct the person to LessWrong, [...] until they're a good enough rationalist [...] -- then ask them again.

It seems you have a flaw in your reasoning. You will direct a person to LessWrong, someone else will direct a person to church. And FAI should figure out somehow which direction a person should take to be wiser, without a judgment of its own.

That's true.

According to the 2004 paper, Eliezer thinks (or thought, anyway) "what we would decide if we knew more, thought faster, were more the people we wished we were, had grown up farther together..." would do the trick. Presumably that's the part to be hard-coded in. Or you could extrapolate (using the above) what people would say "wisdom" amounts to and use that instead.

Actually, I can't imagine that someone who knew and understood both the methods of rationality (having been directed to LessWrong) and all the teachings of the church (having been directed to church) would then direct a person to church. Maybe the FAI can let a person take both directions to become wiser.

ETA: Of course, in FAI 'maybe' isn't good enough...

I mentioned this problem already. And I (07/2010) thought about ways to ensure that FAI will prefer my/our/rational way of extrapolating.

Now I think it would be better if FAI selected a coherent subset of the volitions of all reflectively consistent extrapolations. I suspect it would be something like: protect humanity from existential risk, but don't touch it beyond that.

Yes. The problem is, if you look at the biggest disagreements humans have had - slavery, abortion, regional independence, whom to tax, how much the state should help people who can't help themselves, how much clothing women should wear, whether women should work outside the home - none of them can be resolved in this method. Religion, possibly; but only to the extent that a religion's followers care about the end goal of getting into heaven, and not to the extent that they have internalized its values.

The above is a caricature of 'coherence' as presented in the May 2004 document. If someone else can provide a better interpretation, that would be welcome.

It seemed accurate to me. Also, I didn't find any problems from it that would seem frightening or so. Was it supposed to be problematic in some way?

Was it supposed to be problematic in some way?

You mean other than being politically naive and likely to get a lot of people killed? You are asking what I have against it personally, if it should somehow come to pass?

Well, to be honest, I'm not sure. I usually try to base my important opinions on some kind of facts, on 'official' explanations. But we don't have those here. So I am guessing. But I do strongly suspect that my fundamental values are very different than those of the author of CEV. Because I am not laboring under the delusion that everyone else is just like me, ... only stupider. I know that human values are diverse, and that any kind of collective set of values must be negotiated, rather than somehow 'extrapolated'.

I don't think it has much chance of being implemented - so, I figure, there is not much reason to worry about it.

Thank you for your signal, I guess.

So you're bound to end up losing in this game, anyway, right? Negotiation in itself won't bring you any additional power over the coherent extrapolated volition of humanity to change the future of the universe. If others think very much unlike you, you need to overpower them to bring your values back to the game or perish in the attempt.

I don't understand your thinking here at all. Divergent values are not a barrier to negotiation. They are the raw material of negotiation. The barrier to negotiation is communication difficulty and misunderstanding.

Why do you think I lose?

Why do you think I lose?

Because there are a lot more of those with values totally different from yours, which made the CEV optimize a future that you didn't like at all. If you're negotiating with all those people, why would they give in to you any more than CEV would optimize for you?

Hmmm. That is not the scenario I was talking about. I was imagining that there would be a large number of people who would feel disenfranchised because their values were considered incoherent (or they were worried that their values might be thought incoherent). This coalition would seize political control of the CEV creation bureaucracy, change "coherent extrapolated" to "collective expressed" and then begin the negotiation process.

And? If you have multiple contradictory wishes about what to do next, some of them are bound to be unfulfilled. CEV or negotiation are just ways to decide which ones.

Yes, and until someone explains how CEV works, I will prefer negotiation. I understand it, I think it generates the best, fairest results, etc. With AI assistance, some of the communication barriers can be lowered and negotiation will become an even better tool. CEV, on the other hand, is a complete mystery to me.




I think the link is this: Hacking the CEV for Fun and Profit.