As seen in other threads, people disagree on whether CEV exists, and if it does, what it might turn out to be.


It would be nice to try to categorize common speculations about CEV.

1a. CEV doesn't exist, because human preferences are too divergent

1b. CEV doesn't even exist for a single human 

1c. CEV does exist, but it results in a return to the status quo

2a. CEV results in humans living in a physical (not virtual reality) utopia

2b. CEV results in humans returning to a more primitive society free of technology

2c. CEV results in humans living together in a simulation world, where most humans do not have god-like power

(the similarity between 2a, 2b, and 2c is that humans all still live in the same world, as in traditional utopia scenarios)

3. CEV results in a wish for the annihilation of all life, or maybe the universe

4a. CEV results in all humans being granted the right to be the god of their own private simulation universe (once we acquire the resources to do so)

4b. CEV can be implemented for "each salient group of living things in proportion to that group's moral weight"

5. CEV results in all humans agreeing to be wireheaded (trope)

6a. CEV results in all humans agreeing to merge into a single being and discarding many of the core features of humankind which have lost their purpose (trope)

6b. CEV results in humans agreeing to cease their own existence while also creating a superior life form--the outcome is similar to 6a, but the difference is that here, humans do not care about whether they are individually "merged"

7. CEV results in all/some humans willingly forgetting/erasing their history, or being indifferent to preserving history so that it is lost (compatible with all previous tropes)

Obviously there are too many possible ideas (or "tropes") to list, but perhaps we could get a sense of which ones are the most common in the LW community.  I leave it to someone else to create a poll supposing they feel they have a close to complete list, or create similar topics for AI risk, etc.

EDIT: Added more tropes, changed #2 since it was too broad: now #2 refers to CEV worlds where humans live in the "same world"



Most of what I know about CEV comes from the 2004 Yudkowsky paper. Considering how many of his views have changed over similar timeframes, and how the paper states multiple times that CEV is a changing work in progress, this seems like a bad thing for my knowledge of the subject. Have there been any significant public changes since then, or are we still debating based on that paper?

0) CEV doesn't exist even for a single individual, because human preferences are too unstable and contingent on random factors for the extrapolation process to give a definite answer.

I may hunt through my old comments to find the ones where I was working this through a few years back, but I think I would summarize mine as:

  • A target's CEV does not uniquely exist: there's a large set of equally legitimate extrapolation processes that can be performed on a given target and their results don't converge, because human values and preferences change based on the events we experience. So many of the above options are non-mutually-exclusively true, but that's not a particularly interesting fact.

  • Even for a fixed extrapolation-method, CEV(A + B) bears no predictable relationship to CEV(A) and CEV(B), for the same reason: being part of group A is a different environment than being part of group (A+B) and results in different values and preferences. So the CEV of a group containing me varies greatly depending on the group.

  • That X is part of (one) CEV for A doesn't guarantee that A would endorse X given the choice; in fact, if A is human it's unlikely, since (again) human values aren't fixed. So implementing our extrapolated volition -- any of them -- will likely mean violating our current preferences.

For group decisions that require unanimity, very little passes the process. Raising the bar to formal provability drops even more out of the equation. The CEV might very well be a very trivial common denominator such as "people should live good lives".

There is a reasonable question about why it is that "for group decisions that require unanimity, very little passes the process". How much of this effect is an honest difference in values, and how much is a mere linguistic artifact caused by our tiny communication bandwidth and how sloppily we use it?

IMO any CEV algorithm that had any hope of making sense would have to ignore words and map actual concepts together.

  • CEV results in a functional, human-like society very much like our own, including most or all of the features that non-typical people such as those speculating on CEV might consider horrifying.

This is pretty close to 1c.

"But this time without the option of fixing it," I suppose I should add. You're right, it's close; the related nightmare would be a society where even the isolated communities of sane people similar to myself are missing.

(I'm aware that the 'average human' would probably have the same objection in reverse, considering that I'd like to eliminate their society. Well, yeah...)

Some new ideas which I have not seen previously.

  • Humans may wish for the creation of new sentient beings. But then a paradox arises, because the new beings should also be part of CEV. Then either CEV gives preference to the humans existing before CEV was calculated, or CEV fails to converge as a consequence, or CEV finds a solution which can work for both humans and all potential newly created sentient beings.
  • CEV might work on humans, but then somebody creates a new artificial being with unnatural preferences specifically for the purpose of preventing CEV from converging.

I understand CEV to mean extrapolating many different versions of each person, starting at different times in their life before the AGI gained the ability to re-write their preferences, and doing whatever the majority of sims vote for. I expect this to yield 2 for some kind of extrapolation. Defining it to exclude a weaker version of 3 that favors the AGI may prove tricky, but I believe such a definition exists and tend to believe we can find one.

{ETA: the AI doesn't technically want to kill all humans. If it can give our extrapolated selves values that are satisfied by a tautology, it should just sit there. The end result will depend on factors like whether or not the programmers realize what happened.}

(Also, I believe that if the definition is technically uncomputable or unwieldy or simply impossible to test for moral reasons, a well-designed AGI can still form reasonable beliefs about it.)

5 and 6 seem too vague to agree or disagree with. I've seen at least one person use the word "wireheading" to include not just 4 (which I tentatively disagree with, especially if "all" means all) but probably also many versions of 2. If it includes anything that fails to serve evolution's pseudo-goals in giving us our desires, then 5 seems almost trivially true.

No. 5, "wireheading", is an established trope. So is No. 4, being addicted to dreams or simulations. There is a clear difference between the two, but they are similar in that both evoke feelings of disgust at the idea of a person abandoning or forgetting their "real-world" responsibilities out of selfishness.

EDIT: For future reference, hairyfigment's comment was made while trope 2 was "CEV is characterized by a complex set of preferences"

Do you know if there are literally entries for these outcomes? Should there be?

6 sounds an awful lot like a certain anime reference, and I suspect a non-trivial number of perceived examples of it are actually people making that reference with varying degrees of feigned seriousness.

Could you specify which anime reference?

EDIT: I can think of one, the Human Instrumentality Project from Evangelion
