Coherent Extrapolated Volition

Edited by Ruby, pedrochaves, radical_negative_one, Yoav Ravid, et al. last updated 13th Feb 2025

Coherent Extrapolated Volition (CEV) is a term coined by Eliezer Yudkowsky in the context of Friendly AI development. It expresses the argument that it would not be sufficient to explicitly program what we think our desires and motivations are into an AI; instead, we should find a way to program it so that it acts in our best interests – doing what we want it to do rather than what we literally tell it to do.

Related: Friendly AI, Metaethics Sequence, Complexity of Value

In calculating CEV, an AI would predict what an idealized version of us would want, "if we knew more, thought faster, were more the people we wished we were, had grown up farther together". It would recursively iterate this prediction for humanity as a whole, and determine the desires which converge. This initial dynamic would be used to generate the AI's utility function. 

CEV is also often used more generally to refer to what an idealized version of a person would want, separate from the context of building aligned AIs.

What is volition?

As an example of the classical concept of volition, Yudkowsky develops a simple thought experiment: imagine you are facing two boxes, A and B. Exactly one of these boxes contains a diamond – box B. You are asked to guess which box to open, and you choose box A. Your decision was to take box A, but your volition was to choose box B, since what you really wanted was the diamond.

Now imagine someone else – Fred – faces the same task, and you want to help him by handing him the box he chose, box A. Since you know where the diamond is, simply handing over the box he asked for isn't actually helping him. So you mentally extrapolate a volition for Fred, based on a version of him that knows where the diamond is, and conclude that what he actually wants is box B.
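To make the distinction concrete, here is a minimal illustrative sketch of the two-box example (the function and variable names are hypothetical, not from Yudkowsky's essay): Fred's decision is the box he asks for, while his extrapolated volition is the box he would ask for if he knew what you know.

```python
# Toy illustration of decision vs. extrapolated volition (hypothetical names).
boxes = {"A": "empty", "B": "diamond"}  # ground truth, unknown to Fred
freds_decision = "A"                    # the box Fred actually asks for

def extrapolated_volition(goal: str, world: dict) -> str:
    """Return the choice Fred would make if he knew what we know."""
    for box, contents in world.items():
        if contents == goal:
            return box
    raise ValueError("no box satisfies the goal")

print(freds_decision)                           # "A" -- his decision
print(extrapolated_volition("diamond", boxes))  # "B" -- his volition
```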

Coherent Extrapolated Volition

"The "Coherent" in "Coherent Extrapolated Volition" does not indicate the idea that an extrapolated volition is necessarily coherent. The "Coherent" part indicates the idea that if you build an FAI and run it on an extrapolated human, the FAI should only act on the coherent parts. Where there are multiple attractors, the FAI should hold satisficing avenues open, not try to decide itself." - Eliezer Yudkowsky

In developing Friendly AI – an AI acting in our best interests – we would have to take care that it implemented, from the beginning, the coherent extrapolated volition of humankind: as described above, the AI would predict what an idealized version of us would want, iterate that prediction for humanity as a whole, and act on the desires that converge, using this initial dynamic to generate its utility function.
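As a purely illustrative sketch of this "initial dynamic" – the types, the agreement threshold, and especially the extrapolate step are assumptions made here for concreteness, and the actual proposal is nothing this simple – one could model the coherence-filtering idea like this: extrapolate each person's desires, then keep only the desires on which the extrapolations agree.

```python
from collections import Counter
from typing import Callable

# Hypothetical stand-ins: a person is some model of an individual, and a
# volition is the set of outcomes their extrapolated self would want.
Person = dict
Volition = frozenset

def coherent_extrapolated_volition(
    people: list[Person],
    extrapolate: Callable[[Person], Volition],
    agreement_threshold: float = 0.9,
) -> set:
    """Keep only desires shared by (nearly) all extrapolated volitions.

    `extrapolate` is the hard, unsolved part: it would have to answer what
    each person would want "if we knew more, thought faster, were more the
    people we wished we were, had grown up farther together".
    """
    extrapolated = [extrapolate(person) for person in people]
    counts = Counter(d for volition in extrapolated for d in volition)
    n = len(extrapolated)
    # Desires that fail to converge are left open rather than decided by
    # the AI, mirroring the quote above about acting only on the coherent parts.
    return {d for d, c in counts.items() if c / n >= agreement_threshold}
```

The only point of the sketch is the filtering step: where the extrapolations disagree, nothing is added to the seed of the utility function.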

The main problems with CEV include, firstly, the great difficulty of implementing such a program – "If one attempted to write an ordinary computer program using ordinary computer programming skills, the task would be a thousand lightyears beyond hopeless" – and, secondly, the possibility that human values may not converge. Yudkowsky considered CEV obsolete almost immediately after its publication in 2004: he states that there is a "principled distinction between discussing CEV as an initial dynamic of Friendliness, and discussing CEV as a Nice Place to Live", and that his essay essentially conflated the two.

Further Reading & References

  • Coherent Extrapolated Volition by Eliezer Yudkowsky (2004)
  • Coherent extrapolated volition (alignment target) by Eliezer Yudkowsky (2016)
  • Coherent Extrapolated Volition: A Meta-Level Approach to Machine Ethics by Nick Tarleton (2010)
  • A Short Introduction to Coherent Extrapolated Volition by Michael Anissimov
  • Hacking the CEV for Fun and Profit by Wei Dai
  • Two questions about CEV that worry me by Vladimir Slepnev
  • Beginning resources for CEV research by Luke Muehlhauser
  • Cognitive Neuroscience, Arrow's Impossibility Theorem, and Coherent Extrapolated Volition by Luke Muehlhauser
  • Objections to Coherent Extrapolated Volition by Alexander Kruel

See also

  • Friendly AI
  • Metaethics sequence
  • Complexity of value
  • Coherent Aggregated Volition
  • Roko's basilisk